* Make zeros argument of torch.where same dtype as other argument
* Added check for torch.where on CPU that both arguments have same dtype
* Changes based on PR comments
* Fix flake8
* Fixed test for CUDA
* Changes based on PR comments
* Changes based on PR review
* Preserve original TensorIterator behavior when not explicitly promoting
Summary:
Cherry-picking of https://github.com/pytorch/pytorch/pull/28231 to
1.3.1 branch.
Fix: https://github.com/pytorch/pytorch/issues/28010
A mixed-type index assignment that would have been an error in 1.2 was unintentionally made possible (with incorrect results) in 1.3. This PR restores the original behavior.
This is BC-breaking because:
```
a = torch.ones(5, 2, dtype=torch.double)
b = torch.zeros(5, dtype=torch.int)
a[:, [1]] = b.unsqueeze(-1)
```
now raises an error (as in 1.2) whereas it did not in 1.3.
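For illustration, a minimal sketch of the explicit cast that keeps the assignment working (reusing the snippet above):
```python
import torch

a = torch.ones(5, 2, dtype=torch.double)
b = torch.zeros(5, dtype=torch.int)

# As of this change the assignment may not rely on implicit mixed-type
# promotion; casting the right-hand side explicitly keeps it working.
a[:, [1]] = b.unsqueeze(-1).to(a.dtype)
```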
* Compute correct strides after type promotion (#28253)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28253
Instead of trying to fix strides after changing dtypes, wait until after
promotion to set them.
fixes: https://github.com/pytorch/pytorch/issues/27824
fixes: https://github.com/pytorch/pytorch/issues/28502
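As an illustration (not taken from the PR), a minimal sketch of the kind of mixed-dtype, non-contiguous case whose output strides this change affects:
```python
import torch

a = torch.randn(4, 5, dtype=torch.float64).t()  # transposed, non-contiguous double tensor
b = torch.ones(5, 4, dtype=torch.int32)         # contiguous int tensor

out = a + b  # promotes to float64
# With this change the output strides are computed after promotion, so they
# match what the same expression produces when both operands are float64.
print(out.stride(), (a + b.to(torch.float64)).stride())
```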
Test Plan: Imported from OSS
Differential Revision: D18124950
Pulled By: nairbv
fbshipit-source-id: e4db90b2a6bb0f5d49cb388e0cd1971303c6badd
Summary:
People get confused with partial support otherwise: https://github.com/pytorch/pytorch/issues/27811#27729
Suggestions on where else to put warnings are welcome (probably in tutorials - cc SethHWeidman )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27829
Differential Revision: D17910931
Pulled By: dzhulgakov
fbshipit-source-id: 37a169a4bef01b94be59fe62a8f641c3ec5e9b7c
Organize APIs logically in subsections. Fix typos.
This is the v1.3.0 version of a 3 Part PR originally made to master PR: https://github.com/pytorch/pytorch/pull/27677/
originally by @dzhulgakov
* docstring-only formatting changes in the quantize.py and fake_quantization.py files to render better in HTML.
* docstring change on observer.py as well
* just kind of tweaking the docstrings a bit more.
* switching to r""" for the multi-line string, per Zafar's suggestion.
* trying to resolve the merge conflict soumith saw
* trying to avoid a conflict when this gets merged back to master
* Cherry picked in changes from Jessica's branch.
Consolidate all quantization docs in quantization.rst. Add a link to quantization docs from torch.rst. Order quantization.rst alphabetically in index.rst
* Fix Quantized reference
* Add prose for Quantized Functions in the torch.nn docs
* Remove Quantization section
* Updates to index for v1.3.0
* Update "Package Reference" to "Python API"
* Add in torchaudio and torchtext reference links so they show up across all docs not just the main page
* Add "Other Languages" section, add in C++ docs, add in Javadocs
* Add link to XLA docs under Notes: http://pytorch.org/xla/
* Doc tests caught that we'd somehow dropped documenting a few functions like
result_type, can_cast, promote_types
* Add javasphinx extension
* Add javadocs for v1.3.0
* Delete Tensor-Tensor_float32 because it is not public
* Delete Tensor-Tensor_float64 because it is not public
* Delete Tensor-Tensor_int32 because it is not public
* Delete Tensor-Tensor_int64 because it is not public
* Delete Tensor-Tensor_int8 because it is not public
* Delete Tensor-Tensor_uint8 because it is not public
* Add reference to DType and TensorImageUtils
This PR updates the docs CI. After this is merged, we open a PR from
1.3.0 -> master. That open PR will build docs on this branch and push
them to pytorch.github.io:site-v1.3.0. This is done in dry_run mode
so the pushing won't actually happen; I will follow up with a
subsequent change to drop dry_run mode after verifying that everything
builds correctly.
`docs/source/named_tensor.rst` is the entry point; most users will land
either here or the named tensor tutorial when looking to use named
tensors. We should strive to make this as readable, concise, and understandable
as possible.
`docs/source/name_inference.rst` lists all of the name inference rules.
It should be clear but it's hard to make it concise.
Please let me know if anything doesn't make sense and please propose
alternative wordings and/or restructuring to improve the documentation.
This should ultimately get cherry-picked into the 1.3 branch as one
monolithic commit so it would be good to get all necessary changes made
in this PR and not have any follow ups.
Test Plan:
- built and reviewed locally with `cd docs/ && make html`.
ghstack-source-id: dc2ca7a204f86d4849bd45673c189d5bbddcb32c
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27173
Summary:
All of the test cases move into a base class that is extended by the
instrumentation test and a new "HostTests" class that can be run in
normal Java. (Some changes to the build script and dependencies are
required before the host test can actually run.)
ghstack-source-id: fe1165b513241b92c5f4a81447f5e184b3bfc75e
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27453
Test Plan: Imported from OSS
Reviewed By: IvanKobzarev
Differential Revision: D17800410
fbshipit-source-id: 1184f0caebdfa219f4ccd1464c67826ac0220181
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27396
Observer that estimates moving averages of the min and max values per batch; better suited for quantization-aware training than MinMax observers, which track extremal values across batches
ghstack-source-id: 91369018
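A minimal sketch of the idea (the update rule and averaging constant below are illustrative assumptions, not copied from the implementation):
```python
import torch

def update_moving_min_max(running_min, running_max, x, averaging_constant=0.01):
    # Exponential-moving-average style update of per-tensor min/max, as opposed
    # to a plain MinMax observer that keeps the extremal values across all batches.
    batch_min, batch_max = x.min(), x.max()
    if running_min is None:
        return batch_min, batch_max
    new_min = running_min + averaging_constant * (batch_min - running_min)
    new_max = running_max + averaging_constant * (batch_max - running_max)
    return new_min, new_max

running_min = running_max = None
for _ in range(10):
    running_min, running_max = update_moving_min_max(running_min, running_max, torch.randn(32, 8))
```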
Test Plan:
buck test caffe2/test:quantization -- 'test_per_tensor_observers \(test_quantization\.ObserverTest\)' --print-passing-details
buck test caffe2/test:quantization -- 'test_per_channel_observers \(test_quantization\.ObserverTest\)' --print-passing-details
Differential Revision: D17727213
fbshipit-source-id: 024a890bf3dd0bf269d8bfe61f19871d027326f0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27298
PR #26908 toggles NonVariableTypeMode in ATen dispatcher, which is where
USE_STATIC_DISPATCH takes place.
This causes an issue with numel() as it gets called through the dispatch mode and probably does not get inlined.
The thread-local state is also expensive to read/write so many times, which kills perf.
PR #27274 is another approach to fix this and has more details.
Test Plan:
Quantized mobilenetV2 perf before this change
Main run finished. Milliseconds per iter: 28.6782. Iters per second: 34.8696
Perf after this change
Main run finished. Milliseconds per iter: 22.2585. Iters per second: 44.9267
Imported from OSS
Differential Revision: D17742565
fbshipit-source-id: 43c6045cc001c46916ba339555c9d809a2537eff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27151
We need to be able to handle observers with no min/max data correctly, as models sometimes have modules that do not get any data.
ghstack-source-id: 91113403
Test Plan:
buck test caffe2/test:quantization -- test_minmax_observer
buck test caffe2/test:quantization -- test_per_channel_minmax_observer
buck test caffe2/test:quantization -- test_histogram_observer
Reviewed By: csummersea
Differential Revision: D17690828
fbshipit-source-id: e95709333ea0f66d79ddb8141b7cba5a83347dbd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26992
Run the same test for FBGEMM and QNNPACK backends.
Checks that QNNPACK or FBGEMM are supported before running it (using supported_qengines)
Test Plan:
python test/test_quantized.py TestQuantizedLinear
python test/test_quantized.py TestQuantizedConv
python test/test_quantized_models.py
python test/test_quantized_nn_mods.py
Imported from OSS
Differential Revision: D17689171
fbshipit-source-id: e11c0a5e41f5f4e6836a614a5b61e4db3c5e384b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26457
Enhancement to the fuse module to support sequentials; the fuse list can now use names just like the state dict.
Also add support for Conv-ReLU and Linear-ReLU fusion.
Also support inplace and out-of-place fusion of models.
ghstack-source-id: 91076386
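A minimal usage sketch of sequential fusion with state-dict-style names (the module names '0' and '1' come from nn.Sequential; treat the exact call as illustrative):
```python
import torch
import torch.nn as nn
import torch.quantization

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU())

# Fuse the Conv-ReLU pair; names follow the state dict ("0", "1" inside a Sequential).
fused = torch.quantization.fuse_modules(model, [['0', '1']], inplace=False)
print(fused)
```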
Test Plan:
buck test caffe2/test:quantization -- 'test_fusion_sequential_model_train \(test_quantization\.FusionTest\)' --print-passing-details
buck test caffe2/test:quantization -- 'test_fusion_sequential_model_eval \(test_quantization\.FusionTest\)' --print-passing-details
Differential Revision: D17466382
fbshipit-source-id: 0a548f8f4c366f3ecc59db693bac725ccd62328e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27113
Fix bug in fake quant control of observer and fake-quantize operations.
Add test to ensure that features work as expected
ghstack-source-id: 91071181
Test Plan: buck test mode/dev-nosan caffe2/test:fake_quant -- test_fake_quant_control
Differential Revision: D17678875
fbshipit-source-id: 2912ad8b6e674daa1d129f7a7c6f27d8c1b4f93b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26612
Add support for add relu functional module, this allows for fusion of add and relu quantized operations
ghstack-source-id: 91055976
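A minimal sketch of how the functional module is meant to be used (treating torch.nn.quantized.FloatFunctional as the float-side entry point is an assumption based on the current API):
```python
import torch
import torch.nn as nn
import torch.nn.quantized as nnq

class AddReLU(nn.Module):
    def __init__(self):
        super().__init__()
        # FloatFunctional records the add + relu so it can be swapped for the
        # fused quantized op when the model is converted.
        self.skip_add = nnq.FloatFunctional()

    def forward(self, x, y):
        return self.skip_add.add_relu(x, y)

m = AddReLU()
out = m(torch.randn(4), torch.randn(4))
```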
Test Plan: buck test caffe2/test:quantization -- 'test_functional_module \(test_quantization\.FunctionalModuleTest\)' --print-passing-details
Differential Revision: D17518268
fbshipit-source-id: e1e8b4655d6b32405863ab9d1c7da111fb4343cc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26624
For QAT we need to be able to control batch norm for all modules from the top. Adding helper functions to enable/disable batch norm freezing during training
ghstack-source-id: 91008297
Test Plan: buck test caffe2/test:quantization -- --print-passing-details
Differential Revision: D17512199
fbshipit-source-id: f7b981e2b1966ab01c4dbb161030177274a998b6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26516
ghstack-source-id: 90982010
Test Plan:
Integrate per-channel support into conv and linear modules.
The following tests pass:
buck test caffe2/test:quantized -- 'test_linear_api \(test_quantized_nn_mods\.ModuleAPITest\)' --print-passing-details
buck test caffe2/test:quantized -- 'test_conv_api \(test_quantized_nn_mods\.ModuleAPITest\)' --print-passing-details
buck test caffe2/test:quantized -- 'test_float_quant_compare_per_channel \(test_quantized_models\.ModelNumerics\)' --print-passing-details
Differential Revision: D17342622
fbshipit-source-id: f0d618928e3d9348672c589a6b7a47049c372a2e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27002
This was taking a significant amount of time in my benchmarks with larger output sizes (e.g. final output projection in a language classification model)
Test Plan: Imported from OSS
Differential Revision: D17641765
Pulled By: jamesr66a
fbshipit-source-id: b0ef30767eec9774fc503bb51fed039222026bba
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27304
The ellipsis version of `align_to` only works if it is called as a
method. To prevent any confusion, this PR disables `torch.align_to` (but
keeps `Tensor.align_to`).
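A short illustration of the resulting API surface (method form only):
```python
import torch

t = torch.empty(2, 3, names=('N', 'C'))

t.align_to('C', 'N')    # method form keeps working
t.align_to('C', ...)    # the ellipsis version is only supported as a method
# torch.align_to(t, 'C', 'N')  # the function form is disabled by this change
```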
Test Plan: - [namedtensor ci]
Differential Revision: D17743809
Pulled By: zou3519
fbshipit-source-id: cf5c53dcf45ba244f61bb1e00e4853de5db6c241
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27274
This is yet another fix to address #26764.
PR #26908 toggles NonVariableTypeMode in ATen dispatcher, which is where
USE_STATIC_DISPATCH takes place, thus it's the most logically sound place to do
such tweaks.
However, we observed a nontrivial perf regression due to this fix. It turns out
the numel() tensor method gets called in several for-loops, thus incurring ~7M
thread_local updates in a single forward call:
```
7173330 numel
558 size
416 q_scale
302 _empty_affine_quantized
288 contiguous
257 q_zero_point
216 qscheme
173 empty
110 set_
105 as_strided
104 permute
...
```
As numel() is not called from a single place, a natural workaround is to
update function_wrapper.py so that it only adds the guard in the gen_namespace_function()
case and ignores the gen_tensor_method() case. But some tensor methods are actually
being called from the JIT side directly (e.g. "aten::eq_" -> "(self).eq_"), so the
only "band aid" left on the table is to insert the guard on the JIT->ATen path as originally
done in #26868 - this is a simplified version of it, as it doesn't hurt to extend the
NonVariableMode scope a little bit to also cover stack drop/pack calls.
On Android we only expose the JIT API so we don't need to worry about TensorMethods being
called directly. On iOS we don't provide a wrapper yet but we can mention this caveat
in the doc. Hopefully by the time it's widely used we can finish the Variable/Tensor
unification and remove all these hacks.
Test Plan:
- Verified it runs quantized/fp32 MobileNetV2 models;
- Verified it fixes the perf regression (revert #26908 separately);
Differential Revision: D17732489
Pulled By: ljk53
fbshipit-source-id: c14ca66aebc6b6f17ad6efac7ca47f9487c98de5
Previously, we would only test named tensors if:
1) we built with BUILD_NAMEDTENSOR=1
2) TEST_NAMEDTENSOR=1 is in the environment.
This PR makes it so that we ALWAYS test named tensors. This is OK
because all the release binaries should be able to run the named tensor
tests and be green; otherwise, there is something wrong.
Summary:
This PR serializes autograd ops into their own namespace by turning the
serialized op name into torch.autograd.op. This keeps the
original code namespace rather than moving everything to the global namespace;
this will be handled more properly in the future when we handle the module
namespace. This change also preserves BC until we have namespace handling.
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27293
This doesn't turn on 3.5 signal, but it makes it so that [test all]
will include it if you do request it.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D17738741
Pulled By: ezyang
fbshipit-source-id: 2b1af4d7bf26fd84a593fde292d6bfa2aabc1148
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26861
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D17712801
Pulled By: ezyang
fbshipit-source-id: 504594452e6594d79e41856ce5177ab370dc26f1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27061
Previously the cronjobs were run on master, but now the nightly builds
count as "PRs", so we must whitelist them from the should_run calculation.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D17669066
Pulled By: ezyang
fbshipit-source-id: 3b92bf1d09aefa7ef524ea93dfa8c6f566161887
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/26038
Somewhere between v1.1 and master, `nonzero` became `abstract` and was marked as differentiable (by mistake); we need to put it into the TH section of `tools/autograd/derivatives.yaml` to fix it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26980
Differential Revision: D17632276
Pulled By: VitalyFedyunin
fbshipit-source-id: d6cabcc53348af6148cea5a1bd1af2ef12547373
The current logic is buggy, and will fail in the following situation:
Thread A: check optimized_graph_, it is empty.
Thread A: claim the mutex in order to initialize optimized_graph_.
Thread A: copy graph_ into optimized_graph_.
Thread A: start running optimizations on optimized_graph_.
Thread B: check optimized_graph_, it is not empty.
Thread B: start using optimized_graph_.
BUG: Thread B is using the graph while it's still being mutated by
Thread A.
[ghstack-poisoned]
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26390
`quantize_script`: top level API for graph mode quantization
Test Plan:
There are some known issues; we can enable the test after all known issues are fixed.
Imported from OSS
Differential Revision: D17645132
fbshipit-source-id: 61f261d5607409d493b39a2f4e05ebd017279f6b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26978
We can add them later if there is a need.
Test Plan:
ci
Imported from OSS
Differential Revision: D17643009
fbshipit-source-id: 053ec65c4acc03371aab4760793282682f039933
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26848
att
Test Plan:
ci
Imported from OSS
Differential Revision: D17636399
fbshipit-source-id: 7a2bc99a5dd7120c3b7de2adc72c772cb0759066
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26839
att
Test Plan:
ci
Imported from OSS
Differential Revision: D17643010
fbshipit-source-id: 5768b70410b7bdfdbee734d3a00296e5b1ad30d5
Summary:
Previously we did not throw if an input to `range` was a non-integer.
We also typed the result from `int ** int` as an integer but returned a float value. The return type should be a float, because if the exponent is negative `int ** int` returns a float.
Batching these two PRs together because it is easier to land and we're almost at the branch cut.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26926
Differential Revision: D17643039
Pulled By: eellison
fbshipit-source-id: b49203a9d420417e1307bbb653d2e33cd9e530e3
Summary:
Fixes https://github.com/pytorch/pytorch/issues/8817
This rewrites `argmax` and `argmin` to use `TensorIterator` as suggested by ngimel in https://github.com/pytorch/pytorch/issues/8817. To support this, the reduction operation is now passed the index along with the current element. I also had to change a few places where the input and output tensor `dtype`s were assumed to be the same.
Unfortunately, this isn't enough to reimplement the variants of `min` and `max` that return indices. There are several places where multiple tensor outputs are assumed to all have the same `dtype` and so returning `pair<scalar_t, int64_t>` for `ops.project` isn't possible.
#### Performance Results
**Edit:** These timings are invalid, see below for a better perf comparison
Timings reported by [`argmax.py`](https://gist.github.com/SsnL/6898c240d22faa91da16fc41359756a2):
```
cuda : 0.1432
cpu : 26.976
numpy: 2.1350
```
So, the `TensorIterator` reductions are much faster on the GPU but significantly slower on the CPU. `htop` shows the CPU kernel using 4 cores for the CPU reduction, so it's not clear what the issue is there.
Should I just revert to the old implementation on CPU or is it worth investigating further? I see that `numpy` is similarly faster for other `TensorIterator` CPU reductions, e.g. `max`, `mean`, `std`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26181
Differential Revision: D17631979
Pulled By: pbelevich
fbshipit-source-id: 58424818ef32cef031d436cb6191e9a6ca478581
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26867
Use caffe2::Int8Quantize for pytorch mobile. Currently this is only implemented for uint8 tensors and runs using NEON intrinsics.
For all other cases it falls back to naive pytorch quantize_val implementation.
Previously, the naive implementation of quantize_val was slow on mobile, taking up more than 50% of the execution time.
Results:
Before: aten::quantize_per_tensor 42.893 ms, total model runtime 70.5 ms
After: aten::quantize_per_tensor 0.340 ms, total model runtime 27.5 ms
Test Plan:
Tested that the current python tests work: python test/test_quantized.py TestQNNPackOps
Also tested using quantized mobilenetV2 on mobile and compared output
Imported from OSS
Differential Revision: D17638732
fbshipit-source-id: 76445d1e415e6e502d05ba5b900e5e1d875fc1b0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27001
This unconditional log line spams the logs enough that it's a drag on cpu and will eventually fill up logs.
Test Plan: Allow unit test and automated testing to give feedback.
Reviewed By: jspark1105
Differential Revision: D17638140
fbshipit-source-id: 4e8a44bda31327ba7e797f7579a9e3bf866eef7e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26938
They were accidentally removed in #26020
Test Plan: Imported from OSS
Differential Revision: D17632120
Pulled By: pbelevich
fbshipit-source-id: d62f2b5635fb4976fd4eda2f2015fdf67138a0c0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26912
The group name is used as a prefix in the c10d store, and without a consistent name the process group cannot be initialized.
When a process group doesn't have an explicit name (only the WORLD (default) process group can have an explicit name), we use the global _group_counter to generate the name. We need to reset the counter on destruction so that a consistent value is generated when we re-create process groups after some trainers recover from failure.
Test Plan: existing tests passed
Reviewed By: mrshenli
Differential Revision: D17594268
fbshipit-source-id: 17f4d2746584dadaa5d468085d871ff3e95a1c84
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26985
Produce a better error message when `calculate_qparams` doesn't return
something we expect. It should return a Tuple of two tensors.
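A minimal sketch of the expected contract (the helper below is hypothetical; only its return shape, a tuple of two tensors (scale and zero_point), reflects what the check enforces):
```python
import torch

def calculate_qparams(min_val, max_val, qmin=0, qmax=255):
    # Hypothetical observer helper: whatever computes the qparams must hand back
    # a tuple of two tensors, (scale, zero_point), or the clearer error is raised.
    scale = (max_val - min_val) / float(qmax - qmin)
    zero_point = int(qmin - round(min_val / scale))
    return torch.tensor([scale]), torch.tensor([zero_point])

scale, zero_point = calculate_qparams(-1.0, 1.0)
```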
Test Plan:
ci
Imported from OSS
Differential Revision: D17636252
fbshipit-source-id: 6caee48134f46d2f25dec3fa655e99c15043a67f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26995
Fix the current setup: because we exclude fbjni, pytorch_android:package cannot be used independently, for example for testing with `gradle pytorch_android:cAT`.
For publishing it works, as pytorch_android has a dependency on fbjni that will also be published.
For other cases we have two copies of fbjni.so: one from the native build (CMakeLists.txt does add_subdirectory(fbjni_dir)) and one from the ':fbjni' dependency.
We need both of them, as ':fbjni' also contains the Java classes.
As a fix: keep excluding fbjni for the publishing tasks (bintrayUpload, uploadArchives), but otherwise use pickFirst (as we have two sources of fbjni.so).
# Testing
gradle cAT works, fbjni.so included
gradle bintrayUpload (dryRun==true) - no fbjni.so
Test Plan: Imported from OSS
Differential Revision: D17637775
Pulled By: IvanKobzarev
fbshipit-source-id: edda56ba555678272249fe7018c1f3a8e179947c
Summary:
- This PR together with #26908 attempt to address issue #26764 (`Issue 1` mentioned below).
- Current flow without USE_STATIC_DISPATCH (for server build):
```
S1. jit::load()
a. JIT calls variable_factories.h methods to instantiate variable instances.
b. JIT calls some ATen methods during initialization, e.g.: conv_prepack, q_scale.
b.1 First runs corresponding `Operation` in generated register_aten_ops_xxx.cpp, which calls `at::` functions, then calls ATen dispatcher.
b.2 ATen dispatcher dispatches to corresponding VariableType methods.
b.3 VariableType method uses `AutoNonVariableTypeMode` guard before calling into ATen implementation, as ATen generally expects `CHECK(!is_variable())`.
b.4 VariableType method uses `as_variable` to wrap the results.
x. Somewhere in JIT it expects `CHECK(is_variable())` - not sure before/after S1.a / S1.b.
S2. module::forward()
a. JIT interpreter calls some ATen methods (via JIT registry).
a.1 - a.4: same as S1.b.1 - S1.b.4.
x. Different from S1.x, seems JIT doesn't expect `CHECK(is_variable())` during the entire `forward()` call.
```
- Current flow with USE_STATIC_DISPATCH (for mobile build):
```
M1. jit::load()
a. JIT calls variable_factories.h methods to instantiate variable instances.
b. JIT calls some ATen methods during initialization, e.g.: conv_prepack, q_scale.
b.1 First runs corresponding `Operation` in generated register_aten_ops_xxx.cpp, which calls `at::` functions, then calls ATen dispatcher.
b.2 ATen dispatcher dispatches to corresponding ATen implementation directly.
// Issue 1: NO VariableType methods / `AutoNonVariableTypeMode` so `CHECK(!is_variable())` in ATen will fail!
// (Hypothetical) Issue 2: NO `as_variable()` to wrap result as variable. M1.x will fail if it is ever used to check this result.
x. Somewhere in JIT it expects `CHECK(is_variable())` - not sure before/after M1.a / M1.b.
M2. module::forward() // PR #26477 wraps this call with `AutoNonVariableTypeMode` guard.
a. JIT interpreter calls some ATen methods (via JIT registry).
a.1 same as M1.b.1, calls into register_aten_ops_xxx.cpp.
a.2 same as M1.b.2, calls ATen implementation directly.
// `CHECK(!is_variable())` in ATen won't fail thanks to the outer scope `AutoNonVariableTypeMode` guard.
x. Same as above, seems JIT never expects `CHECK(is_variable())` during the entire `forward()` call.
```
- Wrong solution: if we wrap M1 with `AutoNonVariableTypeMode`, it will solve `Issue 1` for some models but will fail M1.x for some other models.
- Proposed solution:
I feel the root cause is that mobile build doesn't have `VariableType` as a barrier sitting between JIT and ATen to convert between is_variable() and !is_variable().
Without `VariableType` the best alternative place to put a barrier is M1.b.2 as Edward did in #26908.
For some reason we also need to toggle the variable state for c10 ops: this is what this PR does. We haven't figured out how the non-mobile build works without this logic, so it's kind of a band-aid for now.
This PR doesn't try to address (Hypothetical) Issue 2 as I haven't seen it. PR #26477 can be replaced by #26908 + this PR but we can keep it until M2.x is no longer true.
- Ultimate solution:
After Variable and Tensor are completely merged: #23032 then is_variable() checks can be changed to requires_grad() checks and all problems will be solved. We can clean up these hacks by then.
- References:
* Effect of `AutoNonVariableTypeMode`: all `is_variable()` inside current thread scope returns false:
https://github.com/pytorch/pytorch/blob/master/c10/core/TensorImpl.h#L811
* Effect of `as_variable`: https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/VariableTypeUtils.h#L159
It calls `make_variable`: https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/variable.h#L539
Test Plan: - Load and run MobileNetV2 fp32 & int8 models.
Differential Revision: D17595179
Pulled By: ljk53
fbshipit-source-id: ed417ba6b696d722ea04fe18adf6b38ababa6b7c
Summary:
Bumping up the `producer_version` in exported ONNX models in view of the next release. Updating tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26976
Reviewed By: hl475
Differential Revision: D17631902
Pulled By: houseroad
fbshipit-source-id: 6d58964657402ac23963c49c07fcc813386aabf0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26897
TORCH_INTERNAL_ASSERT("foo") doesn't do what you think it does :)
I'll try to do a fix to catch it in the compiler, but for now - let's fix usages
Found them using regex:
```
ag --cpp "TORCH_(CHECK|INTERNAL_ASSERT)\([ \n]*\"" --multiline
```
Test Plan: Imported from OSS
Differential Revision: D17624299
Pulled By: dzhulgakov
fbshipit-source-id: 74f05737ef598fd92b5e61541ee36de2405df23d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26968
To make implementations of an operator more robust, we should have a
separate "named area" where name propagation happens and an "unnamed
area" where the implementation is. Right now, many functions are
implemented without an "unnamed area". The problem with that is that if
someone modifies the implementation, it is very easy to break
namedtensor support by using a helper function that does not propagate
names correctly. The test coverage for named tensors is also
insufficient to catch such breakages.
This PR modifies some named tensor implementations to have separate
"named area" and "unnamed area". The following implementations were
changed:
- dropout, softmax, log_softmax, bernoulli
- dot, mm, addmm, addmv, mv
Test Plan: - [namedtensor ci]
Differential Revision: D17627920
Pulled By: zou3519
fbshipit-source-id: 9300ac3962219b1fcd8c4c8705a2cea6f8c9d23d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26975
ExportModule doesn't exist in the mobile libtorch.a; it doesn't fail for the
regular mobile build, presumably because _save_for_mobile was stripped altogether.
But for the host toolchain with different linker flags this will fail.
Add an #if macro, as in Module::save.
Test Plan: - scripts/build_mobile.sh works;
Differential Revision: D17629869
Pulled By: ljk53
fbshipit-source-id: 7d3cebe0a7c3f7b56928eb5a9d9c9174403fe6e5
Summary:
This PR contains the following:
1. Fix ambiguous overload problem when `torch::tensor({{1, 2}})` is used:
```
../test/cpp/api/tensor.cpp: In member function ‘virtual void TensorTest_MultidimTensorCtor_Test::TestBody()’:
../test/cpp/api/tensor.cpp:202:41: error: call of overloaded ‘tensor(<brace-enclosed initializer list>)’ is ambiguous
auto tensor = torch::tensor({{1, 2}});
^
In file included from ../caffe2/../torch/csrc/api/include/torch/types.h:7:0,
from ../caffe2/../torch/csrc/api/include/torch/detail/static.h:4,
from ../caffe2/../torch/csrc/api/include/torch/nn/pimpl.h:4,
from ../caffe2/../torch/csrc/api/include/torch/nn/module.h:3,
from ../caffe2/../torch/csrc/api/include/torch/nn/cloneable.h:3,
from ../test/cpp/api/support.h:7,
from ../test/cpp/api/tensor.cpp:2:
../torch/csrc/autograd/generated/variable_factories.h:177:644: note: candidate: at::Tensor torch::tensor(c10::ArrayRef<unsigned char>)
../torch/csrc/autograd/generated/variable_factories.h:177:1603: note: candidate: at::Tensor torch::tensor(c10::ArrayRef<signed char>)
../torch/csrc/autograd/generated/variable_factories.h:177:2562: note: candidate: at::Tensor torch::tensor(c10::ArrayRef<short int>)
../torch/csrc/autograd/generated/variable_factories.h:177:3507: note: candidate: at::Tensor torch::tensor(c10::ArrayRef<int>)
../torch/csrc/autograd/generated/variable_factories.h:177:4450: note: candidate: at::Tensor torch::tensor(c10::ArrayRef<long int>)
../torch/csrc/autograd/generated/variable_factories.h:177:5404: note: candidate: at::Tensor torch::tensor(c10::ArrayRef<float>)
../torch/csrc/autograd/generated/variable_factories.h:177:6354: note: candidate: at::Tensor torch::tensor(c10::ArrayRef<double>)
../torch/csrc/autograd/generated/variable_factories.h:177:7630: note: candidate: at::Tensor torch::tensor(c10::ArrayRef<bool>)
../torch/csrc/autograd/generated/variable_factories.h:177:9224: note: candidate: at::Tensor torch::tensor(c10::ArrayRef<c10::Half>)
../torch/csrc/autograd/generated/variable_factories.h:177:10838: note: candidate: at::Tensor torch::tensor(c10::ArrayRef<c10::BFloat16>)
In file included from ../caffe2/../torch/csrc/api/include/torch/types.h:7:0,
from ../caffe2/../torch/csrc/api/include/torch/detail/static.h:4,
from ../caffe2/../torch/csrc/api/include/torch/nn/pimpl.h:4,
from ../caffe2/../torch/csrc/api/include/torch/nn/module.h:3,
from ../caffe2/../torch/csrc/api/include/torch/nn/cloneable.h:3,
from ../test/cpp/api/support.h:7,
from ../test/cpp/api/tensor.cpp:2:
../torch/csrc/autograd/generated/variable_factories.h:193:19: note: candidate: at::Tensor torch::tensor(torch::detail::InitListTensor)
inline at::Tensor tensor(detail::InitListTensor list_init_tensor) {
^
```
After this PR, the multidim tensor constructor `torch::tensor(...)` should be ready for general use.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26890
Differential Revision: D17632608
Pulled By: yf225
fbshipit-source-id: 2e653d4ad85729d052328a124004d64994bec782
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26934
Disable cudnn transpose for int types
Did experiment with int + 4d/5d
Test Plan: buck test mode/dev-nosan caffe2/caffe2/python/operator_test:utility_ops_test
Reviewed By: houseroad
Differential Revision: D17607176
fbshipit-source-id: 83b9f9cf654b33d68b657f1b5a17d9bbd06df529
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26520
Hooks to enable control of observer and fake quant that can be used by model.apply() to control fake quant during QAT
ghstack-source-id: 90897063
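A minimal usage sketch, assuming the hooks are exposed under torch.quantization as enable_fake_quant / disable_fake_quant / enable_observer / disable_observer:
```python
import torch
import torch.nn as nn
import torch.quantization

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU())
model.qconfig = torch.quantization.get_default_qat_qconfig('fbgemm')
qat_model = torch.quantization.prepare_qat(model.train())

# The hooks are plain functions over modules, so they compose with nn.Module.apply():
qat_model.apply(torch.quantization.disable_fake_quant)  # observe only, no fake quantization
qat_model.apply(torch.quantization.enable_fake_quant)
qat_model.apply(torch.quantization.disable_observer)    # freeze the collected ranges
```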
Test Plan: buck test caffe2/test:quantization -- --print-passing-details
Differential Revision: D17491155
fbshipit-source-id: 80ff0d7a1ac35c96e054b4f0165a73c56c2f53cc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26966
Without this, you may allocate intermediates which are non-variables
when you should allocate variables.
Should help with discussion in #26868.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D17629863
Pulled By: ezyang
fbshipit-source-id: 0dd9b218d3fc2dbbbbd9b1712db8ab4dac16ea22
Summary:
Kernel launch did not have the stream argument.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26946
Test Plan: should be covered by current tests
Differential Revision: D17629397
Pulled By: ngimel
fbshipit-source-id: f91a72d0908b5672c6df045c9df49bf1d48a5ac9
Summary:
The QuantizedAVx2 path does not support the int32 type. We switch to using the at::quantize_vec function instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26854
Differential Revision: D17609872
Pulled By: llyfacebook
fbshipit-source-id: b4a77d93ce0ebfef696506b5cdbe3e91fe44bb36
Summary:
This PR fixes https://github.com/pytorch/pytorch/issues/24192 by including the private field `iteration_` in SGD optimizer serialization. Under the hood, `iteration_` is serialized into an `IValue`, then stored in a JIT module as an attribute.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26906
Differential Revision: D17628359
Pulled By: yf225
fbshipit-source-id: beec1367459e973a1c9080dc86f502e4c7bc5ebd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26830
Fixes #26817
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D17608535
Pulled By: ezyang
fbshipit-source-id: 18b47af508bd606391b1e6436cefe586b9926ace
Summary:
ONNX does not support dictionaries for inputs and outputs. The reason is that the arg flattening and unflattening does not handle dictionary types.
This PR adds flattening/unflattening support for dictionaries and strings.
However, this feature should be handled with caution for input dictionaries: users need to verify their dict inputs carefully and keep in mind that dynamic lookups are not available.
This PR will allow exporting cases where models have dictionary outputs (detection and segmentation models in torchvision), and where dictionary inputs are used for model configurations (MultiScaleRoiAlign in torchvision).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25889
Reviewed By: hl475
Differential Revision: D17613605
Pulled By: houseroad
fbshipit-source-id: c62da4f35e5dc2aa23a85dfd5e2e11f63e9174db
Summary:
In some versions of Python, then_net and else_net may switch order. Let's make sure we are iterating over the right arg node.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26931
Reviewed By: hl475
Differential Revision: D17614829
Pulled By: houseroad
fbshipit-source-id: 3f1b4eb91ecf4d808f58c34896d3e628aa2e0af0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26877
Add batch_size == 0 tests for other DNNLOWP operators not covered by the other diffs.
Test Plan: CI
Reviewed By: jianyuh
Differential Revision: D17596315
fbshipit-source-id: ddf5325f422402cafacbef9114314d92c49fc284
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26898
This diff removes call sites using the old depth-wise conv fbgemm interface in Caffe2.
Test Plan: CI
Reviewed By: dskhudia
Differential Revision: D17515368
fbshipit-source-id: 7200cf12ddac1103402e690596c58f378f95b1e9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26882
Reduce binary size by 500kb by making TypeDerived and VariableType anonymous namespaces instead of classes. TypeDefault is also a namespace now but can't be anonymous because VariableType calls into it. This also has the nice side effect that VariableType.h and ${TypeDerived.h} are much smaller, because they don't have to list the operator declarations anymore.
ghstack-source-id: 90865080
Test Plan: Measure libtorch.so size
Differential Revision: D17599686
fbshipit-source-id: da3c6641060b7410a7808f36a0a18ee3246ce2d2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26878
Before, for each function signature used in one or more ops, there's a template instantiation that creates the FunctionSchema object for it. As we've seen in the past, all these vector<> constructors in the FunctionSchema object take quite some binary size.
With this PR, we now create an intermediate constexpr std::array that has minimal binary size and can be embedded into the executable, then at runtime we will run a small piece of code that constructs the vector<>'s from it.
This reduces libtorch.so binary size by 800kb
ghstack-source-id: 90842811
Test Plan: measure libtorch.so size
Differential Revision: D17597752
fbshipit-source-id: 53442b565a7747c0d0384b2e3b845729c3daddfd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26869
Having a lot of shared_ptr<Functor> cost us ~1.1MB of binary size in libtorch.so.
This PR fixes that.
ghstack-source-id: 90842812
Test Plan: measure libtorch.so size
Differential Revision: D17595674
fbshipit-source-id: 05151047ee8e85c05205b7510a33915ba98bab58
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26827
The templates there had a binary size impact of ~20MB. This PR fixes that.
ghstack-source-id: 90842814
Test Plan: build it and see binary size of libtorch.so go down from 95MB to 70MB.
Differential Revision: D17566642
fbshipit-source-id: 57bebffce8e036675a452434bc1a9733f5f2cf6d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26911
Check if QNNPACK is present as a backend (should always be present on mobile).
If it is present then set the backend to QNNPACK
Test Plan:
Test on mobile
./speed_benchmark_torch --model mobilenet_quantized_scripted.pt --input_dims="1,3,224,224" --input_type=float --warmup=5 --iter 20 --print_output True
Imported from OSS
Differential Revision: D17613908
fbshipit-source-id: af96722570a0111f13d69c38ccca52416ea5e460
Summary:
https://github.com/pytorch/pytorch/issues/24593 and https://github.com/pytorch/pytorch/issues/24727
**torch.lt(Tensor a, Tensor b)**
will compute a common dtype (highest) based on the inputs and then compare values. The result will be a Bool tensor
```
>>> x = torch.tensor([0], dtype=torch.int)
>>> y = torch.tensor([0.5], dtype=torch.double)
>>> x < y
tensor([True])
```
Previously it was impossible to compare two tensors with different dtypes.
**torch.lt(Tensor a, Tensor b, out=c)**
will compute a common dtype (highest) based on the inputs and then compare values. The result can be populated only into a Bool tensor
```
>>> x = torch.tensor([0], dtype=torch.int)
>>> y = torch.tensor([0.5], dtype=torch.double)
>>> z = torch.empty([1], dtype=torch.bool)
>>> torch.lt(x, y, out=z)
tensor([True])
```
Previously it was impossible to compare two tensors with different dtypes. Also, previously the result dtype could be Bool or Byte (deprecated); currently it will accept only a Bool result.
**a.lt_(Tensor b)**
Expects that a and b have the same dtype, otherwise it's possible to get an overflow (example: 'a' is uint8, 'b' is float32; 'a' would be promoted to float32 and the result would also be float32, then cast back to uint8, with potential for overflow). Will not compute a common dtype. The result will have the type of a.
```
>>> x = torch.tensor([0], dtype=torch.double)
>>> y = torch.tensor([0.5], dtype=torch.double)
>>> x < y
tensor([True])
```
Works similar to previous implementation.
**torch.lt(Tensor a, Scalar b)**
will check that there is no overflow when converting b to the same type as a, then compute the common dtype and compare.
```
>>> x = torch.tensor([0], dtype=torch.double)
>>> x < 0.5
tensor([True])
>>> x = torch.tensor([0], dtype=torch.int)
>>> x < 0.5
tensor([True])
```
Fix https://github.com/pytorch/pytorch/issues/22301.
**torch.lt(Tensor a, Scalar b, out=c)**
will check that there is no overflow when converting b to the same type as a, then compute the common dtype and compare. The result can be populated only into a Bool tensor
```
>>> x = torch.tensor([0], dtype=torch.double)
>>> torch.lt(x, 0.5, out=z)
tensor([True])
```
Previously the result dtype could be Bool or Byte (deprecated); currently it will accept only a Bool result. The rest works similarly to the previous implementation.
**torch.lt_(Tensor a, Scalar b)**
will check that there is no overflow when converting b to the same type as a, then compute the common dtype and compare. The result will have the type of a.
```
>>> x = torch.tensor([0], dtype=torch.int)
>>> x.lt_(1)
tensor([1], dtype=torch.int32)
>>> x = torch.tensor([0], dtype=torch.int)
>>> x.lt_(1.0)
tensor([1], dtype=torch.int32)
```
Works similar to previous implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25998
Differential Revision: D17431853
Pulled By: ifedan
fbshipit-source-id: b5effc6a5d9b32da379395b32abc628b604faaf7
Summary:
Currently when a Vec256<T> (base) object contains -0.0, Vec256<T>::abs()
would not produce 0.0, but -0.0 instead. This commit fixes this issue.
This bug will mostly affect CPUs without AVX support, such as ARM,
PowerPC, and older Intel models.
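A quick user-level check of the behaviour being fixed (the sign of zero is easiest to see through division):
```python
import torch

x = torch.tensor([-0.0])
# abs(-0.0) should be +0.0; dividing makes the sign visible:
# a correct result gives +inf here, while a leaked -0.0 would give -inf.
print(1.0 / x.abs())
```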
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26422
Differential Revision: D17607346
fbshipit-source-id: e8d4595f0e88ad93018a61f89b9e3dcada485358
Summary:
Proposed change:
Check whether sccache is available before running it to show statistics.
(If not available, simply skip it. Showing these stats isn't mandatory to build.)
https://github.com/pytorch/pytorch/issues/26058
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26059
Differential Revision: D17364967
Pulled By: vincentqb
fbshipit-source-id: 0250c6ba5573bc0b292ae8e2188b3e1fa700409e
Summary:
A little benchmarking shows real improvements.
Benchmarking script:
```python
import timeit
for n, t in [(10_000, 8000),
(100_000, 800)]:
for dtype in ('torch.float', 'torch.double'):
print(f'================ dtype {dtype}, {t} times ================================')
for op in ('sin', 'sinh', 'cos', 'cosh', 'tan'):
print(f'a.{op}() (a.numel() == {n}) for {t} times')
print(timeit.timeit(f'a.{op}()',
setup=f'import torch; a = torch.arange({n}, device="cpu", dtype={dtype})',
number=t))
```
RHEL 7.7, Debug build, gcc 8.3, turbo off:
Before this commit:
```
================ dtype torch.float, 8000 times ================================
a.sin() (a.numel() == 10000) for 8000 times
2.690067914001702
a.sinh() (a.numel() == 10000) for 8000 times
7.025003784001456
a.cos() (a.numel() == 10000) for 8000 times
2.691191975001857
a.cosh() (a.numel() == 10000) for 8000 times
6.7473940790005145
a.tan() (a.numel() == 10000) for 8000 times
39.14060311800131
================ dtype torch.double, 8000 times ================================
a.sin() (a.numel() == 10000) for 8000 times
5.442704386001424
a.sinh() (a.numel() == 10000) for 8000 times
6.778444146999391
a.cos() (a.numel() == 10000) for 8000 times
5.429267812000035
a.cosh() (a.numel() == 10000) for 8000 times
6.625128638002934
a.tan() (a.numel() == 10000) for 8000 times
6.888564799002779
================ dtype torch.float, 800 times ================================
a.sin() (a.numel() == 100000) for 800 times
2.343601189000765
a.sinh() (a.numel() == 100000) for 800 times
6.4455943499997375
a.cos() (a.numel() == 100000) for 800 times
2.3377084899984766
a.cosh() (a.numel() == 100000) for 800 times
6.357531049001409
a.tan() (a.numel() == 100000) for 800 times
46.93665131099988
================ dtype torch.double, 800 times ================================
a.sin() (a.numel() == 100000) for 800 times
5.122997600999952
a.sinh() (a.numel() == 100000) for 800 times
6.233409892000054
a.cos() (a.numel() == 100000) for 800 times
5.071856587001093
a.cosh() (a.numel() == 100000) for 800 times
6.0974346790026175
a.tan() (a.numel() == 100000) for 800 times
6.5203832980005245
```
After this commit:
```
================ dtype torch.float, 8000 times ================================
a.sin() (a.numel() == 10000) for 8000 times
1.5905082239987678
a.sinh() (a.numel() == 10000) for 8000 times
6.8216283560032025
a.cos() (a.numel() == 10000) for 8000 times
1.630263119997835
a.cosh() (a.numel() == 10000) for 8000 times
6.738510535000387
a.tan() (a.numel() == 10000) for 8000 times
1.7482984089983802
================ dtype torch.double, 8000 times ================================
a.sin() (a.numel() == 10000) for 8000 times
2.0000513029990543
a.sinh() (a.numel() == 10000) for 8000 times
6.876631892999285
a.cos() (a.numel() == 10000) for 8000 times
2.0672772910002095
a.cosh() (a.numel() == 10000) for 8000 times
6.678993797999283
a.tan() (a.numel() == 10000) for 8000 times
2.3625312719996145
================ dtype torch.float, 800 times ================================
a.sin() (a.numel() == 100000) for 800 times
1.2381345620015054
a.sinh() (a.numel() == 100000) for 800 times
6.400261008999223
a.cos() (a.numel() == 100000) for 800 times
1.284327255001699
a.cosh() (a.numel() == 100000) for 800 times
6.332740200999979
a.tan() (a.numel() == 100000) for 800 times
1.392364119998092
================ dtype torch.double, 800 times ================================
a.sin() (a.numel() == 100000) for 800 times
1.6348750549987017
a.sinh() (a.numel() == 100000) for 800 times
6.312609101998532
a.cos() (a.numel() == 100000) for 800 times
1.700102185997821
a.cosh() (a.numel() == 100000) for 800 times
6.141731683001126
a.tan() (a.numel() == 100000) for 800 times
1.9891383869980928
```
RHEL 7.7, Release build, gcc 8.3, turbo off:
Before this commit:
```
================ dtype torch.float, 8000 times ================================
a.sin() (a.numel() == 10000) for 8000 times
1.0220722929989279
a.sinh() (a.numel() == 10000) for 8000 times
0.9413958889999776
a.cos() (a.numel() == 10000) for 8000 times
1.013564700999268
a.cosh() (a.numel() == 10000) for 8000 times
0.9127178879971325
a.tan() (a.numel() == 10000) for 8000 times
25.249723791999713
================ dtype torch.double, 8000 times ================================
a.sin() (a.numel() == 10000) for 8000 times
3.3466339340011473
a.sinh() (a.numel() == 10000) for 8000 times
0.909793314000126
a.cos() (a.numel() == 10000) for 8000 times
3.4019737700000405
a.cosh() (a.numel() == 10000) for 8000 times
0.918371007002861
a.tan() (a.numel() == 10000) for 8000 times
4.902741645997594
================ dtype torch.float, 800 times ================================
a.sin() (a.numel() == 100000) for 800 times
0.9870414770011848
a.sinh() (a.numel() == 100000) for 800 times
0.9038734009991458
a.cos() (a.numel() == 100000) for 800 times
0.9786967349973565
a.cosh() (a.numel() == 100000) for 800 times
0.8774048919985944
a.tan() (a.numel() == 100000) for 800 times
30.299459709000075
================ dtype torch.double, 800 times ================================
a.sin() (a.numel() == 100000) for 800 times
3.3855797659998643
a.sinh() (a.numel() == 100000) for 800 times
0.8303290260009817
a.cos() (a.numel() == 100000) for 800 times
3.3702223940017575
a.cosh() (a.numel() == 100000) for 800 times
0.822016927999357
a.tan() (a.numel() == 100000) for 800 times
4.889868417001708
```
After this commit:
```
================ dtype torch.float, 8000 times ================================
a.sin() (a.numel() == 10000) for 8000 times
0.542676458000642
a.sinh() (a.numel() == 10000) for 8000 times
0.90598970100109
a.cos() (a.numel() == 10000) for 8000 times
0.6119738140005211
a.cosh() (a.numel() == 10000) for 8000 times
0.902145998999913
a.tan() (a.numel() == 10000) for 8000 times
0.7713400800021191
================ dtype torch.double, 8000 times ================================
a.sin() (a.numel() == 10000) for 8000 times
0.609621113002504
a.sinh() (a.numel() == 10000) for 8000 times
0.8993683010012319
a.cos() (a.numel() == 10000) for 8000 times
0.6876834479990066
a.cosh() (a.numel() == 10000) for 8000 times
0.8859291590015346
a.tan() (a.numel() == 10000) for 8000 times
0.9243346840012236
================ dtype torch.float, 800 times ================================
a.sin() (a.numel() == 100000) for 800 times
0.5219837559998268
a.sinh() (a.numel() == 100000) for 800 times
0.8755807839988847
a.cos() (a.numel() == 100000) for 800 times
0.5899826130007568
a.cosh() (a.numel() == 100000) for 800 times
0.8757360769996012
a.tan() (a.numel() == 100000) for 800 times
0.7496912290007458
================ dtype torch.double, 800 times ================================
a.sin() (a.numel() == 100000) for 800 times
0.578619064999657
a.sinh() (a.numel() == 100000) for 800 times
0.7951330530013365
a.cos() (a.numel() == 100000) for 800 times
0.6442456569966453
a.cosh() (a.numel() == 100000) for 800 times
0.7975544330001867
a.tan() (a.numel() == 100000) for 800 times
0.875703464000253
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26431
Differential Revision: D17470502
fbshipit-source-id: 82e930993c7b2827b04cbe5f9a962913a6069b62
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26757
This doesn't switch any open source builds or CI.
The internal fbcode build has been C++17 for quite some time, but in CUDA code we had it restricted to C++11.
This diff changes that to C++14.
Because this doesn't change anything open source, the risk of this is low.
ghstack-source-id: 90728524
Test Plan: waitforsandcastle
Differential Revision: D17558142
fbshipit-source-id: 9cfd47e38e71d5a2fdae2f535c01f281bf007d9a
Summary:
The current Bernoulli distribution sampler is slightly off in that it returns true slightly too often. This is most obvious at very low p values, like p = 0, although it theoretically occurs at every probability. See https://github.com/pytorch/pytorch/issues/26807.
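A simple way to see the most extreme case (p = 0 should never produce a 1):
```python
import torch

p = torch.zeros(1_000_000)
samples = torch.bernoulli(p)
# With p == 0 every draw must be 0; the bias fixed here showed up as a
# small number of spurious ones in large draws like this.
print(samples.sum().item())
```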
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26864
Differential Revision: D17610459
Pulled By: ezyang
fbshipit-source-id: 28215ff820a6046822513f284793e7b850d38438
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26907
Somehow CircleCI broke this on update to their OS X workers;
the error looks like
/bin/bash: line 1: PROMPT_COMMAND: unbound variable
I'm not sure if I've killed all the occurrences that are necessary,
let's see!
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D17607486
Pulled By: ezyang
fbshipit-source-id: 5e9a7ff69d4b18e759965bf97c67d38404841187
Summary:
Changelog:
- Selectively assign compute_uv in the at::svd used internally in the implementation of at::nuclear_norm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26303
Test Plan:
- Add tests in common_method_invocations.py
Refixes: https://github.com/pytorch/pytorch/issues/18275
Differential Revision: D17605357
Pulled By: ezyang
fbshipit-source-id: d87d60afe678e2546dca6992ea66f2daeb6b0346
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26758
This PR changes the order in which we import classes and functions so
that it is no longer necessary for them to be defined in order in a file,
or for there to be proper import statements in the exported file.
Actually importing a function/class now is driven by the need to resolve
the entity during unpickling, type resolution, or value resolution.
While this should allow significant simplification to the code that
serializes classes, this work has not been done yet in order to avoid
inevitable forward compat issues in the transition period.
Notes:
* Individual functions have been replaced with a SourceImporter object
that exposes a resolveType method. This method loads the type if
it has not been loaded yet, potentially parsing (but not loading)
the file it exists in if that file hasn't been parsed yet.
* Some legacy functionality needed to be added as a method to this object
since the old format still used some of this logic for class resolution.
Test Plan: Imported from OSS
Differential Revision: D17558989
Pulled By: zdevito
fbshipit-source-id: 7eae3470bcbd388c4de463e3462d527776ed46c6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26367
This is necessary for boxed fallback, as boxed fallback must
live inside the templated code. Error reporting code never
has to be in templated code, so that stays in the C++ file.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D17448556
Pulled By: ezyang
fbshipit-source-id: 8244589251e359886dbfcd1c306ae6c033c7a222
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26857
Previously, ATenDispatch took TensorTypeId and returned a function pointer, to
avoid requiring a direct dependence on Tensor (which would have caused a header
cycle). Thanks to the work of Sebastian, it is now possible to include
TensorBody.h without inducing a cycle; so we can now replace this indirect
implementation with a more direct implementation of unboxedCall and move most of
the implementation details into ATenDispatch (simplifying generated code). This
is a necessary prerequisite for boxed fallback work I want to do, as I want to
handle generation of boxing from inside ATenDispatch, not generated code.
Unfortunately, we still need to generate the multidispatch list in
function_wrapper.py to accommodate c10 dispatcher.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D17602540
Pulled By: ezyang
fbshipit-source-id: 6927e66924405f5bf5cb67f1b57e49bc9a0f58ec
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26824
These ops are named after the bitwise reduction ops in MPI.
This is based on the work done by knottb in #22449.
Closes #22449.
Test Plan: Imported from OSS
Differential Revision: D17600210
Pulled By: pietern
fbshipit-source-id: 44c7041ce01bc5de170a4591c5a696e4f24431ef
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26809
resize_as_ shouldn't do multiple dispatch on its second argument. Because it
currently has per CPU/CUDA dispatch, however, it will do proper dispatch on all
arguments. Bad!
There is only a very minor downside to this patch which is we have an extra
dynamic dispatch now.
Thank you Ailing for reporting this problem.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D17581324
Pulled By: ezyang
fbshipit-source-id: e62cbb6cf497a7d6e53c4a24b905fef7a29b0826
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26782
At least we should be consistent across the top-level APIs and prepare/convert/etc.
The logic is inplace=False by default, but the top-level APIs take care of doing fewer copies.
Also renames always-inplace methods like add_observer to have an underscore at the end.
One fix for MinMaxObserver was triggered by deepcopy surfacing that we were accidentally keeping autograd around
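A minimal sketch of the out-of-place flow implied by the new default (the model and calibration data are placeholders):
```python
import torch
import torch.nn as nn
import torch.quantization

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU())
model.qconfig = torch.quantization.default_qconfig
model.eval()

# inplace=False (the new default) leaves `model` untouched and returns copies.
prepared = torch.quantization.prepare(model, inplace=False)
prepared(torch.randn(1, 3, 8, 8))            # calibration pass
quantized = torch.quantization.convert(prepared, inplace=False)
```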
Test Plan: Imported from OSS
Differential Revision: D17595956
Pulled By: dzhulgakov
fbshipit-source-id: 801f9f5536b553f24c7a660064dd6fce685edd65
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26849
We were having division-by-zero errors when one of the input tensor dimensions is 0. Examples: P111481720 and P111481374
This diff adds unit tests for empty input tensors and fixes division-by-zero errors in the partition function.
Test Plan: buck test caffe2/caffe2/quantization/server:concat_dnnlowp_op_test -- --stress-runs=100
Reviewed By: jianyuh
Differential Revision: D17574566
fbshipit-source-id: 1d2c21308bde99b3c4f2da82f53201eec42b5d8b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26851
Add codegen option to remove backward ops from jit-op-registry as they are not
likely to be used in an inference-only mobile build.
Measured ARM-v7 AAR build size change: 5,804,182 -> 5,331,219.
Test Plan: - build and integrate with demo app;
Differential Revision: D17587422
Pulled By: ljk53
fbshipit-source-id: 08c0fc7a710698a0d4baaf16bbb73cb812b1126a
Summary:
This diff implemented at::parallel_for()/parallel_reduce() and other
ATen/Parallel.h APIs for mobile using caffe2::ThreadPool.
caffe2::ThreadPool doesn't support submitting individual tasks
separately and running them in parallel - all tasks need to be submitted in
one batch which will lock the thread pool until all of them finish - as a
result we didn't wrap caffe2::ThreadPool with TaskThreadPoolBase interface
and reuse at::parallel_for() implementation in ParallelNative.h. Because
of this constraint, intraop_launch() / intraop_launch_future() are not
supported yet.
This diff doesn't touch the inter-op pool - it's still the default native c10
thread pool. Will work on it when it's widely used.
Test Plan: - This is early draft to receive feedback. Will do more thorough tests.
Differential Revision: D17543412
Pulled By: ljk53
fbshipit-source-id: 53a3259409c7207d837b9135d87d8daa6ad15e30
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26828
Pickle serialization for quantized modules is currently broken by https://github.com/pytorch/pytorch/issues/24045, so let's be loud and fail if the user tries to do it
Test Plan: Imported from OSS
Differential Revision: D17579127
Pulled By: jamesr66a
fbshipit-source-id: 3deccac7e4590c6f648f22bb79c57badf3bf0487
Summary:
An attempt to enable double backward for non-cudnn LSTM and GRU (see https://github.com/pytorch/pytorch/issues/25315, https://github.com/pytorch/pytorch/issues/20449). RNN works already because it does not rely on fused kernels.
This does not implement double backward function itself, because that is pretty hard to spell out. Instead, it implements backward using differentiable operations, so that double backward can be done automatically.
The good: seems to work, with no effect on performance in the usual case without double backward, because the fused LSTM backward is used.
The bad: performance of backward and, especially, double backward is pretty bad. Scripting would still be the preferred way if we want a performant solution. Performance and/or memory use can be slightly improved if in-place variants can be used for sigmoid_backward and tanh_backward to avoid the cat at the end, but I'm not yet sure that's possible, and in any case it would only be a slight improvement.
The ugly: I could not figure out a way to reuse the workspace that contains the sum of the gates with the applied sigmoid and tanh operations, so that's probably another perf and memory hit.
cc soumith, albanD. If you think this approach is viable, I can extend to GRU and RNN.
Thanks to mcarilli whose approach to double backward in weight norm I copied.
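A rough sketch of the kind of double-backward check this enables (CPU example with cuDNN disabled; shapes and sizes are illustrative):
```python
import torch

torch.backends.cudnn.enabled = False  # force the non-cuDNN path
lstm = torch.nn.LSTM(3, 5).double()
inp = torch.randn(2, 1, 3, dtype=torch.double, requires_grad=True)
out, _ = lstm(inp)
grad_out = torch.randn_like(out)
# First backward is built from differentiable ops...
g, = torch.autograd.grad(out, inp, grad_out, create_graph=True)
# ...so a second backward through it now works instead of raising.
g.sum().backward()
```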
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26660
Test Plan: added tests to check gradgrad for GRU and LSTM with cudnn disabled.
Differential Revision: D17581489
Pulled By: ngimel
fbshipit-source-id: efd204289e9a0e94d94896a0b3bff5cf6246cafa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25187
The bytecode export flow: dump the bytecode format for the light weighted interpreter.
* The bytecode is generated without input spec optimization. It would be more generic (input independent) with no obvious performance degradation (to be tested).
* Main API: torch::jit::script::Module::save(filename, extra_files, bool *bytecode_format* = false).
* Both bytecode and module object are exported in pickle format.
* The module object (in data.pkl) is the same as the original JIT model.
* The serializer is dependent on pickle only (no protobuf or JSON).
* The major functionality is forked in ScriptModuleSerializer2::serialize().
* The test loader is test_bc_export.cpp.
* Simple APIs are added in Code and its implementation to get necessary information (instructions, operators and constants).
* Since there's no dependency on graph/node, GetAttr is promoted from an operator to first-class instruction (https://github.com/pytorch/pytorch/pull/25151) .
* Some definitions (instructions, writeArchive, etc) that are shared by full JIT and bytecode are pulled out of the local namespace (https://github.com/pytorch/pytorch/pull/25148).
The output layout looks like:
* folders of methods.
* In each method folder (for example, forward/):
* bytecode.pkl: instructions and operators
* constants{.pkl,/}: constant list in constants.pkl. If there are tensors in constants, the binary tensor files in constants/ folder.
* data{.pkl,/}: the module object, with binary tensor files in data/ folder. The same as in torchscript.
Test Plan: Imported from OSS
Differential Revision: D17076411
fbshipit-source-id: 46eb298e7320d1e585b0101effc0fcfd09219046
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26829
The TensorIterator loop for `copy_` uses operations that are currently
unsupported by named tensors. The solution is to wrap `copy_` in a
function that does the name propagation and ignore names when running
the implementation of `copy_`. There is no test case because I'm not
sure how to trigger the incorrect behavior, but there is definitely code
in CUDA copy that doesn't support named tensors (expand_as isn't
supported):
aaf30cdf36/aten/src/ATen/native/cuda/Copy.cu (L141-L148)
Test Plan: - [namedtensor ci]
Differential Revision: D17577310
Pulled By: zou3519
fbshipit-source-id: e11c52243800e1331fad738084304badcfd51ae2
Summary:
cpuinfo_initialize() was not implemented for the s390 arch.
The cpuinfo calls are x86-specific, used to determine vector extensions such as AVX, AVX512, etc.
Without this patch an unnecessary error log is printed on the s390 arch:
Error in cpuinfo: processor architecture is not supported in cpuinfo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26265
Differential Revision: D17452301
Pulled By: izdeby
fbshipit-source-id: 9ca485550385c26dec18aac5953c887f1ffbfb7a
Summary:
We found a bug involving `std::tuple` with nvcc.
In C++11, the `std::tuple` constructor is constexpr in libstdc++, but is not constexpr in libc++.
c36b77fcda/aten/src/ATen/native/cuda/Loops.cuh (L109-L111)
These lines caused crashes in CUDA with the message `scan failed with synchronize`, which is a CUDA initialization error message.
This PR fixes the for loop for nvcc and libc++ by not using `std::tuple`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25553
Differential Revision: D17582118
Pulled By: yf225
fbshipit-source-id: d6f62ed46c2415b48eb49f8a051cf3c0e7cb23ce
Summary:
Change the default encoding used by torch.load to 'utf-8'
This commit provides changes for cases where user tries to torch.load
a pickled module with non-ASCII characters in the docstring as
discussed in https://github.com/pytorch/pytorch/issues/21743. The default encoding was changed from 'ascii'
to 'utf-8'. Documentation for `torch.load` was updated and two tests
(loading py2 unicode module with unicode in it; error throwing when
user explicitly sets wrong encoding) were written.
~~This commit provides changes for better error handling in cases
where user tries to `torch.load` a pickled module with non-ASCII
characters in the docstring as discussed in https://github.com/pytorch/pytorch/issues/21743.~~
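A small sketch of the resulting behavior (file names are placeholders):
```python
import torch

obj = torch.load('legacy_py2_checkpoint.pt')                    # 'utf-8' is now the default
obj = torch.load('legacy_py2_checkpoint.pt', encoding='ascii')  # old default, still selectable
```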
Ping ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26421
Differential Revision: D17581633
Pulled By: yf225
fbshipit-source-id: f8e77dcf7907092771149aad8ede6cfb73c21620
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26815
This PR adds named tensor support for:
- any, all, `bitwise_not(_)`, cumprod, cumsum, `logical_not`
In addition, it adds smoke tests for a variety of tensor attributes and
fns:
- is_shared, is_signed
- retain_grad, register_hook
Test Plan: - [namedtensor ci]
Differential Revision: D17575905
Pulled By: zou3519
fbshipit-source-id: 37bfa327e68112c5bf0f6bf1f467a527f50fa1c4
Summary:
function_ref is pulled over from LLVM. It is to callables what StringRef is to strings.
This allows it to be substantially lighter weight, particularly in code size. That comes
at the cost of not being usable in situations where the callable's lifetime is shorter
than the function_ref. This means it is suitable for callback-like scenarios, but not
for situations where the callable needs to be stored. In converting TensorIterator,
I only encountered one situation that required refactoring to comply with function_ref's
constraints.
In my local Release build, this reduces the size of libtorch by 4MB, from 70MB->66MB.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26592
Differential Revision: D17516202
fbshipit-source-id: 267476891f767f4827a4d38149f70e5035c56c48
Summary:
This PR makes the following improvements:
1. Add `forward_with_indices` method to all C++ MaxPool modules, to return the max indices along with the outputs. (We can't make two `forward` methods that return different types based on input, because that will break the type deduction of `torch::detail::return_type_of_forward_t`)
2. Add `max_poolNd_with_indices` to `torch::nn::functional`, to be used when indices of the max values are needed. (We can't merge this with `torch::nn::functional::max_poolNd` because the return type of `max_poolNd` has to be defined statically).
3. Improve `pretty_print` of C++ MaxPoolNd and AvgPoolNd modules to match the Python `extra_repr`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26521
Differential Revision: D17507358
Pulled By: yf225
fbshipit-source-id: b6c0e2b27b38378cdc0c75f4bfc797b3c6b17cd9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26453
Previously, schema matching would incorrectly widen typevar bindings
when later occurrences were supertypes of earlier ones. This allowed
callsites like `floatlist.append(tensor.item())` to pass the typechecker,
causing a runtime assert (issue #24856).
An earlier, reverted fix (#25136) insisted on strict equality across all
occurrences of a typevar, necessitating explicit casts around Scalar-typed
arguments to int- or float-typed parameters, like `tensor.item()` above.
This was per the original type system design, but turned out to break
existing user code that relied on the de facto dynamic downcast. (The
error required a specialized list representation.)
The current fix includes the prevention of typevar widening, but
adds logic to insert implicit conversions from Scalar to float or int
as needed to satisfy a matched schema.
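A sketch of the kind of callsite that passes again thanks to the implicit conversion (function name is illustrative):
```python
import torch
from typing import List

@torch.jit.script
def collect(t):
    # type: (Tensor) -> List[float]
    xs = torch.jit.annotate(List[float], [])
    xs.append(t.item())  # Scalar implicitly converted to float by schema matching
    return xs

print(collect(torch.tensor(3.5)))  # [3.5]
```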
Test Plan: Imported from OSS
Differential Revision: D17470598
Pulled By: bhosmer
fbshipit-source-id: d260dbf3cd78b9c2f2229bc61afc84e1910b5659
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26704
nccl 2.1.15 isn't available for CUDA 10.1 and 2.4.8 isn't available for cuda 9.1 :(
ghstack-source-id: 90714191
Test Plan: build docker images on Jenkins
Differential Revision: D17543120
fbshipit-source-id: 882c5a005a9a3ef78f9209dea9dcec1782060b25
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26496
It is a BAD BAD idea to rely on Docker versions which are not deployed
(per ossci-job-dsl), because those versions will get GC'ed after two
weeks. At the moment, there is no verification that your Docker version
is deployed. This adds an Azure job to check this.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D17575100
Pulled By: ezyang
fbshipit-source-id: 8df2331c6e6899c585bc2917b55e8955908b0e4a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26751
### Summary
We're going to use the AWS s3 bucket - `s3://ossci-ios` to store the release binary. To release the cocoapods, we can follow the steps below:
1. Open a fake PR to trigger the CI job that pulls the code from the 1.3.0 tag branch and does the building and uploading.
2. Verify the binary locally - Run tests on both arm64 and simulator
3. Publish the cocoapods officially
### Test plan
- podspec lint command succeeds
- `pod spec lint --verbose --allow-warnings --no-clean --use-libraries --skip-import-validation`
Test Plan: Imported from OSS
Differential Revision: D17577131
Pulled By: xta0
fbshipit-source-id: 55fee918ecc5c4e0b6d714488a12351b4370afac
Summary:
Output tensors don't need to be copied during type promotion, as we are not using any data from them. A simple allocation gives a steady 10% performance gain.
BEFORE
```
In [1]: x = torch.randn(64, 2048, 7,7)
In [2]: y = torch.randn(64, 2048, 7,7, dtype=torch.float64)
In [3]: timeit x.add_(y)
77.3 ms ± 257 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```
AFTER
```
In [1]: x = torch.randn(64, 2048, 7,7)
In [2]: y = torch.randn(64, 2048, 7,7, dtype=torch.float64)
In [3]: timeit x.add_(y)
68.2 ms ± 713 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26816
Differential Revision: D17573455
Pulled By: VitalyFedyunin
fbshipit-source-id: 47286abce5e7e665eb61e46ae358c896e945bef2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26583
Adds a function that uses the nccl api to get the version code. Converts it to a readable version. Will be
used for logging NCCL version in exception messages.
Test Plan: See above
Differential Revision: D17473200
fbshipit-source-id: 4881ed5221b397f2f967262668c2b376b6bf3c64
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26290
Fixes #26206
Happily, I also can delete the dead Dense***Tensor cases, since they
are for the defunct THS backend.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D17404368
Pulled By: ezyang
fbshipit-source-id: 79d71ad40c4325c9f52d2825aceb65074d2e20e8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26784
Previously we were using empty to generate test tensors; this PR changes the test tensors to use
randint so that we can test things properly.
Also added a set_sizes_and_strides call and removed .contiguous() in the int_repr function to preserve the
original size and strides.
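A minimal sketch of the preserved layout (using the quantize_per_tensor name introduced later in this branch):
```python
import torch

x = torch.rand(3, 4)
q = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.quint8)
print(q.int_repr().shape)     # torch.Size([3, 4])
print(q.int_repr().stride())  # matches q.stride() instead of being re-contiguized
```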
Test Plan:
python test/test_quantized_tensor.py
Imported from OSS
Differential Revision: D17566575
fbshipit-source-id: 89379fb09b500dd156118e6ee0709df59f169990
Summary:
- Separates device type from default (test) device
- Adds multidevice decorator
- Updates generic tests to use multidevice decorator where applicable
TorchXLA wants to change the default test device based on the test environment. Separating the device type and the default (test) device enables that functionality.
Additionally, many existing tests only run on multiple devices and are required, as a consequence, to make CUDA-specific API calls. The multidevice decorator simplifies the existing code and limits the CUDA dependency. Eventually this should let us run multidevice tests on multiple device types.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26594
Test Plan: tests were manually run with the CUDA test device set to 'cuda:1'.
Differential Revision: D17568910
Pulled By: mruberry
fbshipit-source-id: c442f748a31a970be8c21deb12a67c3b315c1128
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26773
att
Test Plan:
ci
Imported from OSS
Differential Revision: D17563673
fbshipit-source-id: 5a6fb4238b6886695c2d25db11fec22ebe5d0c08
Summary:
Resubmit of https://github.com/pytorch/pytorch/pull/25980.
Our old serialization was in tar (`resnet18-5c106cde.pth`, for example, was in this format), so let's only support automatic unzipping if checkpoints are zipfiles.
We can still manage to get it to work with tarfile, but let's delay that until there's an ask.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26723
Differential Revision: D17551795
Pulled By: ailzhang
fbshipit-source-id: 00b4e7621f1e753ca9aa07b1fe356278c6693a1e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26718
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D17549623
Pulled By: ezyang
fbshipit-source-id: 8880c09d85a15b2a63dcf0c242ba6a2dd941decb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26360
This is not just for aesthetics: this include blocks the inclusion
of headers like ivalue.h from ATenDispatch.h (as it causes an
include cycle.)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D17429163
Pulled By: ezyang
fbshipit-source-id: 03feb210c12bc891d95bbb5a11ffd694ec05005c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26118
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D17404367
Pulled By: ezyang
fbshipit-source-id: 14a16baa4b59f97182725092531a54603f3d92b8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25914
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D17284083
Pulled By: ezyang
fbshipit-source-id: 430ac7ea2bd042b1f4bb874e53679d0fde326dec
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26563
This adds name inference rules for pre-existing logsumexp, mode,
kthvalue, and median ops. Also adds overloads so that they can take
`Dimname` dimensions.
There are a lot of min/max overloads. This PR adds name inference to
the following overloads for (both) min and max:
- min(Tensor, int dim)
- min(Tensor, Dimname dim)
- min(Tensor) (full reduction)
Test Plan: - new tests and [namedtensor ci]
Differential Revision: D17557050
Pulled By: zou3519
fbshipit-source-id: a099a0ef04ad90d021a38a0668fc44902e1c7171
Summary:
Currently, we export invalid ONNX models when size() is used with a negative dim.
This PR fixes the issue and allows exporting these models to ONNX (ex: input.size(-1)).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26436
Reviewed By: hl475
Differential Revision: D17565905
Pulled By: houseroad
fbshipit-source-id: 036bc384b25de77506ef9fbe24ceec0f7e3cff8b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26778
- Add support for linear and cubic interpolate in opset 11.
- Add support for 1d and 3d interpolate in nearest mode for opset 7 and 8.
- Add tests for all cases of interpolate in ORT tests (nearest/linear/cubic, 1d/2d/3d, upsample/downsample).
Original PR resolved: https://github.com/pytorch/pytorch/pull/24805
Reviewed By: hl475
Differential Revision: D17564911
Pulled By: houseroad
fbshipit-source-id: 591e1f5b361854ace322eca1590f8f84d29c1a5d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26543
Also adds a test for logical_xor (it already had named tensor support
but there was no test)
Test Plan: - [namedtensor ci]
Differential Revision: D17501403
Pulled By: zou3519
fbshipit-source-id: 49be15580be9fb520e25a8020164e5a599d22d40
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26745
This file doesn't appear to be included by default on GCC 7.3 and
causes compilation to fail. Adding this include fixes compilation.
Test Plan: Imported from OSS
Differential Revision: D17566444
Pulled By: pietern
fbshipit-source-id: 9afb3d4596e424efc5a6ea6ab3b1cffdb2b41fbb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26518
Skip Dequantize() modules for QAT alone. For fake quant insertion, DeQuantize() is a no-op and we should not be inserting fake-quant.
ghstack-source-id: 90704220
Test Plan:
buck test caffe2/test:quantization -- --print-passing-details
Tests in test_quantization pass with changes:
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/281475121296989
Summary (total time 73.03s):
PASS: 28
FAIL: 0
SKIP: 0
FATAL: 0
TIMEOUT: 0
OMIT: 0
Differential Revision: D17439333
fbshipit-source-id: f716c23500324ae08c8d104ee2c9587fa6926571
Summary:
Rename old mobile_threadpool() API, replace it with a new version that
returns caffe2::ThreadPool instead of pthreadpool_t.
Test Plan: - builds
Differential Revision: D17543413
Pulled By: ljk53
fbshipit-source-id: a3effd24e8ce9d677a2a04ebe6b6e1582e6f0a65
Summary:
This PR includes the following improvements:
1. Add comments for limitations of the multidim tensor factory function `torch::tensor(...)`, noting the fact that `torch::tensor({})` and mixed data type such as `torch::tensor({{bool, 2.0}})` are not supported at the moment. (I will also update https://pytorch.org/cppdocs/notes/tensor_creation.html to include usage examples for the multidim tensor factory function `torch::tensor(...)`)
2. Rename `ListInitTensor` to `InitListTensor`, for better naming consistency.
This addresses reviews in https://github.com/pytorch/pytorch/pull/26210. I will work on a separate PR to move the factory function to `at::`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26756
Differential Revision: D17560136
Pulled By: yf225
fbshipit-source-id: eb8b45226e999784da48f75cc8953a998582df99
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26760
Follow-up of D17514003 . Change Caffe2 code to use the new PackedDepthWiseConvMatrix interface.
Test Plan: CI
Reviewed By: dskhudia
Differential Revision: D17514350
fbshipit-source-id: 691d9f1fd35bdb7dd8ba152287f3a34359dc1f4c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26558
Previously, name inference got called after dimensions were wrapped.
This PR makes it so that name inference always wraps dimensions so that
it can be called anywhere. Ideally we would only wrap dimensions once,
but many of our operators wrap dimensions in weird places.
Wrapping dimensions in name inference is pretty inexpensive and only
happens for named tensors (name inference does not run on unnamed
tensors.)
Test Plan: - [namedtensor ci]
Differential Revision: D17557049
Pulled By: zou3519
fbshipit-source-id: 68c5636489e233dbf2588ab6ad4e379a6fe4c8ba
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26636
This PR defines a lot of dimname overloads so that when named tensor
support is added for those operators, we will not have to modify the
autogenerated TensorMethods.h, thereby avoiding potential merge
conflicts in the future.
Overloads were added for the following (a usage sketch follows the list):
- all
- any
- argmax
- argmin
- cumsum
- cumprod
- index_copy
- kthvalue
- mode
- permute
- squeeze
- index_add
- index_fill
- scatter
- scatter_add
- index_select
- gather
- sort
- argsort
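A usage sketch for a couple of these Dimname overloads, limited to ops that already have named tensor support elsewhere in this branch:
```python
import torch

t = torch.randn(2, 3, names=('N', 'C'))
print(t.cumsum('C').names)    # ('N', 'C'); same as t.cumsum(dim=1)
values, indices = t.min('C')  # reduction dim selected by name
print(values.names)           # ('N',)
```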
Test Plan: - [namedtensor ci]
Differential Revision: D17522984
Pulled By: zou3519
fbshipit-source-id: eca6dea819ba4e4e43b71b700d5cf09176f00061
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25429
Previously we were using empty to generate test tensors; this PR changes the test tensors to use
randint so that we can test things properly.
Also added a set_sizes_and_strides call and removed .contiguous() in the int_repr function to preserve the
original size and strides.
Test Plan:
python test/test_quantized_tensor.py
Imported from OSS
Differential Revision: D17559660
fbshipit-source-id: d4ce81d577296c1137270fdaa6b1359fb703896f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26656
Updating the NDK to r18 or newer triggers a path in our CI scripts so that we now build with clang instead of gcc.
Google discontinued gcc support for Android quite a while ago; clang is the only way forward.
ghstack-source-id: 90698985
Test Plan: CI
Reviewed By: dreiss
Differential Revision: D17533570
fbshipit-source-id: 5eef4d5a539d8bb1a6682f000d0b5d33b3752819
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26506
[pytorch] [distributed] Made test forgiving to allow rpc agent to return one of the two errors.
ghstack-source-id: 90667534
Test Plan: Made sure pg based UT works.
Differential Revision: D17488899
fbshipit-source-id: 41f76cf4b4a0ca5e651a5403d6e67b639f0b9c4f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26696
att
Test Plan:
ci
Imported from OSS
Differential Revision: D17558701
fbshipit-source-id: 96ef87db74bd1a5d4ddc69867ae71d78c0df83fd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26738
Someone may use torch._export directly. Here we change onnx_export_type's default value to None;
if it's the pytorch-onnx-caffe2 bundle, we set it to ONNX_ATEN_FALLBACK, otherwise it's ONNX.
Test Plan: ci
Reviewed By: hl475
Differential Revision: D17546452
fbshipit-source-id: 38e53926e2b101484bbbce7b58ebcd6af8c42438
Summary:
- Normalization mean and std are specified as parameters instead of being hardcoded
- imageYUV420CenterCropToFloat32Tensor before this change worked only with square tensors (width==height) - added generalization to support width != height with all rotations and scalings
- javadocs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26690
Differential Revision: D17556006
Pulled By: IvanKobzarev
fbshipit-source-id: 63f3321ea2e6b46ba5c34f9e92c48d116f7dc5ce
Summary:
- Add support for linear and cubic interpolate in opset 11.
- Add support for 1d and 3d interpolate in nearest mode for opset 7 and 8.
- Add tests for all cases of interpolate in ORT tests (nearest/linear/cubic, 1d/2d/3d, upsample/downsample).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24805
Reviewed By: hl475
Differential Revision: D17330801
Pulled By: houseroad
fbshipit-source-id: 1bdefff9e72f5e70c51f4721e1d7347478b7505b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26728
Use Caffe2::mobile_threadpool() in linear and conv operators
Perf
Without threadpool - 76ms
With threadpool - 41 ms
Test Plan:
python test/test_quantized.py TestQNNPackOps
Imported from OSS
Differential Revision: D17553510
fbshipit-source-id: dd5b06f526f65d87727ec7e3dad0a5fa74cba9f9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26554
Previously, in `TCPStore`'s constructor we did not pass a timeout to
the `connect` function, which thus used the default timeout (-1, so infinite).
But the timeout variable in `TCPStore.cpp` is configurable by the user and set to
300 seconds by default, so we should be passing it into the connect function.
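A sketch only, assuming the Python c10d binding exposes the same constructor arguments (host, port and sizes are illustrative); the timeout now also bounds the initial connect to the master:
```python
from datetime import timedelta
import torch.distributed as dist

# Assumed binding: TCPStore(host, port, world_size, is_master, timeout)
store = dist.TCPStore('127.0.0.1', 29500, 1, True, timedelta(seconds=300))
store.set('key', 'value')
print(store.get('key'))
```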
Test Plan: see above.
Differential Revision: D17486779
fbshipit-source-id: 42d38a3b8d492d9e9ff09110990a8e4a3a1292b2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26694
Previously we would not properly populate `errorDesc` for:
```
./torch/jit/__init__.py:13:1: F401 'torch.nn.ModuleList' imported but unused
```
because we wanted only letters and spaces. Be more permissive
Test Plan: Imported from OSS
Differential Revision: D17551999
Pulled By: suo
fbshipit-source-id: b82567df1fa3c9729e7427dc3461bedfb40933dc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26681
att
Test Plan:
ci
Imported from OSS
Differential Revision: D17542833
fbshipit-source-id: 653e906b0e146763609c69ef0de7f9cf38621586
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26581
We're currently inlining immediate values of the constants directly into the
IR when we generate it, providing no way to access these values by their
names later. This change registers such values as attributes of the
module so that they are not lost after IR generation.
Differential Revision: D17513451
Test Plan: Imported from OSS
Pulled By: ZolotukhinM
fbshipit-source-id: cf8f9b450e7178692211abd905ffd2d7ce5a6ce1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26680
This was introduced earlier under the assumption that we'd have a qconv_per_tensor_affine
and a qconv_per_channel_affine, but it turns out we don't have these, so we'll remove
these functions.
Test Plan:
python test/test_jit.py 'TestJit.test_quant_fusion'
Imported from OSS
Differential Revision: D17542607
fbshipit-source-id: b90ce5738170f0922bdc2eb1c4dbecd930f68a48
Summary:
This is a follow-up PR for https://github.com/pytorch/pytorch/pull/23284. In that PR we had removed changing the default behavior for `keep_initializers_as_input` argument to the export API. With this PR we are enabling that change in that if `keep_initializers_as_input` is not specified then value/behavior for this argument is chosen automatically depending on whether the export type is ONNX or not.
This was part of the earlier PR but was removed for further review. The test points have also been updated.
This change may fail some internal tests which may require explicitly setting `keep_initializers_as_input=True` to preserve old behavior.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26146
Reviewed By: hl475
Differential Revision: D17369677
Pulled By: houseroad
fbshipit-source-id: 2aec2cff50d215714ee8769505ef24d2b7865a11
Summary:
- Ports all CUDA tests to TestAutogradDeviceType except those using multiple devices
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26708
Differential Revision: D17549435
Pulled By: mruberry
fbshipit-source-id: b564186444201d1351934b6a7d21f67bdfca6e3b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26709
Polishes implementation from #25975. Primarily, we use NoopObserver to communicate that weights need to be quantized to float16. The very top-level API (quantize_dynamic) stays the same with `dtype` argument but the implementation follows the common flow.
One can argue that dynamic fp16 quantization doesn't really fit into the 'observer' mechanism. It's in fact not ideal, but it's better to have the same flow than branching on both dtype and qconfig.
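A sketch of the unchanged top-level call (the model here is illustrative):
```python
import torch
import torch.quantization as tq

model = torch.nn.Sequential(torch.nn.Linear(4, 4))
# Same entry point as int8 dynamic quantization, selected via dtype.
qmodel = tq.quantize_dynamic(model, dtype=torch.float16)
```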
Test Plan: Imported from OSS
Differential Revision: D17544103
Pulled By: dzhulgakov
fbshipit-source-id: 6af3f18c35929a1a53ea734079c005f656e4925f
Summary:
Currently, integer scalar exponents are always cast to double. This commit avoids the cast if the tensor is also
integral and the scalar is positive, to speed things up.
Benchmark (Debian Buster, g++ 8, Intel(R) Xeon(R) E-2136 CPU @ 3.30GHz, Debug
build, Turbo turned off):
```python
import timeit
for n, t in [(1000, 13000),
(10_000, 1300)]:
for e in (2, 3, 4):
for dtype in ('torch.int16', 'torch.int32', 'torch.int64'):
print(f'a.pow({e}) (a.numel() == {n}) for {t} times')
print(f'dtype {dtype}, {t} times', end='\t\t')
print(timeit.timeit(f'a.pow({e})',
setup=f'import torch; a = torch.arange({n}, device="cpu", dtype={dtype})',
number=t))
```
Before:
```
a.pow(2) (a.numel() == 1000) for 13000 times
dtype torch.int16, 13000 times 1.6958350749996498
a.pow(2) (a.numel() == 1000) for 13000 times
dtype torch.int32, 13000 times 0.7989626339999631
a.pow(2) (a.numel() == 1000) for 13000 times
dtype torch.int64, 13000 times 0.7973162800003593
a.pow(3) (a.numel() == 1000) for 13000 times
dtype torch.int16, 13000 times 1.8660746679997828
a.pow(3) (a.numel() == 1000) for 13000 times
dtype torch.int32, 13000 times 0.8101709959996697
a.pow(3) (a.numel() == 1000) for 13000 times
dtype torch.int64, 13000 times 0.8135280149999744
a.pow(4) (a.numel() == 1000) for 13000 times
dtype torch.int16, 13000 times 5.010833072999958
a.pow(4) (a.numel() == 1000) for 13000 times
dtype torch.int32, 13000 times 4.801007671999741
a.pow(4) (a.numel() == 1000) for 13000 times
dtype torch.int64, 13000 times 3.963344578000033
a.pow(2) (a.numel() == 10000) for 1300 times
dtype torch.int16, 1300 times 1.6216251330001796
a.pow(2) (a.numel() == 10000) for 1300 times
dtype torch.int32, 1300 times 0.5672429639998882
a.pow(2) (a.numel() == 10000) for 1300 times
dtype torch.int64, 1300 times 0.5544572270000572
a.pow(3) (a.numel() == 10000) for 1300 times
dtype torch.int16, 1300 times 1.656308512999658
a.pow(3) (a.numel() == 10000) for 1300 times
dtype torch.int32, 1300 times 1.502670819999821
a.pow(3) (a.numel() == 10000) for 1300 times
dtype torch.int64, 1300 times 0.5757876879997639
a.pow(4) (a.numel() == 10000) for 1300 times
dtype torch.int16, 1300 times 4.775718216999849
a.pow(4) (a.numel() == 10000) for 1300 times
dtype torch.int32, 1300 times 4.754745475000163
a.pow(4) (a.numel() == 10000) for 1300 times
dtype torch.int64, 1300 times 3.737249878000057
```
After:
```
a.pow(2) (a.numel() == 1000) for 13000 times
dtype torch.int16, 13000 times 1.1006453190002503
a.pow(2) (a.numel() == 1000) for 13000 times
dtype torch.int32, 13000 times 1.0849009019998448
a.pow(2) (a.numel() == 1000) for 13000 times
dtype torch.int64, 13000 times 1.093259106000005
a.pow(3) (a.numel() == 1000) for 13000 times
dtype torch.int16, 13000 times 1.0859826279997833
a.pow(3) (a.numel() == 1000) for 13000 times
dtype torch.int32, 13000 times 1.1076840900000207
a.pow(3) (a.numel() == 1000) for 13000 times
dtype torch.int64, 13000 times 1.0755480369998622
a.pow(4) (a.numel() == 1000) for 13000 times
dtype torch.int16, 13000 times 1.918211066999902
a.pow(4) (a.numel() == 1000) for 13000 times
dtype torch.int32, 13000 times 1.9183043200000611
a.pow(4) (a.numel() == 1000) for 13000 times
dtype torch.int64, 13000 times 1.930021430999659
a.pow(2) (a.numel() == 10000) for 1300 times
dtype torch.int16, 1300 times 0.7271483560002707
a.pow(2) (a.numel() == 10000) for 1300 times
dtype torch.int32, 1300 times 0.7289002070001516
a.pow(2) (a.numel() == 10000) for 1300 times
dtype torch.int64, 1300 times 0.7267536800000016
a.pow(3) (a.numel() == 10000) for 1300 times
dtype torch.int16, 1300 times 0.7301799359997858
a.pow(3) (a.numel() == 10000) for 1300 times
dtype torch.int32, 1300 times 0.7289195180001116
a.pow(3) (a.numel() == 10000) for 1300 times
dtype torch.int64, 1300 times 0.7270008230002531
a.pow(4) (a.numel() == 10000) for 1300 times
dtype torch.int16, 1300 times 1.5354506029998447
a.pow(4) (a.numel() == 10000) for 1300 times
dtype torch.int32, 1300 times 1.528263066999898
a.pow(4) (a.numel() == 10000) for 1300 times
dtype torch.int64, 1300 times 1.5369428439998956
```
---
Best viewed with whitespace changes turned off
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26020
Differential Revision: D17485400
Pulled By: VitalyFedyunin
fbshipit-source-id: 3a16b074825a5aab0f7e7af3d8100f9e4b7011a3
Summary:
There is an issue with the torchvision version not matching the pytorch version if one builds the docker from a tag, see issue https://github.com/pytorch/pytorch/issues/25917. The current solution requires one to re-init the submodules or manually change the version of torchvision. This PR allows one to build the docker image without torchvision, which not only fixes the above mentioned bug but also frees non-image pytorch users from the tyranny of torchvision 😆.
In all seriousness, for NLP researchers especially torchvision isn't a necessity for pytorch and all non-essential items shouldn't be in the docker. This option removes one extra thing that can go wrong.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26168
Differential Revision: D17550001
Pulled By: soumith
fbshipit-source-id: 48b8b9e22b75eef3afb392c618742215d3920e9d
Summary:
This ensures that `F::cosine_similarity` and `F::pairwise_distance` can be used simply by including `torch/torch.h` and set `namespace F = torch::nn::functional`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26559
Differential Revision: D17507421
Pulled By: yf225
fbshipit-source-id: f895dde3634d5c8ca66ee036903e327e5cdab6b1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26685
This prevents XLA from picking up on named tensor APIs. I ran into some
problems while attempting to support dimname overloads in XLA; since we
don't need the first iteration of named tensors to work with XLA this is
OK.
Test Plan: - run CI.
Differential Revision: D17538893
Pulled By: zou3519
fbshipit-source-id: 93d579c93f5b1dc68541c07c4a3d61792859507d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26417
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D17548776
Pulled By: ezyang
fbshipit-source-id: 8c79893ee4216780edb838671e701de5518c4cd0
Summary:
FindCUDNN.cmake and cuda.cmake have done the detection. This commit deletes `tools/setup_helpers/cudnn.py` as it is no longer needed.
Previously in https://github.com/pytorch/pytorch/issues/25482, one test failed because TensorRT detects cuDNN differently, and there may be situations we can find cuDNN but TensorRT cannot. This is fixed by passing our detection result down to TensorRT.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25876
Differential Revision: D17346270
Pulled By: ezyang
fbshipit-source-id: c1e7ad4a1cb20f964fe07a72906f2f002425d894
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26489
This basically fixes Inline(recurse=true) and makes it a default. One
reservation against running inlining recursively in the original
implementation was that we might hit a quadratic behavior, but in this
implementation it's not an issue since we're inlining only already
inlined graphs and as we recursively descend the call tree we're caching
graphs we've already optimized.
Test Plan: Imported from OSS
Differential Revision: D17485744
Pulled By: ZolotukhinM
fbshipit-source-id: 2ed7bdc69863b90a8c10a385d63f8e7c9e7b05f5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26488
Currently the main use case for this graph is inlining and that's the
only optimization we perform. We probably should run more cleanups on
this graph in future.
Test Plan: Imported from OSS
Differential Revision: D17485745
Pulled By: ZolotukhinM
fbshipit-source-id: 7b30c9ba47b4e5fff3591a0063560bfeb68f2164
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26571
We will need a mutex for computing optimized graph too, which will be
implemented in subsequent commits.
Test Plan: Imported from OSS
Differential Revision: D17510883
Pulled By: ZolotukhinM
fbshipit-source-id: 273b25426785e50f67a103204de98f6ed14182db
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26487
The way it is implemented currently is bad because while we're inlining
to a graph G, we are also mutating all the graphs that are being
inlined. The problem is that the graphs we're inlining are usually the
original graphs of functions, so we're silently changing them behind the
scenes, and we don't have a way to recover 'unoptimized' graphs
afterwards.
Test Plan: Imported from OSS
Differential Revision: D17485748
Pulled By: ZolotukhinM
fbshipit-source-id: 6094ef56077240e9379d4c53680867df1b6e79ef
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26658
By SFINAE'ing the lambda registration to only kernels that aren't stack-based kernels,
an attempt to register a stack-based lambda kernel will correctly fall back to the stack-based registration function and work as expected.
ghstack-source-id: 90610843
Test Plan: unit tests
Differential Revision: D17533871
fbshipit-source-id: 1bfe3106b0576d46798a51bdaa5b7b5508164766
Summary:
- Moves several tests to TestNNDeviceType
- Merges helper base with TestNNDeviceType
<s>- Enables non-default stream for TestNN (like recent updates to TestTorch and TestCUDA)</s>
Reverted non-default stream due to failure of test_variable_sequence_cuda (main.TestNN).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26638
Differential Revision: D17543899
Pulled By: mruberry
fbshipit-source-id: 001fa191f5fe424f2e7adc378b8fb5ee7f264f16
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26676
Just makes it more user-friendly to be able to pass any floating point or integer values as scales or zero_points for per-channel quantization. It matches the behavior of the per-tensor quantizer, where those arguments are scalars (not tensors) and thus automatic casting is applied.
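A sketch of what this allows (values are illustrative; axis=0 quantizes per row):
```python
import torch

x = torch.randn(2, 3)
scales = torch.tensor([0.1, 0.2])   # plain float32 values are accepted and cast
zero_points = torch.tensor([0, 1])  # plain ints likewise
q = torch.quantize_per_channel(x, scales, zero_points, axis=0, dtype=torch.quint8)
```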
Test Plan: Imported from OSS
Differential Revision: D17537051
Pulled By: dzhulgakov
fbshipit-source-id: e955ccdb5b4691828a559dc8f1ed7de54b6d12c4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26675
Based on offline poll, we're very unlikely to have multi-axis quantized tensors in the foreseeable future. Let's simplify API and just return int instead of list. It also matches the singular `axis` name.
Test Plan: Imported from OSS
Differential Revision: D17537052
Pulled By: dzhulgakov
fbshipit-source-id: 676abc3b251d288468aaed467b5e5ca4063b98b0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26243
This is an attempt to fix _empty_per_channel_affine_quantized to be more sane. It's a factory function that nevertheless receives a Tensor argument, and that throws the codegen off course.
Before, people did a hacky workaround of appending _like to the function name to trick codegen; it also required a non-natural argument order.
This PR explicitly allows overriding the 'category' of the function to make codegen do the right thing. Now the name and the argument order (in C++) make more sense.
Test Plan: Imported from OSS
Differential Revision: D17443221
Pulled By: dzhulgakov
fbshipit-source-id: c98c1c74473d8cbf637f511d26ceb949d8ae2a1a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26677
This diff adds OpProfile proto into ProfDAGProtos to support storing operation cost. During performance estimation idx, net_name, type, and exec_time will be stored in this proto.
Test Plan:
```
buck test caffe2/caffe2/fb/net_transforms/tests/:stats_collector_test
buck test caffe2/caffe2/fb/net_transforms/tests/:perf_estimator_test
buck run caffe2/caffe2/fb/distribute/snntest/cogwheel/:cogwheel_snntest_offline_training_simple_online_training
```
Reviewed By: heslami
Differential Revision: D17533791
fbshipit-source-id: a339c8eadcac891aa631daaf64522b69876b5045
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26679
making it more explicit that it's a factory function.
Test Plan:
ci
Imported from OSS
Differential Revision: D17540861
fbshipit-source-id: bf66c87d6afad411afd5620cf2143a8f5596ad6b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26609
Previously, we were globbing all of ATen/core and excluding specific files.
However, this frequently resulted in new files being missed, and PyTorch
diffs triggering Caffe2 builds. Now, instead, we will list the ATen/core
files that are required for Caffe2.
Test Plan: Ran internal Caffe2Go unit test.
Reviewed By: smessmer
Differential Revision: D17504740
fbshipit-source-id: 5b9bf7a6e8fa7848b2dfd375246d32630ca40cd5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26678
making it more explicit that it's a factory function.
Test Plan:
ci
Imported from OSS
Differential Revision: D17540862
fbshipit-source-id: 14c5a4dcc7bb85ae849c9e4e0882601005e2ed3a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26562
I was trying to be too clever with GITHUB_HEAD_REF...
Test Plan: Imported from OSS
Reviewed By: driazati
Differential Revision: D17538517
Pulled By: suo
fbshipit-source-id: 82c71ee3c6edb299ac8eb73675d96967e00a29f1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26620
This change updates torch.backends.quantized.engine to accept a string ("fbgemm"/"qnnpack"/"none" for now).
set_qengine and get_qengine return an int which represents the at::QEngine enum.
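A minimal sketch, assuming the chosen backend is compiled into the build (otherwise the assignment raises):
```python
import torch

torch.backends.quantized.engine = 'fbgemm'   # or 'qnnpack' / 'none'
print(torch.backends.quantized.engine)
```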
Test Plan:
python test/test_torch.py
Imported from OSS
Differential Revision: D17533582
fbshipit-source-id: 5103263d0d59ff37d43dec27243cb76ba8ba633f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26640
Remove some code that we forgot to remove before
Test Plan:
python test/test_jit.py
Imported from OSS
Differential Revision: D17538669
fbshipit-source-id: 9614e45f6e5ad6f2fe2b4936deb23d0ffdfcd86a
Summary:
This PR does a few small improvements to hub (a usage sketch follows the list):
- add support for a `verbose` option in `torch.hub.load`. Note that this mutes the hitting-cache message but keeps the message for the first download, as suggested. fixes https://github.com/pytorch/pytorch/issues/24791
- add support for loading a state dict from a tar file or zip file in `torch.hub.load_state_dict_from_url`.
- add `torch.hub.download_url_to_file` as a public API, and add a BC bit for `_download_url_to_file`.
- make the hash check in the filename optional through `check_hash`; many users don't have control over the naming, and relaxing this constraint could avoid duplicating download code on the user end.
- move pytorch CI off `pytorch/vision` and use `ailzhang/torchhub_example` as a dedicated test repo. fixes https://github.com/pytorch/pytorch/issues/25865
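A usage sketch of the touched helpers (the URL and file names are placeholders):
```python
import torch

url = 'https://example.com/checkpoints/model_weights-abc123.pth'
state_dict = torch.hub.load_state_dict_from_url(url, check_hash=False)
torch.hub.download_url_to_file(url, 'model_weights.pth')
```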
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25980
Differential Revision: D17495679
Pulled By: ailzhang
fbshipit-source-id: 695df3e803ad5f9ca33cfbcf62f1a4f8cde0dbbe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26145
This is step towards isinstance type refinement.
It primarily does yak shaving in compiler.cpp to unify the handling
of special case behavior that occurs in conditional expressions:
* Handling type refinement as part of emission.
* Handling `is None` static-if specialization.
It introduces a CondValue object that is a Value that also has
additional type refinements that are true when that Value is true,
and potentially a static-true/false value that, if set, will cause if
statements to be handled statically, omitting typechecking of the other side.
This ends up expanding some behavior, for instance `is None` specialization
used to happen only for single expressions, but now works through
boolean logic.
Test Plan: Imported from OSS
Differential Revision: D17359500
Pulled By: zdevito
fbshipit-source-id: ce93804496c8b4c3197a5966bc28c608465fda64
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26339
Serializes per-channel tensor in both torch.serialization and jit. Since we didn't bind Quantizer properly yet, I chose to save a tuple representing quantizer settings. To avoid recursive tensor serialization calls, I'm using tuple instead of tensor to store scales and zero points.
driazati - please check the serialization logic. Is there a good test that compares that JIT serialization and python serialization are equivalent? (I haven't tested it yet)
Test Plan: Imported from OSS
Differential Revision: D17443222
Pulled By: dzhulgakov
fbshipit-source-id: a34758de1ffd2ec1cdc5355f5baf95284a4ccf4b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26524
This creates an NHWC specialization for `quantized::cat` that kicks in when all inputs are `NHWC`. This ensures the correct layout is propagated downstream, and it also provides an optimized implementation specifically for this data layout.
Benchmark script based on Squeezenet shapes:
```
import torch, time
torch.manual_seed(0)
# NHWC
sizes = [
(1, 54, 54, 64),
(1, 54, 54, 128),
(1, 26, 26, 128),
(1, 26, 26, 256),
(1, 12, 12, 256)
]
for size in sizes:
x = torch.rand(*size)
y = torch.rand(*size)
qX = torch.quantize_linear(x, 0.01, 3, torch.qint8).permute([0, 3, 1, 2])
qY = torch.quantize_linear(y, 0.01, 3, torch.qint8).permute([0, 3, 1, 2])
ref = torch.cat([qX.dequantize(), qY.dequantize()], dim=1)
NITER = 1000
s = time.time()
for i in range(NITER):
out = torch.ops.quantized.cat([qX, qY], dim=1, scale=0.01, zero_point=3)
time_per_iter = (time.time() - s) / NITER
print('time per iter ms', time_per_iter * 1000)
print('gb/s', (qX.numel() + qY.numel() + out.numel()) * qX.element_size() / time_per_iter / 1e9)
torch.testing.assert_allclose(out.dequantize(), ref)
```
Before this change
```
time per iter ms 0.6898486614227295
gb/s 1.0821156026605054
time per iter ms 1.5480577945709229
gb/s 0.9644291093239284
time per iter ms 0.3180875778198242
gb/s 1.0881028500775023
time per iter ms 0.6702737808227539
gb/s 1.032748139350315
time per iter ms 0.13010454177856445
gb/s 1.1333655073392244
```
After this change
```
time per iter ms 0.11604785919189453
gb/s 6.432656364350577
time per iter ms 0.15956878662109375
gb/s 9.356416324360508
time per iter ms 0.040181636810302734
gb/s 8.613685939027139
time per iter ms 0.06564664840698242
gb/s 10.544696748392909
time per iter ms 0.018549680709838867
gb/s 7.949247337814738
```
Test Plan: Imported from OSS
Differential Revision: D17503593
Pulled By: jamesr66a
fbshipit-source-id: ec5d57ad8fbcb3fd9379e8bd370abd29d386f953
Summary:
At the moment we have the same names for PR jobs and nightly jobs and results as we see on https://ezyang.github.io/pytorch-ci-hud/build/pytorch-master:
pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v7a_build-1
pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v7a_build-2
=> adding nightly_ prefix for nightly jobs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26652
Differential Revision: D17533456
Pulled By: IvanKobzarev
fbshipit-source-id: 586f48dc361c9143d8223e6742bbe78ef96b64fe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26654
As per the Python contract, __getattr__ can only throw AttributeError. Throwing something else breaks hasattr() and causes upstream issues.
A similar bug was in pytorch earlier.
Test Plan: builds
Differential Revision: D17529471
fbshipit-source-id: bb6ac6c9e3be8b80fa2967e6a2e293afd1594cf9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26648
Previously:
- `Tensor.align_to(*names)` only works on fully named tensors. In addition, the
desired ordering `names` must not have any None-names.
- `Tensor.align_to(*names)` accepted `...`, but expanded it based on
position. I.e., in `tensor.align_to('N', ..., 'C', 'H')`, `...` expanded
to `*tensor.names[1:-2]`. This is wildly incorrect: see the following
concrete example.
```
tensor = tensor.refine_names('N', 'C', 'H', 'W')
tensor.align_to('W', ...) # ... expands to 'C', 'H', 'W'
```
This PR changes it so that `...` in `tensor.align_to` grabs all
unmentioned dimensions from `tensor`, in the order that they appear.
`align_to` is the only function that takes an ellipsis that requires this
change. This is because all other functions (e.g. `refine_names`) require their
list of names to work in a positional manner, but `align_to` lets the
user reorder dimensions.
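A small sketch of the new semantics:
```python
import torch

t = torch.randn(2, 3, 4, 5, names=('N', 'C', 'H', 'W'))
# Unmentioned dims are appended in their original order.
print(t.align_to('W', ...).names)  # ('W', 'N', 'C', 'H')
```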
This does not add very much overhead to `align_to`, as shown in the
following benchmark. However, in the future, we should resolve to make
these operations faster; align_to should be as fast as view but isn't
most likely due to Python overhead.
```
[ins] In [2]: import torch
...: named = torch.randn(3, 3, 3, 3, names=('N', 'C', 'H', 'W'))
...: unnamed = torch.randn(3, 3, 3, 3)
...: %timeit unnamed[:]
...: %timeit unnamed.view(-1)
...: %timeit named.align_to(...)
...: %timeit named.align_to('N', 'C', 'H', 'W')
31 µs ± 126 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
43.8 µs ± 146 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
69.6 µs ± 142 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
66.1 µs ± 1.13 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
Test Plan:
- new tests [namedtensor ci]
Differential Revision: D17528207
Pulled By: zou3519
fbshipit-source-id: 4efc70329f84058c245202d0b267d0bc5ce42069
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26633
This will enable named tensor CI on all pull requests. Previously, named
tensor CI only ran on master.
This is essential for the experimental release because we would like to
prevent failures in the named tensor tests. In addition, when
cherry-picking changes to the release branch, the first signals appear
on the pull requests and it is good to be able to tell that something is
wrong before the cherry-pick is merged.
Test Plan:
- run CI
- check that the named tensor build / tests are indeed running on this
PR.
Differential Revision: D17523064
Pulled By: zou3519
fbshipit-source-id: d8d09bf584f1293bd3cfd43bf710d84f87d766ae
Summary:
- Makes test_indexing.py device generic
- Removes test_indexing_cuda.py
Note: a couple tests in test_indexing.py were already CPU and CUDA tests, meaning these tests were run multiple times when CUDA was available. Genericizing test_indexing.py corrects this and lets these tests be run on other device types, like XLA, too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26634
Differential Revision: D17529001
Pulled By: mruberry
fbshipit-source-id: e71ba28d947749255a0aceeb7b77a42c4811439d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26632
### Summary
This script builds the TestApp (located in ios folder) to generate an iOS x86 executable via the `xcodebuild` toolchain on macOS. The goal is to provide a quick way to test the generated static libraries to see if there are any linking errors. The script can also be used by the iOS CI jobs. To run the script, simply see description below:
```shell
$ruby scripts/xcode_ios_x86_build.rb --help
-i, --install path to the cmake install folder
-x, --xcodeproj path to the XCode project file
```
### Note
The script mainly deals with the iOS simulator build. For the arm64 build, I haven't found a way to disable the Code Sign using the `xcodebuiild` tool chain (XCode 10). If anyone knows how to do that, please feel free to leave a comment below.
### Test Plan
- The script can build the TestApp and link the generated static libraries successfully
- Don't break any CI job
Test Plan: Imported from OSS
Differential Revision: D17530990
Pulled By: xta0
fbshipit-source-id: f50bef7127ff8c11e884c99889cecff82617212b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26492
Previous definition of observers was quite clumsy - with things like `default_observer()()`. This PR strips away a lot of cruft and allows passing class names directly. To override default arguments, either `functools.partial` can be used or the convenient wrapper `MyObserver.with_args(x=1)` is provided.
Also renames `QConfig_dynamic` to `QConfigDynamic` because it violates the naming convention.
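A sketch of the resulting qconfig construction (the observer and argument choices here are illustrative):
```python
import torch
from torch.quantization import MinMaxObserver, QConfig, default_weight_observer

my_qconfig = QConfig(
    activation=MinMaxObserver.with_args(dtype=torch.quint8),  # override defaults here
    weight=default_weight_observer,
)
```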
Test Plan: Imported from OSS
Differential Revision: D17521265
Pulled By: dzhulgakov
fbshipit-source-id: ba9df19b368641acf4093c43df9990796284fd9e
Summary:
Resolves issue https://github.com/pytorch/pytorch/issues/24585 .
Btw, there are two ways to define unary operator support:
1. Use `IMPLEMENT_UNARY_OP_VEC_CUDA(someunaryop)` in `aten/src/ATen/UnaryOps.cpp` and in `native_functions.yaml` have:
```
- func: someunaryop(Tensor self) -> Tensor
use_c10_dispatcher: full
supports_named_tensor: True
variants: method, function
dispatch:
CPU: someunaryop
CUDA: someunaryop
```
2. Or, in `aten/src/ATen/UnaryOps.cpp` have
```
Tensor& someunaryop_out(Tensor& result, const Tensor& self) { return unary_op_impl_out(result, self, someunaryop_stub); }
Tensor someunaryop(const Tensor& self) { return unary_op_impl(self, someunaryop_out); }
Tensor& someunaryop_(Tensor& self) { return unary_op_impl_(self, someunaryop_out); }
```
and in `native_functions.yaml` (note that `dispatch` section is removed):
```
- func: someunaryop(Tensor self) -> Tensor
use_c10_dispatcher: full
supports_named_tensor: True
variants: method, function
```
It turns out that way 1 is 3% more performant than way 2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26600
Differential Revision: D17527166
Pulled By: ezyang
fbshipit-source-id: 112ba298ad3f67d04078b921859e73dcd184852b
Summary:
Changelog:
- Remove `torch.gels` which was deprecated in v1.2.0
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26480
Test Plan: - No tests were changed, and all callsites for `torch.gels` were modified to `torch.lstsq` when `torch.lstsq` was introduced
Differential Revision: D17527207
Pulled By: zou3519
fbshipit-source-id: 28e2fa3a3bf30eb6b9029bb5aab198c4d570a950
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26548
This makes the naming more consistent with PyTorch's API. The original
concern was that `tensor.rename` might make the operation seem like it
is in-place. However, we have many "verb" APIs: `tensor.add(other)`, for
example, doesn't add other to tensor in-place, but `tensor.add_(other)`
does.
`tensor.rename_` does exactly the same thing as `tensor.rename`, but
in-place.
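A small sketch, assuming the keyword form of rename:
```python
import torch

t = torch.randn(2, 3, names=('N', 'C'))
t2 = t.rename(C='channels')  # out-of-place: t.names is unchanged
t.rename_(C='channels')      # in-place: t.names is now ('N', 'channels')
```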
Test Plan: - [namedtensor ci]
Differential Revision: D17502021
Pulled By: zou3519
fbshipit-source-id: 6a5b93136a820075013cd1e30fb8fc6b9d77d7d9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25582
There are a lot of min/max overloads. This PR adds name inference to
the following overloads for (both) min and max:
- min(Tensor, int dim)
- min(Tensor, Dimname dim)
- min(Tensor) (full reduction)
Test Plan: - new tests [namedtensor ci]
Differential Revision: D17521607
Pulled By: zou3519
fbshipit-source-id: 303e3cef22916dbc9da6a092d4f23e39e74c39e4
Summary:
This takes a lot of pressure off of the C++ typechecker as well as generating much more
efficient and smaller code. In my not-super-rigorous testing, compile time for
register_prim_ops.cpp went from 68s to 35s, and the size of libtorch went from 72MB to 70MB.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26560
Differential Revision: D17507305
fbshipit-source-id: 8bbd2c08304739432efda96da71f0fa80eb7668b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26586
Use the backend engine flag to call QNNPACK for quantized ops.
Test Plan: python test/test_quantized.py TestQNNPACKOps
Differential Revision: D17515129
Pulled By: supriyar
fbshipit-source-id: 951e90205aa19581ea006a91d9514fc7a94409ef
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26432
Move unpickler-related code from pickler.h/cpp to unpickler.h/cpp. In the import flow we link to the unpickler only.
Test Plan: Imported from OSS
Differential Revision: D17465410
fbshipit-source-id: 9d34629aa05bc0b45383e8f809c87baa186c9804
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26576
to match `quantize_per_tensor`
Test Plan:
ci
Imported from OSS
Differential Revision: D17517439
fbshipit-source-id: 8c20f9b5d2a50d0e42e4444994b0987e6204ac56
Summary:
In this PR, we tried to fix the Windows build issue of d17437015.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26580
Differential Revision: D17517341
Pulled By: llyfacebook
fbshipit-source-id: db726596aa8f7c992c5a7ddc2781dc3aa0312284
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26599
These fail due to tolerance in equality comparison. Disable them for now.
ghstack-source-id: 90553855
Test Plan: unit tests
Differential Revision: D17517085
fbshipit-source-id: a4d9278e356318719ccd84047404915a97944f52
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26575
To keep consistent with `quantize_per_tensor` we also
rename `quantize_linear_per_channel` to `quantize_per_channel`
Test Plan:
ci
Imported from OSS
Differential Revision: D17517360
fbshipit-source-id: 3af7d8f0fbe99148b79fcb1ad2fe811f776590cd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26593
This broke due to a merge conflict between my diffs and ezyang's multi dispatch diff being reverted and then relanded.
ghstack-source-id: 90549856
Test Plan: unit tests
Differential Revision: D17515837
fbshipit-source-id: c0bfd5f159ee4de80035079a1a2f39d5beafec41
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26425
Currently the scalar type is hardcoded for the weight and normal tensor,
but what we want is to get it from the corresponding observer module.
Test Plan:
there are some known issues right now,
will test e2e later when all the issues are fixed
Imported from OSS
Differential Revision: D17504459
fbshipit-source-id: f5a21789c2ebeb60bff4acc777db80170063c9f8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26574
Since we also have `quantized::linear`, `quantize_linear` sounds
confusing, so we plan to rename it before the branch cut
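For reference, a small sketch of the renamed op from the user's side (not part of this PR's diff):
```python
import torch

x = torch.rand(4)
# Formerly quantize_linear; now named to match the quantize_per_tensor convention.
q = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.quint8)
print(q.dequantize())
```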
Test Plan:
ci
Imported from OSS
Differential Revision: D17514876
fbshipit-source-id: 01d9005e6ec8cb9950b9d8bba122109c389641d3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26589
Just for better debugging purposes.
Test Plan: Dump the net and check the dim type info is in the pb_txt.
Reviewed By: dreamingleo
Differential Revision: D17505931
fbshipit-source-id: ceba4c3849eb271c22227fa07a05d5bcb07344a5
Summary:
In Python, `2 ^ 31` is bitwise XOR and evaluates to 29, which is not a big number. Corrected to `2 ** 31`.
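A quick illustration of the operator mix-up being corrected:
```python
# In Python, ^ is bitwise XOR, not exponentiation.
assert 2 ^ 31 == 29
assert 2 ** 31 == 2147483648
```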
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26491
Differential Revision: D17494296
fbshipit-source-id: 83d320e8fb6d1b7df41e4474933a98107c8e4129
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26272
```
use_c10_dispatcher: 'unboxed_only'
```
This is the previous implementation. The operator is registered with c10, but only in its unboxed form. No boxing wrapper is generated.
```
use_c10_dispatcher: 'full'
```
This does everything done by 'unboxed_only', but additionally creates a boxing wrapper so the op can be called through the c10 dispatcher using a boxed operator call.
This only changes registration, not the calling path. These operators are still called through the unboxed function pointer.
The final goal is to have 'full' for all operators, but this isn't immediately going to work for all ops.
[namedtensor ci]
ghstack-source-id: 90459907
Test Plan: unit tests
Differential Revision: D17393317
fbshipit-source-id: d629edfb3baede8c4ac869aa1886e512782ed2aa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26337
- Factor out boxing and unboxing functionality from the c10 dispatcher into a c10::KernelFunction class
- Move that class and everything else it depends on into ATen/core/boxing
- This also allows us to get rid of c10::KernelCache. Instead, we now store a pointer to the unboxed functor in c10::KernelFunction.
- We're also getting rid of the DispatchTableEntry struct and instead store KernelFunction directly.
- To make this work, we need to change the dispatcher calling API from Dispatcher::lookup().callBoxed/callUnboxed and OperatorEntry::lookup().callBoxed/callUnboxed to Dispatcher::callBoxed/callUnboxed and OperatorEntry::callBoxed/callUnboxed.
ghstack-source-id: 90459911
Test Plan: unit tests
Differential Revision: D17416607
fbshipit-source-id: fd221f1d70eb3f1b4d33092eaa7e37d25684c934
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26565
For OSS mobile build we should keep QNNPACK off and PYTORCH_QNNPACK on
as we don't include caffe2 ops that use third_party/QNNPACK.
Update android/iOS build script to include new libraries accordingly.
Test Plan: - CI build
Differential Revision: D17508918
Pulled By: ljk53
fbshipit-source-id: 0483d45646d4d503b4e5c1d483e4df72cffc6c68
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26503
[pytorch] [distributed] Corrected variable name and added test
ghstack-source-id: 90454793
Test Plan: Made sure pg based UT works.
Differential Revision: D17488846
fbshipit-source-id: 6e6cba110a6f61ee1af3d37c5a41c69701de1a8b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26525
Create a util function to avoid boilerplate code as we are adding more
libraries.
Test Plan: - build CI;
Differential Revision: D17495394
Pulled By: ljk53
fbshipit-source-id: 9e19f96ede4867bdff5157424fa68b71e6cff8bf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26335
Use the backend engine flag to call QNNPACK for quantized ops.
Test Plan:
python test/test_quantized.py TestQNNPACKOps
Imported from OSS
Differential Revision: D17504331
fbshipit-source-id: 35cb2189067ac5cc6a7307179ef0335d1cec7b8f
Summary:
Mainly want to resolve comments from https://github.com/pytorch/pytorch/pull/25830.
Overall, we want to provide a recording observer that records the runtime tensor values along the activation path, so that numerical accuracy loss can be debugged offline.
According to the feedback from https://github.com/pytorch/pytorch/issues/25830, it might be better to record all the observers in a dict and query the dict to get the corresponding tensor values. hx89 is working on how to insert the recording observers into the model under debug.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26413
Differential Revision: D17506502
Pulled By: llyfacebook
fbshipit-source-id: 3ab90dc78920e7ec3fa572c2a07327a9991c530a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26541
`torch.pow` already supports named tensors; every one of its constituent
codepaths propagates names:
- TensorIterator propagates names
- resize_as_ and fill_ propagate names (exponent == 0 or base == 1)
- resize_as_ and copy_ propagate names (exponent == 1)
This PR adds `supports_named_tensor = True` to the pow overloads,
enabling `pow` to take named tensors.
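A minimal sketch of the behavior this enables, assuming a named-tensor enabled build:
```python
import torch

base = torch.rand(2, 3, names=('N', 'C'))
out = base.pow(2)    # routed through TensorIterator, which propagates names
print(out.names)     # ('N', 'C')
```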
Test Plan: - [namedtensor ci]
Differential Revision: D17501402
Pulled By: zou3519
fbshipit-source-id: 07ee91d685e55dd58bbbb3a3fc9e185de8bb7515
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26493
resize_ and resize_as_ are low level functions that are not meant to be
used as a part of the regular PyTorch user's routine. However, they are
used to implement a lot of our operations: `out=` functionality is
implemented by resizing an output to be the correct size.
To keep in line with already implemented `out=` functionality, we do the
following:
- resize_as_(self, other) propagates names according to `out=` functionality.
This means that if self doesn't have names, then we propagate
other.names. If self does have names, they must be equal to other.names.
In addition, resize_ cannot resize a named tensor to anything but the same size.
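A small sketch of the `out=`-style propagation described above, assuming a named-tensor enabled build:
```python
import torch

result = torch.rand(2, 3)                   # unnamed result tensor
other = torch.rand(2, 3, names=('N', 'C'))
result.resize_as_(other)                    # self has no names, so other.names propagate
print(result.names)                         # ('N', 'C')
```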
Test Plan: - [namedtensor ci]
Differential Revision: D17501404
Pulled By: zou3519
fbshipit-source-id: e396e7fba55e1419355933925226d02dccb03012
Summary:
USE_STATIC_DISPATCH needs to be exposed as we don't hide header files
containing it for iOS (yet). Otherwise it's error-prone to request all
external projects to set the macro correctly on their own.
Also remove redundant USE_STATIC_DISPATCH definition from other places.
Test Plan:
- build android gradle to confirm linker can still strip out dead code;
- integrate with demo app to confirm inference can run without problem;
Differential Revision: D17484260
Pulled By: ljk53
fbshipit-source-id: 653f597acb2583761b723eff8026d77518007533
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25426
Add embedding table 4bit quantization support.
* add the conversion from fp32 to int4.
* using brew to pass the context so that the 4bit operators are added when generating the predictor net.
Reviewed By: kennyhorror, chocjy
Differential Revision: D16859892
fbshipit-source-id: a06c3f0b56a7eabf9ca4a2b2cb6c63735030d70b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26307
Add support for FP32 bias. Re-quantize the bias at run time based on the input scale.
If the value of input scale changes in the packed struct we requantize the bias with the updated input scale.
Test Plan: python test/test_quantized.py TestQNNPackOps
Differential Revision: D17504253
Pulled By: supriyar
fbshipit-source-id: 49fe36a0bee91aaeb085db28eec4ded8c684dcf4
Summary:
C++ `nn::Distance` tests can take advantage of the newly released multi-dimensional tensor constructor https://github.com/pytorch/pytorch/pull/26210 to simplify the tensor constructions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26539
Differential Revision: D17501041
Pulled By: yf225
fbshipit-source-id: 21d5f95ab3ec02227115c823c581218cee2ce458
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26366
Changes:
- `NameType::NORMAL` -> `NameType::BASIC`
- `Dimname::is_wildcard` -> `Dimname::isWildcard()`
- `Dimname::is_normal` -> `Dimname::isBasic()`.
- `at::is_valid_identifier` -> `Dimname::isValidName(string)`
- `at::match`, `at::unify` are now methods on `Dimname`.
I am adopting CamelCase for struct members of a named tensor related
struct.
Test Plan: - [namedtensor ci]
Differential Revision: D17484757
Pulled By: zou3519
fbshipit-source-id: 21c128e5025e81513e14d34506a7d7744caefdc2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26479
This PR doesn't delete the code for them yet because it takes some effort to
determine what to delete. I will send a followup PR fully deleting
tagged names, but this PR disables their creation.
Test Plan: - [namedtensor ci]
Differential Revision: D17484758
Pulled By: zou3519
fbshipit-source-id: 451409e36eac98ffee1b98884d0f675bb5d46c9d
Summary:
With this PR, we establish the following conventions:
1. Options in C++ module / optimizer constructors should always be `const SomeOptions&` type, not `SomeOptions` type.
2. The options constructor arg should always be named `options_`, not `options`, to not be confused with the module / optimizer's internal field `options`.
3. We never use `std::move` to assign `options_` to the module / optimizer's internal field `options` in the constructor definition. Instead, we simply use `options(options_)`.
Here is the reasoning:
We might be tempted to declare the constructor as `SomeModule(SomeOptions options_)` and have `options(std::move(options_))` in the member initialization list. However, this can be a dangerous design because the constructor might use `options_` to set values for other member fields in the member initialization list (e.g. 8317f75b79/torch/csrc/api/include/torch/optim/lbfgs.h (L30-L34)), and use-after-move can cause hard-to-debug problems.
Instead, we choose to explicitly use `const SomeOptions&` type for `options_`, and never use `std::move` to assign it to the internal `options` field. This way we have stronger guarantee on the validity of `options_` at any point in the constructor.
Notable exceptions to the above conventions:
1. C++ Embedding module doesn't adhere to the conventions now, which will be fixed after https://github.com/pytorch/pytorch/pull/26358 is landed.
2. C++ dataloader and dataset classes likely need similar changes. We will do it when we start to work on dataloader/dataset parity.
Thanks ShahriarSS for discovering the options usage inconsistency! 🚀
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26483
Differential Revision: D17500451
Pulled By: yf225
fbshipit-source-id: 49361a3519e4ede933789db75731d40144f0b617
Summary:
When used as annotations on Python functions, `NamedTuple`s go through our Python annotation -> type mapping, which previously had no way of looking up `NamedTuple`s (which are created lazily by checking if the type has certain properties, so the lookup is creating the `TupleType` from scratch). This PR threads through the necessary data to make them work.
Fixes #26437
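A minimal sketch of the kind of annotation this makes resolvable (the `Point` type is a made-up example, not from the PR):
```python
import torch
from typing import NamedTuple

class Point(NamedTuple):
    x: torch.Tensor
    y: torch.Tensor

@torch.jit.script
def get_x(p: Point) -> torch.Tensor:
    # The NamedTuple annotation is now resolved via the Python
    # annotation -> type mapping described above.
    return p.x

print(get_x(Point(torch.zeros(1), torch.ones(1))))
```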
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26443
Pulled By: driazati
Differential Revision: D17486441
fbshipit-source-id: a6bbb543ff05a5abe61f1a7f68db9ecdb652b358
Summary:
If the `Union` contains a non-class type, `issubclass` would fail; this adds a check for that case.
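An illustration of the failure mode being guarded against (plain Python, not the compiler code itself):
```python
from typing import List, Union

ann = Union[int, List[int]]
for arg in ann.__args__:
    # List[int] is a typing construct, not a class; issubclass() raises
    # TypeError on it (Python 3.7+), so non-class args have to be skipped.
    if isinstance(arg, type):
        print(arg, issubclass(arg, int))
    else:
        print(arg, "skipped: not a class")
```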
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26312
Pulled By: driazati
Differential Revision: D17486465
fbshipit-source-id: c513cef3bbc038f15c021eb0c1bf36be0df1eb90
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26515
Fix patterns of `prepack` and `permute` after recent changes
to `quantized::conv2d` and `quantized::conv2d_prepack`
Test Plan:
python test/test_jit.py 'TestJit.test_quant_fusion'
Imported from OSS
Differential Revision: D17502573
fbshipit-source-id: 1a719fd610e8ea9dc16075abaa042556e1edbceb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26364
Per https://github.com/pytorch/pytorch/issues/25769, we sometimes get
an infinite loop when `TCPStore` calls `tcputil::connect`, and the server
continually returns `ECONNRESET` or `ECONNREFUSED`. If a proper timeout is passed
in, we guard against this by throwing an exception once the timeout has passed.
Testing: Tested by modifying `TCPStore` to connect to an invalid port, thus getting
`ECONNREFUSED`. If a valid timeout is passed in, the function correctly throws an
exception. Steps below:
1) in TCPStore.cpp's constructor, replace the `connect` call with this line:
`storeSocket_ = tcputil::connect(tcpStoreAddr_, 1, true, std::chrono::milliseconds(3000));`
2) Build the `TCPStoreTest` binary.
3) Run the binary. Expected output:
```
terminate called after throwing an instance of 'std::runtime_error'
what(): Connecting to TCP store timed out.
Aborted (core dumped)
```
ghstack-source-id: 90480086
Test Plan: See above.
Differential Revision: D17430164
fbshipit-source-id: 1482aca72fcc3ddb95ea25649ec057edda5d1934
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26501
Instead of considering only the TensorTypeSet of the first argument, we collect all Tensor and TensorList arguments and union them together before computing the dispatch type id.
XLA companion patch at https://github.com/pytorch/xla/pull/1031
Billing of changes:
* ATenDispatch fallback code (i.e., what gets run if there is no entry for a function in the table) now lives out-of-line in a function `getFallbackOp`. This gave me an opportunity to write a more detailed error message, providing information about what registrations were available. There is a TODO in the fallback code, suggesting that we could automatically redispatch in the event that there is no handler for the key. But this is a bit of a design question, because it's not clear if automatic redispatch would cover up errors in the dispatch table (i.e., there *should* have been something registered at some key, but there wasn't.)
* Collection of Tensor/TensorList arguments is done using the trusty old IterArgs helper class. A minor bit of refactoring I had to do to get here was move the IterArgs functionality in torch/csrc/utils/variadic.h into ATen/core. There's some refactoring due on that file too (it has copies of some C++ helper pieces which already live in c10--you can't actually move the whole thing because it is literally incompatible with other code in the codebase). So instead of calling `type_set()` to get the type set of the dispatch argument, now we just call `at::detail::multi_dispatch_tensor_type_set` on all of the tensor/tensor list arguments.
* The code generator is adjusted to codegen collection of arguments as needed. There is a little bit of a hack in the code generator to turn 'self' arguments into '*this'. I think this may be duplicated with some logic somewhere else but I have to double check.
The new generated code looks like this:
```
inline Tensor & Tensor::copy_(const Tensor & src, bool non_blocking) const {
static auto table = globalATenDispatch().getOpTable("aten::copy_(Tensor(a!) self, Tensor src, bool non_blocking=False) -> Tensor(a!)");
return table->getOp<Tensor & (Tensor &, const Tensor &, bool)>(at::detail::multi_dispatch_tensor_type_set(*this, src))(const_cast<Tensor&>(*this), src, non_blocking);
}
```
The key difference is that previously we wrote `type_set()` as argument to getOp; now it is a call to `multi_dispatch_tensor_type_set` which collects the type ids together.
After turning on multi-dispatch, I had to refactor existing code which previously dispatched one place, but now dispatches somewhere else. The primary component affected by this is sparse.
* Binary operations (add/sub/mul/div/addmm) now dispatch to sparse kernels even if you did add(dense, sparse). So I delete all the sparse handling code from dense kernels, and bulk up the sparse error handling to handle when the first argument is dense. In the case of addmm, I can eliminate the bridge code entirely (well, not quite: more on this below). I also updated the dispatch on sparse to actually point at sparse kernels. Pay special attention to the handling of `div_` by scalar: previously this logic lived in the "dense" `div_` implementation, but there is actually not any sparse kernel we dispatch to. I solved this particular problem by making a redispatch, but another valid approach would have been to add specific dispatches for sparse div on scalar. This codepath is poorly tested because it is only exercised from C++.
* One minor annoyance is that because I now want separate dispatch for dense and sparse, I also need to replicate the `add`, `add_`, `add_out` trifecta on the sparse side. I opted for a compromise here: I wrote a new `add_sparse` trifecta, but reused the implementation between CPU and CUDA. This means that I have to do another dispatch once I get to `add_out`. The alternative would have been to do twice as many copies for CPU and CUDA (thereby eliminating the extra dispatch) but that seemed distinctly not worth it.
* A lot of kernels in sparse assumed that the dispatch argument must be sparse. This is no longer true with dispatch, so I converted the asserts into plain error checking. This also means that we've perturbed the error message in the case of TestSparseOneOff.test_cuda_sparse_cpu_dense_add (I just updated the saved error message)
* `addmm` is a little bit even more special: the bridge code also handled broadcasting. I replicated the broadcasting logic between CPU and CUDA implementations to avoid an extra dispatch.
* `_sparse_addmm` gave me a bit of trouble, because I had forgotten why we had `torch.sparse.addmm` in the first place. But in the end, its changes followed along with the structural changes I made in addmm. I opted for an extra dispatch here for simplicity.
* c10d has some Variable-Tensor confusion in its sparse code. I've worked around it by judiciously inserting "no variable type" guards, but a more correct fix would be to just solve the confusion entirely.
Benchmark:
Apply the following patch to the base commit and this commit:
```
diff --git a/aten/src/ATen/native/Const.cpp b/aten/src/ATen/native/Const.cpp
new file mode 100644
index 0000000000..b66f4d3ece
--- /dev/null
+++ b/aten/src/ATen/native/Const.cpp
@@ -0,0 +1,10 @@
+#include <ATen/ATen.h>
+
+namespace at {
+namespace native {
+
+Tensor _const5(const Tensor& self, const Tensor& second, const Tensor& third, const Tensor& fourth, const Tensor& fifth) {
+ return self;
+}
+
+}} // namespace at::native
diff --git a/aten/src/ATen/native/native_functions.yaml b/aten/src/ATen/native/native_functions.yaml
index b494ed7950..fddae638bb 100644
--- a/aten/src/ATen/native/native_functions.yaml
+++ b/aten/src/ATen/native/native_functions.yaml
@@ -5878,3 +5878,9 @@
dispatch:
CPU: im2col_backward_cpu
CUDA: im2col_backward_cuda
+
+# For benchmarking
+- func: _const5(Tensor self, Tensor second, Tensor third, Tensor fourth, Tensor fifth) -> Tensor
+ variants: function
+ dispatch:
+ CPU: _const5
```
Comparisons with timeit:
One-argument, representative case:
Before:
```
In [6]: %timeit x.reshape(1, 1)
1.46 µs ± 1.38 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [7]: %timeit x.reshape(1, 1)
1.48 µs ± 29.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [8]: %timeit x.reshape(1, 1)
1.52 µs ± 61.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```
After:
```
In [3]: %timeit x.reshape(1, 1)
1.42 µs ± 1.34 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [4]: %timeit x.reshape(1, 1)
1.43 µs ± 1.01 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [5]: %timeit x.reshape(1, 1)
1.42 µs ± 0.982 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```
Five-argument, synthetic case (we expect, with enough Tensor arguments, for there to be a slowdown, as we scale `O(n)` with number of arguments, compared to old dispatcher which is `O(1)` with number of arguments):
Before:
```
In [1]: import torch
In [2]: x = torch.zeros(1)
In [3]: %timeit torch._const5(x, x, x, x, x)
949 ns ± 1.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [4]: %timeit torch._const5(x, x, x, x, x)
954 ns ± 1.96 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [5]: %timeit torch._const5(x, x, x, x, x)
947 ns ± 0.601 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```
After:
```
In [3]: %timeit torch._const5(x, x, x, x, x)
985 ns ± 9.11 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [4]: %timeit torch._const5(x, x, x, x, x)
984 ns ± 1.17 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [5]: %timeit torch._const5(x, x, x, x, x)
988 ns ± 0.555 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D17499154
Pulled By: ezyang
fbshipit-source-id: 8ea237c2e935134b0f4f8d6cfd89c6a93037c02c
Summary:
These are intentionally not yet used by the encoder to avoid backcompat issues.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26454
Differential Revision: D17480844
fbshipit-source-id: e88ae7f5b94e32c7f12341a750aa4b9f7374bfb7
Summary:
test_wrapped_number was calling torch.set_default_tensor_type('torch.FloatTensor'), which was setting the default tensor types for all following tests until a class boundary (with unittest) or until end of file (with pytest). Tests that don't expect the default tensor type to be set this way were then failing if run afterwards.
This fixes the issue by copying the default_tensor_type decorator from test_nn and using that instead with test_wrapped_number. The decorator correctly resets the default tensor type after the test has run.
This fixes the many errors encountered when running pytest test_jit.py.
Note: test_wrapped_number was introduced in https://github.com/pytorch/pytorch/issues/22273.
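A sketch of what such a decorator can look like, modeled on the description above (not the actual test_nn helper):
```python
import functools
import torch

def default_tensor_type(type_name):
    """Set the default tensor type for the wrapped test, then restore it."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            saved = torch.tensor([]).type()           # e.g. 'torch.FloatTensor'
            torch.set_default_tensor_type(type_name)
            try:
                return fn(*args, **kwargs)
            finally:
                torch.set_default_tensor_type(saved)  # reset after the test
        return wrapper
    return decorator
```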
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26523
Differential Revision: D17495283
Pulled By: mruberry
fbshipit-source-id: ab518c78b7706af7cb1c2d1c17823d311178996d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26504
[pytorch] [distributed] Make destructor virtual for class with virtual functions
Not having a virtual destructor may lead to a memory leak.
ghstack-source-id: 90454880
Test Plan: Made sure pg based UT works.
Differential Revision: D17488876
fbshipit-source-id: 5fdc55e175fd2b22e931b740c36cb1feed454066
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26498
We should allocate an empty tensor as a result tensor when performing
binary ops. Currently some ops use `empty_like(self)` as the initial
result tensor before passing it into TensorIterator. This is not very
efficient because TensorIterator may resize the tensor due to
broadcasting, causing more memory allocation. By using an empty tensor
as the result tensor, we only need to allocate/resize memory once as
opposed to twice.
Also fixes https://github.com/pytorch/pytorch/issues/26495. The bug
there is that the implementation of `pow` is missing a resize in one
case.
Test Plan:
- new test
- run tests
Differential Revision: D17500025
Pulled By: zou3519
fbshipit-source-id: bff4949af5e75541c04669b961bcf2e1ec456faf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25975
We would like to add the FP16 weight support for the dynamic quantized LSTM.
Test Plan:
buck test mode/dev caffe2/test:quantization -- 'test_quantized_rnn \(test_quantization\.PostTrainingDynamicQuantTest\)' --print-passing-details
```
[jianyuhuang@devvm794.ftw3.facebook.com: ~/fbsource/fbcode/caffe2/test] $ buck test mode/dev caffe2/test:quantization
-- 'test_quantized_rnn \(test_quantization\.PostTrainingDynamicQuantTest\)' --print-passing-details
Building: finished in 13.4 sec (100%) 8134/8134 jobs, 81 updated
Total time: 13.9 sec
Trace available for this run at /tmp/testpilot.20190910-210241.2092790.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision c86e65add357582accb6ec0be23b92c8a2c510bd fbpkg ca46e8f5b26c451a8b0b2462c11bb61d at Mon Sep 9
22:16:37 2019 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/696/t.par
Discovering tests
Running 1 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/1125900050322971
✓ caffe2/test:quantization - test_quantized_rnn (test_quantization.PostTrainingDynamicQuantTest) 0.183 1/1 (passed)
Test output:
> test_quantized_rnn (test_quantization.PostTrainingDynamicQuantTest) ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 0.184s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/1125900050322971
Summary (total time 4.35s):
PASS: 1
FAIL: 0
SKIP: 0
FATAL: 0
TIMEOUT: 0
OMIT: 0
```
Differential Revision: D17299116
fbshipit-source-id: 7fe91ece25867f2c0496f1b63fb1041e6b815166
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26477
- At inference time we need to turn off autograd mode and turn on no-variable
mode since we strip out these modules for inference-only mobile builds.
- Both flags are stored in thread-local variables so we cannot simply
set them to false globally.
- Add "autograd/grad_mode.h" header to the all-in-one header 'torch/script.h'
to reduce friction for iOS engineers who might need to do this manually in their
projects.
P.S. I tried to hide AutoNonVariableTypeMode in codegen but figured it's not
very trivial (e.g. there are manually written part not covered by codegen).
Might try it again later.
Test Plan: - Integrate with Android demo app to confirm inference runs correctly.
Differential Revision: D17484259
Pulled By: ljk53
fbshipit-source-id: 06887c8b527124aa0cc1530e8e14bb2361acef31
Summary:
Serialization.cpp fails on big endian machines.
This patch fixes the endian bugs and also makes the pytorch
model files portable across different endian architectures.
An x86-generated model file can be read on the s390 arch.
The first problem is that serialization.cpp forgets to convert the "size" value
of the storage elements to the native byte order, so
torch.load throws an assertion as a result
(see the first stack trace below).
The second problem is that when it reads the model from storage (doRead),
it decodes values to little endian, which is the wrong order
on a big endian machine. The decode should be
to THP_nativeByteOrder() instead
(see the model dump below)
```
loaded_model = torch.load( opt.model_file, map_location=torch.device("cpu"))
File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 422, in load
return _load(f, map_location, pickle_module, **pickle_load_args)
File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 616, in _load
deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
RuntimeError: storage has wrong size: expected 2305843009213693952 got 32
(the very long number is actually 32 in the wrong endianness)
```
Model file load on x86 (correct output)
```
>>> import torch
>>> torch.load('400f2k_best.model', map_location=torch.device("cpu"))
{'epoch': 24, 'model_type': 'emb_aec', 'classifier_model': OrderedDict([('model.0.weight', tensor([[ 2.4608e-01, -1.1174e-01, -1.0854e-01, 4.0124e-01, -1.5261e-02,
-1.2206e-01, 1.3229e-01, -1.2615e-01, -5.2773e-01, 2.6333e-01,
-3.1462e-03, -1.4902e-01, 9.8545e-02, -1.5789e-01, -2.2625e-01,
-1.0776e-01, -9.0895e-02, -3.8530e-01, 9.1152e-01, -3.9720e-01,
-8.5848e-01, -4.7837e-02, -1.5178e-01, 8.5023e-02, 1.5013e-01,
-9.9294e-02, -2.7422e-01, -4.3986e-01, -4.4297e-01, -3.9570e-01,
```
Model file load on s390x (wrong endianness; notice the exponents)
```
>>> import torch
>>> torch.load( "400f2k_best.model", map_location=torch.device("cpu"))
{'epoch': 24, 'model_type': 'emb_aec', 'classifier_model': OrderedDict([('model.0.weight', tensor([[ 9.2780e+21, -9.7722e-11, 4.1350e+33, 7.782e+34, 4.2056e-31,
9.0784e+18, 1.1846e-32, 3.3320e-32, -4.8288e-28, -7.2679e+12,
1.5379e-16, -5.2604e+12, -4.7240e+17, 4.6092e-21, -1.8360e-20,
-2.7712e-31, 1.4548e-16, -2.5089e-27, 7.9094e-10, 7.1977e+34,
1.1930e+26, 8.4536e+15, 2.7757e+23, -5.8455e-10, -1.5611e+09,
-1.1311e-23, 6.6451e+19, -2.0970e+20, 3.4878e-19, -1.0857e-12,
7.8098e+22, 5.3998e-35],
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26383
Differential Revision: D17480891
fbshipit-source-id: f40569c7b9c4a1935dceb41f1a2508ce21ea3491
Summary:
At the moment it includes the https://github.com/pytorch/pytorch/pull/26219 changes. That PR is landing at the moment; afterwards this PR will contain only javadocs.
Applied all dreiss comments from previous version.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26149
Differential Revision: D17490720
Pulled By: IvanKobzarev
fbshipit-source-id: f340dee660d5ffe40c96b43af9312c09f85a000b
Summary:
In schema matching we allow a homogeneous tuple to be matched to list arguments. This logic wasn't yet extended for vartype lists, causing stuff like `len((1, 2, 3))` to fail.
Fix for https://github.com/pytorch/pytorch/issues/20500
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25944
Differential Revision: D17482510
Pulled By: eellison
fbshipit-source-id: aa63318c27a01d965a7a7b68ce8bec638168dc26
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26478
### Summary
Since QNNPACK [doesn't support bitcode](7d2a4e9931/scripts/build-ios-arm64.sh (L40)), I'm going to disable it in our CMake scripts. This won't hurt any existing functionality and will only affect the build size. Any application that wants to integrate our framework should turn off bitcode as well.
### Test plan
- CI job works
- LibTorch.a can be compiled and run on iOS devices
Test Plan: Imported from OSS
Differential Revision: D17489020
Pulled By: xta0
fbshipit-source-id: 950619b9317036cad0505d8a531fb8f5331dc81f
Summary:
Fixes https://github.com/pytorch/pytorch/issues/26076. mruberry if https://github.com/pytorch/pytorch/issues/26248 goes in soon, I'll rebase after it, otherwise this should go in because it's a bug fix.
Side note: cdist backward testing is very light and I suspect is not testing all the code paths, but that's a separate issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26254
Test Plan: added test for the affected size to test_autograd.py. Streams are tested by existing tests.
Differential Revision: D17480945
Pulled By: ngimel
fbshipit-source-id: 0f18c9fd05e462d22c410a2ebddc2bcc9580582d
Summary:
Makes c10::Dict ordered and binds the OrderedDict() and dict() constructors into TorchScript. For the case of the empty constructor dict(), I typed it as [str, Tensor] because:
• we're almost dropping support for Python 2, at which point all dicts are ordered
• then it's more conventional to write `x: Dict[int, int] = {}`, which is already supported
• It is possible to construct an arbitrarily typed empty OrderedDict through
`OrderedDict(torch.jit.annotate(List[Tuple[key, value]], []))`
We could consider dropping the no-inputs aten::dict constructor, since then the types would be more explicit.
This replaces https://github.com/pytorch/pytorch/issues/26170 and https://github.com/pytorch/pytorch/pull/26372 because ghstack was poisoned and I had to resubmit.
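A minimal sketch, assuming the constructors behave as described above:
```python
import torch
from collections import OrderedDict
from typing import Dict, List, Tuple

@torch.jit.script
def build() -> Dict[str, torch.Tensor]:
    # Arbitrarily typed empty OrderedDict via an annotated empty list of pairs.
    d = OrderedDict(torch.jit.annotate(List[Tuple[str, torch.Tensor]], []))
    d["w"] = torch.ones(2)
    return d

print(build())
```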
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26465
Differential Revision: D17481604
Pulled By: eellison
fbshipit-source-id: d2d49795a518c3489881afac45d070e5262c5849
Summary:
The implementation of several modules in C++ frontend currently has calls to `options.name_`, which is bad practice because `options.name_` should be a private options field and we should use `options.name()` to access its value. This PR makes `options.name_` actually private and changes all callsites of `options.name_` to `options.name()`.
After this change, we can change all module options to have a map as the underlying data structure, and require that all options must be able to be stored in `c10::IValue`. These changes together would make serializing module options much easier.
Note that this PR is BC-breaking in the following way:
Previously, calling `options.name_` in C++ module implementation works because `options.name_` was a public field. After this PR, `options.name_` becomes private, and to get the value of `options.name_` we should call `options.name()`, and to set the value of `options.name_` we should call `options.name(new_value)`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26419
Differential Revision: D17481507
Pulled By: yf225
fbshipit-source-id: 93e4ed0e1d79ef57104ad748809d03e25da61ed3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25824
Use github actions for flake8. This is nice because it makes it easier
to create inline annotations for lint violations.
It ends up looking like this:
https://github.com/suo/pytorch/pull/21/files
Test Plan: Imported from OSS
Differential Revision: D17487007
Pulled By: suo
fbshipit-source-id: 663094ea2bbbdb1da5b7e5d294c70735a319d5e5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26468
Instead of considering only the TensorTypeSet of the first argument, we collect all Tensor and TensorList arguments and union them together before computing the dispatch type id.
XLA companion patch at https://github.com/pytorch/xla/pull/1031
Billing of changes:
* ATenDispatch fallback code (i.e., what gets run if there is no entry for a function in the table) now lives out-of-line in a function `getFallbackOp`. This gave me an opportunity to write a more detailed error message, providing information about what registrations were available. There is a TODO in the fallback code, suggesting that we could automatically redispatch in the event that there is no handler for the key. But this is a bit of a design question, because it's not clear if automatic redispatch would cover up errors in the dispatch table (i.e., there *should* have been something registered at some key, but there wasn't.)
* Collection of Tensor/TensorList arguments is done using the trusty old IterArgs helper class. A minor bit of refactoring I had to do to get here was move the IterArgs functionality in torch/csrc/utils/variadic.h into ATen/core. There's some refactoring due on that file too (it has copies of some C++ helper pieces which already live in c10--you can't actually move the whole thing because it is literally incompatible with other code in the codebase). So instead of calling `type_set()` to get the type set of the dispatch argument, now we just call `at::detail::multi_dispatch_tensor_type_set` on all of the tensor/tensor list arguments.
* The code generator is adjusted to codegen collection of arguments as needed. There is a little bit of a hack in the code generator to turn 'self' arguments into '*this'. I think this may be duplicated with some logic somewhere else but I have to double check.
The new generated code looks like this:
```
inline Tensor & Tensor::copy_(const Tensor & src, bool non_blocking) const {
static auto table = globalATenDispatch().getOpTable("aten::copy_(Tensor(a!) self, Tensor src, bool non_blocking=False) -> Tensor(a!)");
return table->getOp<Tensor & (Tensor &, const Tensor &, bool)>(at::detail::multi_dispatch_tensor_type_set(*this, src))(const_cast<Tensor&>(*this), src, non_blocking);
}
```
The key difference is that previously we wrote `type_set()` as argument to getOp; now it is a call to `multi_dispatch_tensor_type_set` which collects the type ids together.
After turning on multi-dispatch, I had to refactor existing code which previously dispatched one place, but now dispatches somewhere else. The primary component affected by this is sparse.
* Binary operations (add/sub/mul/div/addmm) now dispatch to sparse kernels even if you did add(dense, sparse). So I delete all the sparse handling code from dense kernels, and bulk up the sparse error handling to handle when the first argument is dense. In the case of addmm, I can eliminate the bridge code entirely (well, not quite: more on this below). I also updated the dispatch on sparse to actually point at sparse kernels. Pay special attention to the handling of `div_` by scalar: previously this logic lived in the "dense" `div_` implementation, but there is actually not any sparse kernel we dispatch to. I solved this particular problem by making a redispatch, but another valid approach would have been to add specific dispatches for sparse div on scalar. This codepath is poorly tested because it is only exercised from C++.
* One minor annoyance is that because I now want separate dispatch for dense and sparse, I also need to replicate the `add`, `add_`, `add_out` trifecta on the sparse side. I opted for a compromise here: I wrote a new `add_sparse` trifecta, but reused the implementation between CPU and CUDA. This means that I have to do another dispatch once I get to `add_out`. The alternative would have been to do twice as many copies for CPU and CUDA (thereby eliminating the extra dispatch) but that seemed distinctly not worth it.
* A lot of kernels in sparse assumed that the dispatch argument must be sparse. This is no longer true with dispatch, so I converted the asserts into plain error checking. This also means that we've perturbed the error message in the case of TestSparseOneOff.test_cuda_sparse_cpu_dense_add (I just updated the saved error message)
* `addmm` is a little bit even more special: the bridge code also handled broadcasting. I replicated the broadcasting logic between CPU and CUDA implementations to avoid an extra dispatch.
* `_sparse_addmm` gave me a bit of trouble, because I had forgotten why we had `torch.sparse.addmm` in the first place. But in the end, its changes followed along with the structural changes I made in addmm. I opted for an extra dispatch here for simplicity.
* c10d has some Variable-Tensor confusion in its sparse code. I've worked around it by judiciously inserting "no variable type" guards, but a more correct fix would be to just solve the confusion entirely.
Benchmark:
Apply the following patch to the base commit and this commit:
```
diff --git a/aten/src/ATen/native/Const.cpp b/aten/src/ATen/native/Const.cpp
new file mode 100644
index 0000000000..b66f4d3ece
--- /dev/null
+++ b/aten/src/ATen/native/Const.cpp
@@ -0,0 +1,10 @@
+#include <ATen/ATen.h>
+
+namespace at {
+namespace native {
+
+Tensor _const5(const Tensor& self, const Tensor& second, const Tensor& third, const Tensor& fourth, const Tensor& fifth) {
+ return self;
+}
+
+}} // namespace at::native
diff --git a/aten/src/ATen/native/native_functions.yaml b/aten/src/ATen/native/native_functions.yaml
index b494ed7950..fddae638bb 100644
--- a/aten/src/ATen/native/native_functions.yaml
+++ b/aten/src/ATen/native/native_functions.yaml
@@ -5878,3 +5878,9 @@
dispatch:
CPU: im2col_backward_cpu
CUDA: im2col_backward_cuda
+
+# For benchmarking
+- func: _const5(Tensor self, Tensor second, Tensor third, Tensor fourth, Tensor fifth) -> Tensor
+ variants: function
+ dispatch:
+ CPU: _const5
```
Comparisons with timeit:
One-argument, representative case:
Before:
```
In [6]: %timeit x.reshape(1, 1)
1.46 µs ± 1.38 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [7]: %timeit x.reshape(1, 1)
1.48 µs ± 29.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [8]: %timeit x.reshape(1, 1)
1.52 µs ± 61.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```
After:
```
In [3]: %timeit x.reshape(1, 1)
1.42 µs ± 1.34 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [4]: %timeit x.reshape(1, 1)
1.43 µs ± 1.01 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [5]: %timeit x.reshape(1, 1)
1.42 µs ± 0.982 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```
Five-argument, synthetic case (we expect, with enough Tensor arguments, for there to be a slowdown, as we scale `O(n)` with number of arguments, compared to old dispatcher which is `O(1)` with number of arguments):
Before:
```
In [1]: import torch
In [2]: x = torch.zeros(1)
In [3]: %timeit torch._const5(x, x, x, x, x)
949 ns ± 1.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [4]: %timeit torch._const5(x, x, x, x, x)
954 ns ± 1.96 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [5]: %timeit torch._const5(x, x, x, x, x)
947 ns ± 0.601 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```
After:
```
In [3]: %timeit torch._const5(x, x, x, x, x)
985 ns ± 9.11 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [4]: %timeit torch._const5(x, x, x, x, x)
984 ns ± 1.17 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [5]: %timeit torch._const5(x, x, x, x, x)
988 ns ± 0.555 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bddppq
Differential Revision: D17481256
Pulled By: ezyang
fbshipit-source-id: b3206936b4ca8938d45ea90fd71422e0d80b5f96
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26242
According to https://github.com/pytorch/pytorch/issues/19092 we always keep NCHW order and do handling inside the kernels. This PR fixes it for activations of the qconv by using MemoryLayout mechanism - activations stay logically as NCHW but strided as NHWC.
Note that this version is more aggressive than the eventual MemoryLayout mechanism - the QConv's output is always NHWC regardless of the input striding. I think it's ok as we don't have NCHW quantized kernels anyway - so the very first conv would magically switch the order, but I'm open to suggestions. Btw, it doesn't change behavior - the same happens today in master because of the explicit permute() call.
Test Plan: Imported from OSS
Differential Revision: D17443218
Pulled By: dzhulgakov
fbshipit-source-id: cfd136ae0465acd8d8c26ffad87385dac9c88726
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26241
According to https://github.com/pytorch/pytorch/issues/19092 we always keep NCHW order and do handling inside the kernels. This PR fixes it for weights of the qconv by using MemoryLayout mechanism.
Test Plan: Imported from OSS
Differential Revision: D17443219
Pulled By: dzhulgakov
fbshipit-source-id: ce0eb92034a9977b3303dafab8b0414575171062
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26240
In particular adds support for empty/empty_like which is needed for memory layouts to work.
Test Plan: Imported from OSS
Differential Revision: D17443220
Pulled By: dzhulgakov
fbshipit-source-id: 9c9e25981999c0edaf40be104a5741e9c62a1333
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26415
We do dynamic quantization for the bias right now; remove this in the pattern.
Test Plan:
python test/test_jit.py 'TestJit.test_quant_fusion'
Imported from OSS
Differential Revision: D17465555
fbshipit-source-id: 5e229cbc6ae85ea4ce727b3479993d79747d7792
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26439
C10_MOBILE / FEATURE_TORCH_MOBILE are checked in EnableNamedTensor.h, but
NamedTensor.h includes it at the very beginning - for the internal build it's
fine as C10_MOBILE / FEATURE_TORCH_MOBILE are set as compiler flags, but
the cmake build relies on the c10/macros/Macros.h header to derive these
macros from other macros like __ANDROID__, so it won't work as expected.
Test Plan:
- build locally;
- will check CI;
Differential Revision: D17466581
Pulled By: ljk53
fbshipit-source-id: 317510bcc077782ec2d22e23b1aaa0cb77cb73a9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26395
This diff makes each SummaryWriter write into its own unique directory.
Reviewed By: orionr
Differential Revision: D17441500
fbshipit-source-id: d284fcf0e7e7a7214e644349e345f1de0e1a1aba
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26466
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D17480533
Pulled By: ezyang
fbshipit-source-id: 5532bd50aaea284ebb208feb949b5a6aca6be458
Summary:
Follow up to https://github.com/pytorch/pytorch/pull/25664, add `class_type[ind] = val`. Like `__getitem__`, `__setitem__` has a custom compilation path so it wasn't added with the rest of the magic methods.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25750
Differential Revision: D17428725
Pulled By: eellison
fbshipit-source-id: ff3767ef41515baf04b0c0f5c896dbd3f1d20cd3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25653
Instead of considering only the TensorTypeSet of the first argument, we collect all Tensor and TensorList arguments and union them together before computing the dispatch type id.
Billing of changes:
* ATenDispatch fallback code (i.e., what gets run if there is no entry for a function in the table) now lives out-of-line in a function `getFallbackOp`. This gave me an opportunity to write a more detailed error message, providing information about what registrations were available. There is a TODO in the fallback code, suggesting that we could automatically redispatch in the event that there is no handler for the key. But this is a bit of a design question, because it's not clear if automatic redispatch would cover up errors in the dispatch table (i.e., there *should* have been something registered at some key, but there wasn't.)
* Collection of Tensor/TensorList arguments is done using the trusty old IterArgs helper class. A minor bit of refactoring I had to do to get here was move the IterArgs functionality in torch/csrc/utils/variadic.h into ATen/core. There's some refactoring due on that file too (it has copies of some C++ helper pieces which already live in c10--you can't actually move the whole thing because it is literally incompatible with other code in the codebase). So instead of calling `type_set()` to get the type set of the dispatch argument, now we just call `at::detail::multi_dispatch_tensor_type_set` on all of the tensor/tensor list arguments.
* The code generator is adjusted to codegen collection of arguments as needed. There is a little bit of a hack in the code generator to turn 'self' arguments into '*this'. I think this may be duplicated with some logic somewhere else but I have to double check.
After turning on multi-dispatch, I had to refactor existing code which previously dispatched one place, but now dispatches somewhere else. The primary component affected by this is sparse.
* Binary operations (add/sub/mul/div/addmm) now dispatch to sparse kernels even if you did add(dense, sparse). So I delete all the sparse handling code from dense kernels, and bulk up the sparse error handling to handle when the first argument is dense. In the case of addmm, I can eliminate the bridge code entirely (well, not quite: more on this below). I also updated the dispatch on sparse to actually point at sparse kernels. Pay special attention to the handling of `div_` by scalar: previously this logic lived in the "dense" `div_` implementation, but there is actually not any sparse kernel we dispatch to. I solved this particular problem by making a redispatch, but another valid approach would have been to add specific dispatches for sparse div on scalar. This codepath is poorly tested because it is only exercised from C++.
* One minor annoyance is that because I now want separate dispatch for dense and sparse, I also need to replicate the `add`, `add_`, `add_out` trifecta on the sparse side. I opted for a compromise here: I wrote a new `add_sparse` trifecta, but reused the implementation between CPU and CUDA. This means that I have to do another dispatch once I get to `add_out`. The alternative would have been to do twice as many copies for CPU and CUDA (thereby eliminating the extra dispatch) but that seemed distinctly not worth it.
* A lot of kernels in sparse assumed that the dispatch argument must be sparse. This is no longer true with dispatch, so I converted the asserts into plain error checking. This also means that we've perturbed the error message in the case of TestSparseOneOff.test_cuda_sparse_cpu_dense_add (I just updated the saved error message)
* `addmm` is a little bit even more special: the bridge code also handled broadcasting. I replicated the broadcasting logic between CPU and CUDA implementations to avoid an extra dispatch.
* `_sparse_addmm` gave me a bit of trouble, because I had forgotten why we had `torch.sparse.addmm` in the first place. But in the end, its changes followed along with the structural changes I made in addmm. I opted for an extra dispatch here for simplicity.
* c10d has some Variable-Tensor confusion in its sparse code. I've worked around it by judiciously inserting "no variable type" guards, but a more correct fix would be to just solve the confusion entirely.
Benchmark:
Apply the following patch to the base commit and this commit:
```
diff --git a/aten/src/ATen/native/Const.cpp b/aten/src/ATen/native/Const.cpp
new file mode 100644
index 0000000000..b66f4d3ece
--- /dev/null
+++ b/aten/src/ATen/native/Const.cpp
@@ -0,0 +1,10 @@
+#include <ATen/ATen.h>
+
+namespace at {
+namespace native {
+
+Tensor _const5(const Tensor& self, const Tensor& second, const Tensor& third, const Tensor& fourth, const Tensor& fifth) {
+ return self;
+}
+
+}} // namespace at::native
diff --git a/aten/src/ATen/native/native_functions.yaml b/aten/src/ATen/native/native_functions.yaml
index b494ed7950..fddae638bb 100644
--- a/aten/src/ATen/native/native_functions.yaml
+++ b/aten/src/ATen/native/native_functions.yaml
@@ -5878,3 +5878,9 @@
dispatch:
CPU: im2col_backward_cpu
CUDA: im2col_backward_cuda
+
+# For benchmarking
+- func: _const5(Tensor self, Tensor second, Tensor third, Tensor fourth, Tensor fifth) -> Tensor
+ variants: function
+ dispatch:
+ CPU: _const5
```
Comparisons with timeit:
One-argument, representative case:
Before:
```
In [6]: %timeit x.reshape(1, 1)
1.46 µs ± 1.38 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [7]: %timeit x.reshape(1, 1)
1.48 µs ± 29.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [8]: %timeit x.reshape(1, 1)
1.52 µs ± 61.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```
After:
```
In [3]: %timeit x.reshape(1, 1)
1.42 µs ± 1.34 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [4]: %timeit x.reshape(1, 1)
1.43 µs ± 1.01 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [5]: %timeit x.reshape(1, 1)
1.42 µs ± 0.982 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```
Five-argument, synthetic case (we expect, with enough Tensor arguments, for there to be a slowdown, as we scale `O(n)` with number of arguments, compared to old dispatcher which is `O(1)` with number of arguments):
Before:
```
In [1]: import torch
In [2]: x = torch.zeros(1)
In [3]: %timeit torch._const5(x, x, x, x, x)
949 ns ± 1.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [4]: %timeit torch._const5(x, x, x, x, x)
954 ns ± 1.96 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [5]: %timeit torch._const5(x, x, x, x, x)
947 ns ± 0.601 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```
After:
```
In [3]: %timeit torch._const5(x, x, x, x, x)
985 ns ± 9.11 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [4]: %timeit torch._const5(x, x, x, x, x)
984 ns ± 1.17 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [5]: %timeit torch._const5(x, x, x, x, x)
988 ns ± 0.555 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D17265918
Pulled By: ezyang
fbshipit-source-id: 221efe4e86a40f36abc81e2ebceaa7e251c90b3d
Summary:
- Moves all ROCm-requiring test_torch tests to TestTorchDeviceType
- Moves test_stft and test_lu from test_cuda
- Moves many CUDA-only test_torch tests to TestTorchDeviceType
- Combines several test_torch CPU tests with their CUDA variants
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26435
Differential Revision: D17470469
Pulled By: mruberry
fbshipit-source-id: 90bb7fc09465c53eb2ab8da52eb2c2509775c16f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26061
This is in preparation for actually emitting a dynamic isinstance check instruction.
It re-arranges the logic so that all the types and properties to check
against are in a flat list. In the future this flat list will be encoded
into an actual instruction if we determine that we cannot perform
the check statically.
Test Plan: Imported from OSS
Differential Revision: D17332062
Pulled By: zdevito
fbshipit-source-id: 4c0b65436f8e030170d469fe747e79de24bb24eb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26211
Currently QNNPACK does not have an unpack function like FBGEMM does.
In order to be able to script quantized models for mobile, we need to save unpacked weights.
This change stores the original weights and bias in the opaque struct and simply returns them when unpack is called.
Test Plan:
python test/test_quantized.py TestQNNPackOps.test_qconv_unpack
python test/test_quantized.py TestQNNPackOps.test_qlinear_unpack
Imported from OSS
Differential Revision: D17464430
fbshipit-source-id: 83ad5a2556dcf13245a1047feef6cfb489c9ef69
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25675
This will be used to support OrderedDict in python. Modifies the existing `flat_hash_map` to preserve insertion and deletion order.
Test Plan: Imported from OSS
Differential Revision: D17440131
Pulled By: eellison
fbshipit-source-id: c7a6a290c8471627f5a061c0cca8e98ff131c9b4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26371
This is just so the PR making the flat hash map preserve order is easier to review.
Replaces https://github.com/pytorch/pytorch/pull/25674 because ghstack was poisoned and I had to resubmit.
Test Plan: Imported from OSS
Differential Revision: D17440132
Pulled By: eellison
fbshipit-source-id: 8a4f640d070d85795261cb3a129518c72096e9ef
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25887
ghstack-source-id: 90383258
Add per channel observer to compute the qparams for each channel.
Test Plan:
buck test mode/dev caffe2/test:quantization -- 'test_per_channel_minmax_observer'
buck test mode/dev caffe2/test:quantization -- 'test_per_channel_minmax_observer_scriptable'
Differential Revision: D17137226
fbshipit-source-id: 0b1c93e3cbcda86f5c4e30f7cd94c670f2665063
Summary:
Added support for gelu in symbolic opset9 + op and ORT tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24475
Reviewed By: hl475
Differential Revision: D17088708
Pulled By: houseroad
fbshipit-source-id: 9d2f9d7d91481c57829708793d88f786d6c3956f
Summary:
These are intentionally not yet used by the encoder to
avoid backcompat issues.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25786
Differential Revision: D17374409
fbshipit-source-id: 17971b26e48429c68b7fa8126d7ed56ff80b5d68
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26440
As we are optimizing build size for Android/iOS, it starts diverging
from default build on several build options, e.g.:
- USE_STATIC_DISPATCH=ON;
- disable autograd;
- disable protobuf;
- no caffe2 ops;
- no torch/csrc/api;
...
Create this build_mobile.sh script to 'simulate' mobile build mode
with host toolchain so that people who don't work on mobile regularly
can debug Android/iOS CI errors more easily. It might also be used to
build libtorch on devices like raspberry pi natively.
Test Plan:
- run scripts/build_mobile.sh -DBUILD_BINARY=ON
- run build_mobile/bin/speed_benchmark_torch on host machine
Differential Revision: D17466580
Pulled By: ljk53
fbshipit-source-id: 7abb6b50335af5b71e58fb6d6f9c38eb74bd5781
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26444
Original commit changeset: 6276a011a373
Test Plan: Revert of recent change that was breaking a test. Test plan is that build no longer breaks verified manually.
Differential Revision: D17467067
fbshipit-source-id: bf866f4dc0f08af249d92cebc9846623d44224f6
Summary:
fbjni is used when linking `libpytorch.so` and is specified in `pytorch_android/CMakeLists.txt`; as a result it is included as a separate `libfbjni.so` inside `pytorch_android.aar`.
We also have the Java part of fbjni, which is connected to pytorch_android as a gradle dependency and also contains `libfbjni.so`.
As a result, when we specify the gradle dependency `'org.pytorch:pytorch_android'` (which contains `libfbjni.so`) together with its transitive dependency `'org.pytorch:pytorch_android_fbjni'` (which also contains `libfbjni.so`), gradle reports an ambiguity error.
Fix: exclude `libfbjni.so` from the `pytorch_android.aar` packaging and use the `libfbjni.so` from the gradle dependency `'org.pytorch:pytorch_android_fbjni'`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26382
Differential Revision: D17468723
Pulled By: IvanKobzarev
fbshipit-source-id: fcad648cce283b0ee7e8b2bab0041a2e079002c6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26227
In the previous implementation of the composite lr, the lr_scale for each sub policy was overwritten by the last lr_scale.
Due to another bug in the unittest (policy_lr_scale was the same for all sub policies), this bug was not detected by the unittest.
Fix: add an additional field in CompositeLearningRateItem so that we store the lr_scale values for all sub policies.
With the unittest fixed, the error in the previous implementation:
https://fburl.com/testinfra/ikdbnmey
With the fix:
https://fburl.com/testinfra/m694ehl1
Test Plan:
unittest
buck test caffe2/caffe2/python/operator_test:learning_rate_op_test -- test_composite_learning_rate_op
Reviewed By: chocjy, alex1o1o7cloud
Differential Revision: D17380363
fbshipit-source-id: 161e9cb71bb2ea7f0734a3361e270616057a08e4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26224
We need to make sure they are Constant before we can do folding
Test Plan:
python test/test_jit.py 'TestJit.test_fold_quantize'
Imported from OSS
Differential Revision: D17462530
fbshipit-source-id: 2e02f980e0e7f28014d2f813035975dfc69cacd9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26152
This change adds the support to call QNNPACK using the refactored API for Conv2d operators
Test Plan:
python test/test_quantized.py TestQNNPackOps.test_qconv_qnnpack
Imported from OSS
Differential Revision: D17459892
fbshipit-source-id: d20b3e8b81dd403541cb2b9164731448ca229695
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26381
Was looking through this definition and saw that it has 2 identical
definitions of get_worker_id. Tested by ensuring that all tests in
`test/test_rpc.py` still pass.
ghstack-source-id: 90347452
Test Plan: See above
Differential Revision: D17439495
fbshipit-source-id: 9a78340f7aefa5797e0ae837fbcfe24ebe3a775d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26223
Add a filter function to runGraph; if the function returns false for a given `Match`,
we'll skip the rewrite.
Test Plan:
will test in later PR that adds extra filtering on Constant nodes
Imported from OSS
Differential Revision: D17462529
fbshipit-source-id: 52abe52cb3e729a3871f7a60eddd5275060af36a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25447
When we unpickle IValues, we lose type information for List[T]
and Dict[K, V]. We can restore this information using the static
type information contained in the top-level Module/Class type.
This ensures that even after serialization we can always get the
dynamic type of an ivalue using its type() method.
Test Plan: Imported from OSS
Differential Revision: D17127872
Pulled By: zdevito
fbshipit-source-id: 1ffb5e37a7c35c71ac9d3fb7b2edbc7ce3fbec72
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25439
This introduces a type() method on IValue that returns the tagged type
of the IValue. The intention is that this value is always present/accurate,
making it possible for clients to recover the Type from an IValue.
Currently our APIs here are incomplete: they can sometimes recover a type but not always.
This PR adds the function, and cleans up remaining cases where Lists/Dicts are not
tagged. However, this information does not survive serialization unchanged.
A second PR will use the type information in the ClassType being serialized
to fixup the serialized ivalues to have the correct types again.
After this patch it will be safe to remove our incomplete APIs for recovering types.
Test Plan: Imported from OSS
Differential Revision: D17125595
Pulled By: zdevito
fbshipit-source-id: 71c8c1a0e44762647e8f15f45d8ed73af8e6cb92
Summary:
This pass tries to resolve scalar type mismatch issues between input tensors introduced by the implicit type conversions on scalars.
e.g. https://github.com/pytorch/pytorch/issues/23724
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24378
Reviewed By: hl475
Differential Revision: D17088682
Pulled By: houseroad
fbshipit-source-id: 3de710f70c3b70b9f76fd36a7c4c76e168dbc756
Summary:
- Adds dtypes, dtypesIfCPU, and dtypesIfCUDA decorators.
- Eliminates the need for nontest members to be defined in an inherited base.
- Updates one test to use the decorators and updates TestTorchDeviceType with helpers.
This PR appears to be hanging the ROCm build, which is not entirely surprising. See https://github.com/pytorch/pytorch/issues/26394, which demonstrates that the ROCm build can be hung by commenting out a Python test that was never run on ROCm.
gchanan - what type list, if any, do you want to expose? I imagine most test suites will define their own lists like today. SCALAR_TYPES, QUANTIZED_TYPES, and ALL_TYPES seem reasonable to me. DOCUMENTED_TENSOR_TYPES will be removed, of course.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26375
Test Plan: Edit is to tests themselves.
Differential Revision: D17462294
Pulled By: mruberry
fbshipit-source-id: f8259ec66709749b1bf8077efc737676af901436
Summary:
This PR has been updated. Since ORIGINAL PR comment below.
ROCm CI builds have been hanging as we've been refactoring tests, even when these refactors seem entirely innocuous. This PR started by commenting out test_stft, for example, a Python test never run on ROCm, and that was sufficient to reliably hang the ROCm build in CI.
Putting ROCm tests back on the default stream appears to remove this hang. So this PR now does that. This is likely to unblock development.
ORIGINAL: Some test changes appear to be causing ROCm builds to hang in CI. This PR is an attempt to diagnose the source of the hang.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26394
Test Plan: Change is to test themselves.
Differential Revision: D17456678
Pulled By: mruberry
fbshipit-source-id: 38d00d01c64b5055c1dfed01687ce3e1c9372887
Summary:
- There are some missing casts.
- Functions like ::log, ::sin will potentially always invoke the double version on host. For
example, compiling the following code:
```c++
#include <cmath>
float log_float(float f) {
return ::logf(f);
}
double log_double(double f) {
return ::log(f);
}
float log_float2(float f) {
return ::log(f);
}
float log_float3(float f) {
return std::log(f);
}
```
using `g++ -c -O3` leads to:
```
log_float(float):
        jmp     logf
log_double(double):
        jmp     log
log_float2(float):
        subq    $8, %rsp
        cvtss2sd        %xmm0, %xmm0
        call    log
        addq    $8, %rsp
        cvtsd2ss        %xmm0, %xmm0
        ret
log_float3(float):
        jmp     logf
```
Note that log_float2 delegates the call to the double version of log
(surrounded by cast), while log_float3 delegates the call correctly to
logf. See https://godbolt.org/z/KsRWwW
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25791
Differential Revision: D17452312
Pulled By: izdeby
fbshipit-source-id: 6276a011a373cd7cb144f9ecd84116aa206e7d1b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26352
"named_guard: P" is the same as "supports_named_tensor: !P".
Also changed the error message to be more understandable to users.
Test Plan:
- `TEST_NAMEDTENSOR=1 pytest test/test_namedtensor.py -v`
- [namedtensor ci]
Differential Revision: D17426234
Pulled By: zou3519
fbshipit-source-id: 4cab780e6e29e184e79cdd3690f41df9ebb2ecb5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26353
### Summary
As the new iOS building script has been landed, this PR will clean up some redundant code for the PR jobs.
### Test Plan
- Don't break any existing CI jobs
- Don't break the old iOS CI jobs
Test Plan: Imported from OSS
Differential Revision: D17457253
Pulled By: xta0
fbshipit-source-id: 0d85117533a62d0b9b7b859b0044fd4388c3c9d4
Summary:
Currently calc_erfinv's float version on CPU is missing. This commit adds the float version (by templating).
I also used this opportunity to clean up calc_erfinv a bit.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26070
Reviewed By: ezyang
Differential Revision: D17368024
Pulled By: VitalyFedyunin
fbshipit-source-id: 00cc3097f340022b3788143e6c12b01c35d72f13
Summary:
# Problem
If there are not enough threads in the RPC Agent thread pool, some circularly dependent work could cause a deadlock.
The current way to get around this deadlock is to provide an abundant number of threads.
# Solution
as titled
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26313
Differential Revision: D17405491
Pulled By: xush6528
fbshipit-source-id: a1d9b6a84db0371cd4b63328fa00f651c0808485
Summary:
per https://github.com/pytorch/pytorch/issues/22226, The current sparse allreduce in ProcessGroupGloo pads the indices and values tensors to the maximum length across all processes and then performs a regular allgather (because they'll have equal size across processes). Instead, we can use allgatherv. This is mostly a win for memory usage if there is severe size imbalance between processes.
close https://github.com/pytorch/pytorch/issues/22226
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23917
Test Plan:
buck run mode/dev-nosan caffe2/test:c10d -- test_c10d.ProcessGroupGlooTest.test_sparse_allreduce_basics
buck run mode/dev-nosan caffe2/test:c10d -- test_c10d.ProcessGroupGlooTest.test_sparse_allreduce_basics_cuda
buck run mode/dev-nosan caffe2/test:c10d -- test_c10d.ProcessGroupGlooTest.test_sparse_allreduce_checks
Differential Revision: D16664985
Pulled By: zhaojuanmao
fbshipit-source-id: e7d3c0770cbc09f9175b3027b527e95053724843
Summary:
Ignore the ios/TestApp folder and its children.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26399
Differential Revision: D17451239
Pulled By: houseroad
fbshipit-source-id: d6ba666bf955454eca4a10c00784ee5947a70f59
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26350
Python 3 lets us use `...` to perform indexing. Semantically, `...`
means "the rest of the unspecified dimensions". For example, while
indexing, one can do (for 5D `tensor`) `tensor[0, 0, ..., 0]` and
the `...` is expanded into `tensor[0, 0, :, :, 0]`.
Previously, we were using '*' to represent a similar behavior in names.
For example, `tensor.refine_names` supports things like the following:
```
x = torch.randn(2, 3, 4, 5, 6)
x_out = x.refine_names('*', 'H', 'W') # refine only the last two
dimensions
```
This PR changes it so that named tensor API functions recognize `'...'`
(in Python 2 and Python 3) and `...` (in Python 3 exclusively) instead
of `'*'`.
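A minimal sketch of the new spelling, assuming a named tensor build of this branch (the shapes and output names in the comments are illustrative):
```python
import torch

x = torch.randn(2, 3, 4, 5, 6)
# '...' greedily covers the leading dimensions; only the last two get names
y = x.refine_names('...', 'H', 'W')
print(y.names)  # (None, None, None, 'H', 'W')
```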
Test Plan: - [namedtensor ci]
Differential Revision: D17424666
Pulled By: zou3519
fbshipit-source-id: 003182879fd38ced3fea051217572a457cdaf7cf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26349
The directory holds a lot of private helper functions that help
implement named tensor functionality. Instead of naming each helper
function with a leading underscore, I change the name of the import to
`_namedtensor_internals` to signal it should not be used directly.
Test Plan: - [namedtensor ci]
Differential Revision: D17424178
Pulled By: zou3519
fbshipit-source-id: 8f7b74346765759303480e581038a661021acf53
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26147
We may try to unpickle a byte string in py3 that was pickled from py2. Therefore we need to unpickle with encoding='latin1'.
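A minimal sketch of the compatibility pattern; the payload below is a hand-written illustration of a Python 2, protocol-0 pickle of the one-byte string '\xff', not the actual data involved here:
```python
import pickle

# Illustrative Python 2 protocol-0 pickle of the str '\xff'
py2_payload = b"S'\\xff'\np0\n."

# The default ASCII decoding would raise UnicodeDecodeError; latin1 round-trips all byte values.
obj = pickle.loads(py2_payload, encoding='latin1')
print(obj)  # 'ÿ'
```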
Reviewed By: kennyhorror
Differential Revision: D17305677
fbshipit-source-id: c0c8a51909629a65eb72bb81cccfbabaee9f8d01
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24022
In the histogram observer, add an approximation of L2 error minimization for selecting min/max.
By selecting a new min/max, we filter out outliers in the input distribution.
This follows the implementation of NormMinimization::NonlinearQuantizationParamsSearch in caffe2/quantization/server/norm_minimization.cc
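A minimal usage sketch, assuming the HistogramObserver exposed through torch.quantization on this branch (constructor defaults and printed values are illustrative):
```python
import torch
from torch.quantization import HistogramObserver

obs = HistogramObserver()           # records a running histogram plus min/max
obs(torch.randn(1000))              # observe a batch of values
obs(torch.randn(1000) * 5 + 2)      # the histogram keeps accumulating across calls
scale, zero_point = obs.calculate_qparams()  # min/max chosen to filter outliers
print(scale, zero_point)
```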
ghstack-source-id: 90298789
Test Plan: buck test mode/dev caffe2/test:quantization -- 'test_histogram_observer'
Differential Revision: D16713239
fbshipit-source-id: 82631ba47974e25689c9c66bc3088117090e26d4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25658
This unflattens `dim` according to the shape specified in `namedshape`.
`namedshape` may be either an OrderedDict or an iterable of (name, size)
tuples.
Future:
- It is possible to make it take a dict in Python >= 3.6 because those are
ordered by default, but I'll leave that task for the future.
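A minimal sketch of the semantics, assuming a named tensor build (the dimension names and sizes are illustrative):
```python
import torch

x = torch.randn(2, 12, names=('N', 'C'))
# split 'C' (size 12) into two named dimensions of sizes 3 and 4
y = x.unflatten('C', (('C1', 3), ('C2', 4)))
print(y.names, y.shape)  # ('N', 'C1', 'C2') torch.Size([2, 3, 4])
```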
Test Plan: - new tests [namedtensor ci]
Differential Revision: D17192655
Pulled By: zou3519
fbshipit-source-id: fd9bd2f462c23a4df1c23d66f2aa95076ff1b160
Summary:
The Pickler previously had a distinction between tensors that would be inlined in 1 pickle binary (matching the format of `torch.save()`) and tensors that are saved elsewhere with only a reference stored in the binary. This PR moves that distinction out to `torch::pickle_save` to match the eager Python interface.
The change can be seen in `register_prim_ops.cpp` where the call to `jit::pickle` is now `torch::pickle_save`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25502
Pulled By: driazati
Differential Revision: D17175215
fbshipit-source-id: 8c9a21327cc79eaf6a0e488ea99e305be52f82b1
Summary:
ROCm CI jobs are running on Jenkins. They have the "-test{1,2}" parts in "JOB_BASE_NAME", not "BUILD_ENVIRONMENT".
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26380
Differential Revision: D17439523
Pulled By: bddppq
fbshipit-source-id: 31e2a986d1b7ea40c90ab399a3c1e0a328ae3a92
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26135
This change adds the support to call QNNPACK using the refactored API for Linear operators (Fully Connected)
It also has certain cmake changes to enable building and using pytorch_qnnpack inside aten.
I have disabled USE_QNNPACK in CMakeLists.txt. Enabling it results in picking kernels from third_party/QNNPACK during runtime since the function names are the same.
Test Plan:
python test/test_quantized.py TestQNNPackOps.test_qlinear_qnnpack
Imported from OSS
Differential Revision: D17434885
fbshipit-source-id: 084698026938f4529f61d12e86dfe82534ec73dd
Summary:
Fix the regex (requires enabling extglob) for two digit clang releases.
While there, also fix it for three digit releases with the hope that I
do not need to touch it for some time.
Unfortunately, this regex requires extglob to be enabled in the shell.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25409
Differential Revision: D17431786
Pulled By: bddppq
fbshipit-source-id: a50b2ff525d9b6046deae9c8725c92d67119599a
Summary:
In schema matching we allow a homogeneous tuple to be matched to list arguments. This logic wasn't yet extended for vartype lists, causing things like `len((1, 2, 3))` to fail.
Fix for https://github.com/pytorch/pytorch/issues/20500
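A minimal sketch of the case that now compiles; previously the homogeneous tuple failed to match the vartype list overload of `len`:
```python
import torch

@torch.jit.script
def tuple_len() -> int:
    # a homogeneous tuple matched against a list argument
    return len((1, 2, 3))

print(tuple_len())  # 3
```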
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25944
Differential Revision: D17431514
Pulled By: eellison
fbshipit-source-id: 2ad98bab15eaa496471df651572735eb35183323
Summary:
Inserting markers using the nvtx-equivalent API is not supported yet.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26300
Differential Revision: D17425573
Pulled By: bddppq
fbshipit-source-id: 4df6c695ba07ab68e7f4dc2f77edde06f78fdac7
Summary:
This is the first step of adding CI for detecting backward-incompatible changes to function schemas.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26321
Reviewed By: hl475
Differential Revision: D17425468
Pulled By: houseroad
fbshipit-source-id: b4bb36e5597043407c943b5b8dfe2b1ac3248cb2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26261
### Summary
Previously we enabled the CI jobs for Pull Requests and nightly builds
- **#25840 [iOS][Circle CI] Add PR jobs for iOS builds**
- **#26074 [IOS][CIRCLE CI] Nightly jobs for iOS builds**
The testing phase is missing from the nightly build process. Although we are able to generate the build and upload it to AWS, there is no way to know whether the binary is valid (there could be a linking error). To add a test phase to the process, we need to
1. Put a dummy test app in the repo.
2. After the build job finishes, manually link the static libs to the dummy app to produce an executable using the Xcode toolchain.
3. If there is no linking error, upload the binaries to AWS. If there is an error, stop the process and report an error in CI.
The second and third steps depend on the first step, which needs to land first.
### Test Plan
- Don't break any existing CI jobs
Test Plan: Imported from OSS
Differential Revision: D17408929
Pulled By: xta0
fbshipit-source-id: e391da242639943005453d1318795f981034cc72
Summary:
Was confused by the wrong message while debugging.
It turns out the CPU version has the comparison direction wrong, and the GPU version additionally prints the wrong number.
This fix should make the error message correct.
jjsjann123 for tracking
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26325
Differential Revision: D17408969
Pulled By: soumith
fbshipit-source-id: 0d9330e00aaabcb3e8e893b37a6a53fb378171c5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26259
This wasn't called from anywhere (confirmed by grep)
ghstack-source-id: 90222268
Test Plan: waitforsandcastle
Differential Revision: D17391417
fbshipit-source-id: 77c395f2f7104995f6af6e3e20d3f615223085b3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26255
Add some more ops that work fine without needing fixes
[namedtensor ci]
ghstack-source-id: 90222272
Test Plan: unit tests
Differential Revision: D17390980
fbshipit-source-id: 0eeae69a409a8cfd9195b71053c1f6202ddd3509
Summary:
This PR aims to re-organize C++ API `torch::nn` folder structure in the following way:
- Every module in `torch/csrc/api/include/torch/nn/modules/` (except `any.h`, `named_any.h`, `modulelist.h`, `sequential.h`, `embedding.h`) has a strictly equivalent Python file in `torch/nn/modules/`. For example:
`torch/csrc/api/include/torch/nn/modules/pooling.h` -> `torch/nn/modules/pooling.py`
`torch/csrc/api/include/torch/nn/modules/conv.h` -> `torch/nn/modules/conv.py`
`torch/csrc/api/include/torch/nn/modules/batchnorm.h` -> `torch/nn/modules/batchnorm.py`
`torch/csrc/api/include/torch/nn/modules/sparse.h` -> `torch/nn/modules/sparse.py`
- Containers such as `any.h`, `named_any.h`, `modulelist.h`, `sequential.h` are moved into `torch/csrc/api/include/torch/nn/modules/container/`, because their implementations are too long to be combined into one file (like `torch/nn/modules/container.py` in Python API)
- `embedding.h` is not renamed to `sparse.h` yet, because we have another work stream that works on API parity for Embedding and EmbeddingBag, and renaming the file would cause conflict. After the embedding API parity work is done, we will rename `embedding.h` to `sparse.h` to match the Python file name, and move the embedding options out to options/ folder.
- `torch/csrc/api/include/torch/nn/functional/` is added, and the folder structure mirrors that of `torch/csrc/api/include/torch/nn/modules/`. For example, `torch/csrc/api/include/torch/nn/functional/pooling.h` contains the functions for pooling, which are then used by the pooling modules in `torch/csrc/api/include/torch/nn/modules/pooling.h`.
- `torch/csrc/api/include/torch/nn/options/` is added, and the folder structure mirrors that of `torch/csrc/api/include/torch/nn/modules/`. For example, `torch/csrc/api/include/torch/nn/options/pooling.h` contains MaxPoolOptions, which is used by both MaxPool modules in `torch/csrc/api/include/torch/nn/modules/pooling.h`, and max_pool functions in `torch/csrc/api/include/torch/nn/functional/pooling.h`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26262
Differential Revision: D17422426
Pulled By: yf225
fbshipit-source-id: c413d2a374ba716dac81db31516619bbd879db7f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26060
This PR enables BUILD_NAMEDTENSOR by default. This is done via including
a header, `c10/core/EnableNamedTensor`, that sets `BUILD_NAMEDTENSOR`.
In the future, the plan is to get rid of the flag entirely: we can
incrementally delete usages after this PR goes in.
This PR also maintains the namedtensor ci vs regular ci distinction.
`test/test_namedtensor.py` only runs if TEST_NAMEDTENSOR=1 is specified.
TEST_NAMEDTENSOR=1 is set on the namedtensor ci. I'll remove this
distinction later and send out an announcement about it; devs will be
responsible for named tensor failures after that.
The initial reason why we had the BUILD_NAMEDTENSOR flag was so that we
could quickly prototype named tensor features without worrying about
adding overhead to the framework. The overheads can be categorized as
memory overhead and performance overhead.
Memory overhead: named tensors adds 1 additional word per Tensor. This
is because TensorImpl stores a `unique_ptr<NamedTensorMetaInterface>`
field. This is not a lot of overhead.
Performance overhead: At all entry points to name inference, we check
if inputs to an op are named. If inputs are not named, we short-circuit
and don't do name inference. These calls should therefore be as
efficient as error-checking code and not take up a lot of time.
My plan is to benchmark a few functions and then post the results in a
comment to this PR.
Test Plan: - [namedtensor ci]
Differential Revision: D17331635
Pulled By: zou3519
fbshipit-source-id: deed901347448ae2c26066c1fa432e3dc0cadb92
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26298
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D17420724
Pulled By: ezyang
fbshipit-source-id: b8e651d0dfe7abec5615e849bdd5d1a19feb7b40
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26285
I renamed:
THTensor_(std / var) -> THTensor_(std_single / var_single)
THTensor_(stdall / varall) -> THTensor_(std_all / var_all)
because I reversed the meaning of the bias/unbiased parameters (to match ATen) and type checking wouldn't catch failures.
Test Plan: Imported from OSS
Differential Revision: D17397227
Pulled By: gchanan
fbshipit-source-id: 244fe878d4e1045620137c00fbaea6e6f919fc8d
Summary:
Follow-up to gh-25483, more of the same fixes for warnings like:
```
../torch/csrc/autograd/python_variable.cpp:503:31: warning: cast between incompatible function types from ‘PyObject* (*)(THPVariable*)’ {aka ‘_object* (*)(THPVariable*)’} to ‘getter’ {aka ‘_object* (*)(_object*, void*)’} [-Wcast-function-type]
503 | {"_backward_hooks", (getter)THPVariable_get_backwards_hooks, (setter)THPVariable_set_backwards_hooks, nullptr, nullptr},
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
This takes the build log output for a full rebuild with GCC 9.1 from ~10,000 to ~7,000 lines.
`clang-tidy` is going to complain, no way around that - see discussion at the end of gh-25483.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26104
Differential Revision: D17396831
Pulled By: ezyang
fbshipit-source-id: d71696bfe4dbe25519e4bcb7753151c118bd39f7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25788
Previously, I thought that _lazy_init held the GIL throughout initialization, so
I could write the code in a single-threaded manner. This is not true; it
releases the GIL at various points, which make it possible for another thread to
race with initialization.
The correct fix is to add locking for the initialization section, so other
threads wait until the first thread finishes initializing before being let
in. There is some subtlety with how to handle lazy calls, which will call
_lazy_init reentrantly; this is handled using TLS that lets you know if you
are the initializing thread (and therefore reentrant calls are OK.)
Fixes #16559
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D17366348
Pulled By: ezyang
fbshipit-source-id: 99b982709323e2370d03c127c46d87be97495916
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25908
Original commit changeset: f6e961e88c01
device_option propagation is completely broken in Caffe2 for cases when pass-through operators are used. As an example, the Gather operator doesn't have a gradient and passes through its inputs, which results in incorrect detection of the components for sparse parameter aggregation (the component will be empty instead of the real device).
This diff is trying to fix this issue.
The original diff had a problem: Caffe2 does not handle cases when a device option is present but contains only metadata (for example the one for auto-generated reduction ops in the backward pass). This diff addresses that issue by merging device options during the backward pass.
Test Plan:
1. net_transform finally works with a Gather + FloatToHalf transformed model instead of failing because of an incorrect number of components.
2. New unit test.
3. Verified that the previously broken benchmark now passes.
ezyang do you have suggestions what else I should test?
Reviewed By: ezyang
Differential Revision: D17281528
fbshipit-source-id: 4a1bc386f29f6a34fbf8008effde9d4890abebfa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26289
It's not possible to refer to values of local variables otherwise.
ghstack-source-id: 90160797
Test Plan: The code compiles.
Differential Revision: D17397702
fbshipit-source-id: 49c74c44c88f197264603e4978e3d60bf199f6ac
Summary:
Changelog:
- Modify existing implementation of pinverse to support batching on inputs
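A minimal sketch of the batched behavior described above (shapes are illustrative):
```python
import torch

x = torch.randn(2, 5, 3, dtype=torch.float64)  # a batch of two 5x3 matrices
p = torch.pinverse(x)                          # pseudo-inverse computed per matrix
print(p.shape)                                 # torch.Size([2, 3, 5])
# each slice matches the unbatched result
print(torch.allclose(p[0], torch.pinverse(x[0])))
```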
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26095
Test Plan: - Added tests in test_pinverse to test batched implementation
Differential Revision: D17408092
Pulled By: soumith
fbshipit-source-id: bba95eb193ce33a94ecfaf74da270d34b435e4af
Summary:
It appears to be a bug with test_arange which wasn't revealed with older versions of onnxruntime.
TLDR: The test tries to update the exported onnx model to accept dynamically sized input, however it is written incorrectly such that the exported model input is still fixed size. Meanwhile, the version of ort in CI doesn't validate whether the model input size matches the input data, so this error was not found.
Affecting ci in https://github.com/pytorch/pytorch/pull/25797
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26320
Reviewed By: hl475
Differential Revision: D17406442
Pulled By: houseroad
fbshipit-source-id: a09ad4b925ccbed0b71342f5aaa7878e1c4a5a2d
Summary:
- Adds new decorators for skipping on ROCm, skipping on MKL, running only on the CPU and running only on CUDA
- Makes decorator skip semantics consistent
- Adds CUDA default stream requirement to MAGMA decorator
- Creates TestAutogradDeviceType
Note this PR originally moved test_cdist, but moving it caused failures in CI. There may be an undiagnosed issue with cdist or the test. The issue does not reproduce locally.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26248
Test Plan: Change is to tests themselves.
Differential Revision: D17410386
Pulled By: mruberry
fbshipit-source-id: 8459df44f2a00f0e71680fbe713587a01d4b0300
Summary:
After offline discussion with dzhulgakov:
- In the future we will introduce creation of signed-byte and unsigned-byte dtype tensors, but Java has only a signed byte, so we will have to add some separation for it in the method names (Java types and tensor types cannot be mapped cleanly) => return the type in the method names
- fixes in error messages
- non-static method Tensor.numel()
- Change Tensor.toString() to be more consistent with Python
Update on Sep 16:
Type renaming on java side to uint8, int8, int32, float32, int64, float64
```
public abstract class Tensor {
  public static final int DTYPE_UINT8 = 1;
  public static final int DTYPE_INT8 = 2;
  public static final int DTYPE_INT32 = 3;
  public static final int DTYPE_FLOAT32 = 4;
  public static final int DTYPE_INT64 = 5;
  public static final int DTYPE_FLOAT64 = 6;
  // ...
}
```
public static Tensor newUInt8Tensor(long[] shape, byte[] data)
public static Tensor newInt8Tensor(long[] shape, byte[] data)
public static Tensor newInt32Tensor(long[] shape, int[] data)
public static Tensor newFloat32Tensor(long[] shape, float[] data)
public static Tensor newInt64Tensor(long[] shape, long[] data)
public static Tensor newFloat64Tensor(long[] shape, double[] data)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26219
Differential Revision: D17406467
Pulled By: IvanKobzarev
fbshipit-source-id: a0d7d44dc8ce8a562da1a18bd873db762975b184
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26311
We are currently unable to deploy models due to D16955662 changing the function signature of ```quantized_lstm(```, but the function call here (https://fburl.com/diffusion/e4wrmx83) does not pass the newly added ```use_dynamic``` param.
Here is the details of the error: P111215482
```
E0916 12:36:16.423516 1149877 ExceptionTracer.cpp:214] exception stack complete
terminate called after throwing an instance of 'torch::jit::script::ErrorReport'
what():
Arguments for call are not valid.
The following operator variants are available:
aten::quantized_lstm(Tensor input, Tensor[] hx, Tensor[] params, bool has_biases, int num_layers, float dropout, bool train, bool bidirectional, bool batch_first, *, int? dtype=None) -> (Tensor, Tensor, Tensor):
Keyword argument use_dynamic unknown.
```
This diff fixes that.
Test Plan:
Running quantization tests after.
```buck test mode/dev caffe2/test:jit -- 'test_quantization_modules \(test_jit\.TestScript\)'```
https://our.intern.facebook.com/intern/testinfra/testrun/5910974518872494
Also, currently building a package (language_technology.translation.jedi.scripts:35c3643) and testing this (f138747078).
f138771702
Reviewed By: jhcross
Differential Revision: D17404451
fbshipit-source-id: 390d2ce1ecbdd63a07a8f16c80e4c3ac25ab0a99
Summary:
pytorch builds fail on the s390 architecture because
in simd.h the ifdef macros default to an x86 asm instruction.
This patch adds an ifdef __s390x__ to be able to build on s390.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26233
Differential Revision: D17392714
Pulled By: soumith
fbshipit-source-id: 037672bfea64fc5e52da2390d93b973534137c12
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26204
Support quant fusion for `matmul` with bias to `quantized::linear`.
Test Plan:
python test/test_jit.py 'TestJit.test_quant_fusion'
Imported from OSS
Differential Revision: D17380073
fbshipit-source-id: 00014469a852cc5d5b66469fc4b8d05eafba1e3e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26282
Since this isn't the end-user API anymore, we shouldn't have defaults.
Test Plan: Imported from OSS
Differential Revision: D17397153
Pulled By: gchanan
fbshipit-source-id: d44040bec0ee9c70734a53ebcc10a96f12226a29
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26166
There were 2 variants to create a new device. One to do so based the
name of a network interface, and one to do so based on a hostname or
address. In the latter, if the address was not specified, it would
lookup the local hostname and try to resolve that. If that failed, the
process would crash.
In this default path, we now try to lookup and use the local hostname,
and if that fails we fallback to using the loopback address.
If the local hostname doesn't resolve to an address that we can bind
to, it is very likely that this process won't join other processes
over the network, and that the user is trying to run a local test.
If this assumption is wrong, the user can override the default
interface selection by setting the environment variable
`GLOO_SOCKET_IFNAME` to the name of the external network interface.
I tested this by changing the local hostname to a bogus name and
confirmed that default initialization works as expected.
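A minimal sketch of the override described above (the interface name is a placeholder for a real interface on the machine):
```python
import os

# "eth0" is illustrative; set this to a real external interface name before
# calling torch.distributed.init_process_group(backend="gloo", ...)
os.environ["GLOO_SOCKET_IFNAME"] = "eth0"
```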
Closes #26049.
Test Plan: Imported from OSS
Differential Revision: D17397898
Pulled By: pietern
fbshipit-source-id: 95a2467761d89df87b520d6e5837b92184b0dc12
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26077
As per #26071, we would like to get rid of the calls to `Variable(...)`
where possible. This diff removes the calls in the test file test_nn.py. The
unit tests should all still pass as expected.
ghstack-source-id: 90086624
Test Plan: tests in `test_nn.py` should all pass.
Differential Revision: D17336484
fbshipit-source-id: 43fc7bd0b0be835ae89d06162ce1cbe4e0056d91
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26237
Calling a lot of `std::string` constructors is horrible for binary size, see t53997334.
Using `const char*` instead should make the binary size much smaller.
ghstack-source-id: 90145501
Test Plan: size checks on the diff
Differential Revision: D17386002
fbshipit-source-id: c5420adf225e535396e806a0df92419a7e2ad3e8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26257
In native_functions.yaml, all overloads must have unique overload names.
This PR fixes `flatten` to have unique names for the overloads.
Test Plan: - tested locally, but also [namedtensor ci]
Differential Revision: D17391243
Pulled By: zou3519
fbshipit-source-id: aaef654953b4275c43b9d7bd949c46bd011f6c73
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26252
Original commit changeset: 1375774f24c2
Testing to see if this is somehow the source of hangs on ROCm builds.
Test Plan: Change is to tests themselves. This diff is for testing the ROCm hang, however.
Differential Revision: D17390575
fbshipit-source-id: a6ffd5eb1df3971b99b6d42271a8d3d501ac79c6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26250
Exclude some ops from the c10 dispatcher that don't work with it yet.
ghstack-source-id: 90138046
Test Plan: waitforsandcastle
Reviewed By: zou3519
Differential Revision: D17390117
fbshipit-source-id: a87fb3048aeba2c3293b95d610ddb8e94369f8fe
Summary:
- Adds SkipCUDAIfRocm and skipCPUIfNoMkl decorators, ports corresponding tests
- Changes "SkipIf" input semantics for consistency
- Removes torchtest, which has been replaced with this new generic framework
- Refactors some common parts out of CUDA tests to TestTorchDeviceType
- Ensures all MAGMA tests run on default stream by putting the skipCUDANonDefaultStreamIf in the skipCUDAIfNoMagma decorator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26244
Differential Revision: D17389060
Pulled By: mruberry
fbshipit-source-id: 1375774f24c2266049e6d4b899e7300ddf32eac8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26130
Since we now just use TensorTypeId::VariableTensorId, there's no need to treat autograd kernels any differently.
ghstack-source-id: 90130457
Test Plan: unit tests
Differential Revision: D17353873
fbshipit-source-id: d4468506a5366bc5e7429144b090b3e78af9de62
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23668
- The eager mode frontend now calls operators who are defined in native_functions.yaml with `use_c10_dispatcher: True` through the c10 dispatcher and not anymore through globalATenDispatch().
- These operators aren't registered with globalAtenDispatch anymore, only on c10 now.
- Backend extensions calling globalATenDispatch().registerOp() to add their own kernels still work, this function will forward the registration to the c10 dispatcher for them.
ghstack-source-id: 90130455
Test Plan: benchmarks at https://docs.google.com/document/d/1gpzKZcFf1JJameY1vKxF7Cloul9s6D8HKIK2_Pp1hFo/edit#
Differential Revision: D16603133
fbshipit-source-id: 991f17b355e9c78c5e86fee4fa381df7ab98ac82
Summary:
If source code is not available due to packaging (e.g. sources are compiled to .pyc), TorchScript produces a very obscure error message. This tries to make it nicer and allows customizing the message by overriding _utils_internal.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25415
Test Plan: Really hard to unittest properly. Did one off testing by compiling to .pyc and checking the message.
Differential Revision: D17118238
Pulled By: dzhulgakov
fbshipit-source-id: 3cbfee0abddc8613000680548bfe0b8ed52a36b0
Summary:
This PR moves many tests in test_torch.py to the generic device type framework. This means that many CUDA tests now run in test_torch.py and there is greater consistency in how tests for many device types are written.
One change is that all MAGMA tests are run on the default stream due to intermittent instability running MAGMA on the non-default stream. This is a known issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26232
Test Plan:
While this PR edits the tests itself, it was validated using two independent methods:
(1) The code was reviewed and it was verified that all deleted functions were actually moved.
(2) The output of the TestTorch CI was reviewed and test outputs were matched before and after this PR.
Differential Revision: D17386370
Pulled By: mruberry
fbshipit-source-id: 843d14911bbd52e8aac6861c0d9bc3d0d9418219
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26222
### Summary
The last generation of armv7s devices is the iPhone 5C. As discussed with David offline, we decided not to support iOS armv7s devices.
### Test plan
- CI finishes successfully
- Builds can be run only on X86_64 and arm64 devices
Test Plan: Imported from OSS
Differential Revision: D17385308
Pulled By: xta0
fbshipit-source-id: f883999aed18224ea3386b1f016964a33270fa34
Summary:
This test can sometimes fail in CI.
I suspect this flakiness is because the test asks a CUDA stream to record an event, fails to synchronize the CPU with that stream, then checks if the event is recorded on the CPU. There is no guarantee this will have happened.
This one-line change preserves the intent of the test while ensuring the GPU has recorded the event before the CPU queries it.
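A minimal sketch of the kind of synchronization the fix adds before the CPU queries the event (requires a CUDA device; names are the standard torch.cuda APIs):
```python
import torch

stream = torch.cuda.Stream()
event = torch.cuda.Event()
with torch.cuda.stream(stream):
    torch.randn(1000, device='cuda').sum()  # some asynchronous work
    event.record(stream)
# Without this synchronize(), the CPU may query before the GPU has recorded the event.
stream.synchronize()
assert event.query()
```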
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26231
Differential Revision: D17382110
Pulled By: mruberry
fbshipit-source-id: 35b701f87f41c24b208aafde48bf10e1a54de059
Summary:
This PR addresses https://github.com/pytorch/pytorch/issues/24851 by...
1. lets device types easily register themselves for testing
2. lets tests be written to run on multiple devices and with multiple dtypes
3. provides a mechanism to instantiate those tests so they are discoverable and filterable by unittest and pytest
It refactors three tests from test_torch.py to demonstrate how to use it.
`test_diagonal` is the simplest example. Most tests just need to be modified to accept 'device' as an argument. The framework will then instantiate `test_diagonal_cpu` and `test_diagonal_cuda` (when CUDA is available) which call `test_diagonal` with the appropriate 'device' argument.
`test_neg` also has dtype variants. It accepts both 'device' and 'dtype' as arguments, and the dtypes it runs with are specified with the 'dtypes' decorator. Dtypes can be specified for all device types and particular device types. The framework instantiates tests like `test_neg_cpu_torch.float`.
`test_inverse` has device-specific dependencies. These dependencies are expressed with the sugary 'skipCUDAIfNoMagma' and 'skipCPUIfNoLapack' decorators. These decorators are device-specific, so CPU testing is not skipped if Magma is not installed, and their conditions may be checked before or after the test case has been initialized. This means that skipCUDAIfNoMagma does not initialize CUDA. In fact, CUDA is only initialized if a CUDA test is run.
These instantiated tests may be run as usual and with pytest filtering it's easy to run one test on all device types, run all the tests for a particular device type, or run a device type and dtype combination.
See the note "Generic Device-Type Testing" for more detail.
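A minimal sketch of the pattern, assuming the `dtypes` decorator and `instantiate_device_type_tests` helper added by this PR under test/ (the import paths assume running from the repo's test directory; the test body is illustrative):
```python
import torch
from common_utils import TestCase, run_tests
from common_device_type import dtypes, instantiate_device_type_tests


class TestExample(TestCase):
    # instantiated as test_neg_cpu_torch.float, test_neg_cuda_torch.float, ...
    @dtypes(torch.float, torch.double)
    def test_neg(self, device, dtype):
        t = torch.ones(3, device=device, dtype=dtype)
        self.assertEqual((-t).sum().item(), -3)


# generates TestExampleCPU (and TestExampleCUDA when CUDA is available)
instantiate_device_type_tests(TestExample, globals())

if __name__ == '__main__':
    run_tests()
```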
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25967
Differential Revision: D17381987
Pulled By: mruberry
fbshipit-source-id: 4a639641130f0a59d22da0efe0951b24b5bc4bfb
Summary:
we intend to be conservative, and will relax the checks in the future if necessary.
So far, we consider the following three conditions as backward compatible:
1) the two schemas are equal
2) the two schemas have the same number of arguments, and this schema's
arguments are backward compatible with the corresponding ones in the
argument list of old_schema.
3) this schema has m arguments, old_schema has n arguments, m > n.
The first n arguments of this schema are backward compatible with
the corresponding arguments of old_schema. The remaining arguments
must be either OptionalType or provide default values.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23409
ghstack-source-id: 90111021
Test Plan: buck test //caffe2/test:function_schema
Reviewed By: hl475
Differential Revision: D16505203
fbshipit-source-id: e4099537776a60e8945e5c3cd57fa861f3598a9b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23959
Add histogram observer that records the running histogram of tensor values along with min/max values.
ghstack-source-id: 90076996
Test Plan:
Added a test test_histogram_observer
buck test mode/dev caffe2/test:quantization -- 'test_histogram_observer'
buck test mode/dev caffe2/test:quantization -- 'test_observer_scriptable'
Differential Revision: D16692835
fbshipit-source-id: 0f047d3349cb9770fad4a2b6cb346c51d9e99cd4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25974
Previously we observed all the Tensor values, but what we actually want is to
observe only the ones that can be quantized.
Test Plan:
python test/test_jit.py
python test/test_quantizer.py
Imported from OSS
Differential Revision: D17348986
fbshipit-source-id: 55be0d73862a0e7eb1e7fd882d16e0d830618b63
Summary:
Applying dzhulgakov's review comments.
org.pytorch.Tensor:
- dims renamed to shape
- typeCode to dtype
- numElements to numel
- newFloatTensor, newIntTensor, ... to newTensor(...)
Added support for dtype=long, double.
Re-sorted in code: byte, int, float, long, double.
For if conditions the order is float, int, byte, long, double, as I expect the float and int branches to be used more often.
Tensor.toString() does not include data, only numel (the data buffer capacity).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26183
Differential Revision: D17374332
Pulled By: IvanKobzarev
fbshipit-source-id: ee93977d9c43c400b6c054b6286080321ccb81bc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26089
### Summary
A couple of changes
1. Replace the source link with the newly nightly build address
2. Remove module support for Swift and Objective-C
3. Expose all static libraries instead of archiving them into one single library. This is because those static libraries might contain object files that have the same name, e.g. `init.c.o` in both `libcupinfo.a` and `libqnnpack.a`. If we archive them into one using this `libtool -static` command, by default, it only picks one object file and discards the others, which could result in undefined symbols when linking the executable. The change here is to expose all the static libraries and let the linker decide which one to use.
### Test Plan
- pod spec lint succeed
- `pod spec lint --verbose --allow-warnings --no-clean --use-libraries --skip-import-validation`
Test Plan: Imported from OSS
Differential Revision: D17363037
Pulled By: xta0
fbshipit-source-id: ba77b0001b58e6e2353d8379d932db598166d37d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26074
### Summary
This PR creates a nightly job for iOS builds. The job will generate a couple of static libraries that contain three architectures (x86, arm64, armv7s) and upload them to AWS S3.
### Note
The test phase in this job is missing right now, meaning that if there is a linking error, we won't be able to know about it. To add the test jobs, we have to put a dummy test app in the repo and manually link the libraries to the app after the build finishes. This will be done in the following PRs.
Test Plan: Imported from OSS
Differential Revision: D17363066
Pulled By: xta0
fbshipit-source-id: 5beeb4263af5722f0a852297023f37aaea9ba4b1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26131
Changes in this PR:
- For each operator with use_c10_dispatcher: True, additionally generate a c10 registration line in TypeDefault.cpp, CPUType.cpp, and other backend files.
- This doesn't change globalATenDispatch yet, the c10 registration is purely additional and the operator calling path doesn't change. A diff further up the stack will change these things.
- Enable the use_c10_dispatcher: True flag for about ~70% of operators
- This also changes the c10->jit operator export because ATen ops are already exported to JIT directly and we don't want to export the registered c10 ops because they would clash
- For this, we need a way to recognize if a certain operator is already moved from ATen to c10, this is done by generating a OpsAlreadyMovedToC10.cpp file with the list. A diff further up in the stack will also need this file to make sure we don't break the backend extension API for these ops.
Reasons for some ops to be excluded (i.e. not have the `use_c10_dispatcher` flag set to true):
- `Tensor?(a!)` (i.e. optional tensor with annotations) not supported in c++ function schema parser yet
- `-> void` in native_functions.yaml vs `-> ()` expected by function schema parser
- out functions have different argument order in C++ as in the jit schema
- `Tensor?` (i.e. optional tensor) doesn't work nicely with undefined tensor sometimes being undefined tensor and sometimes being None.
- fixed-size arrays like `int[3]` not supported in c10 yet
These will be fixed in separate diffs and then the exclusion tag will be removed.
ghstack-source-id: 90060748
Test Plan: a diff stacked on top uses these registrations to call these ops from ATen
Differential Revision: D16603131
fbshipit-source-id: 315eb83d0b567eb0cd49973060b44ee1d6d64bfb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25947
Previously, the c10 dispatcher didn't allow having a catch-all kernel and backend specific kernels at the same time.
This is also the long term goal. But to make the current XLA implementation work, we need to allow them to overwrite these ops with XLA variants.
This diff changes that so that ops can have both, catchall and backend specific kernels, and will call into the catchall kernel if there is no more specific kernel registered.
This is also the current behavior of globalATenDispatch.
ghstack-source-id: 90049398
Test Plan: unit tests
Differential Revision: D17293036
fbshipit-source-id: f2d5928e904c1dc9b6b89e9bb468debe48a4056c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26114
With this diff, the operator schema or name can be specified as part of the options objects:
```
static auto registry = torch::RegisterOperators()
.op(torch::RegisterOperators::options().schema("my_op").kernel(&kernel))
.op(...);
```
This does not break backwards compatibility, all old APIs are kept as shorthands.
This (a) makes the API more consistent, accumulating all options into the options objects and not treating schema special anymore, and (b) this is required for allowing the c10 dispatcher to forward registration calls to ATenDispatch for ops that are still on that dispatcher, see plan in https://github.com/pytorch/pytorch/issues/24132
ghstack-source-id: 90049402
Test Plan: unit tests
Differential Revision: D17350383
fbshipit-source-id: cbb8f33a52dccb2a4522753e7b5ac8ba35b908fd
Summary:
The main part is to switch at::Tensor creation from `torch::empty(torch::IntArrayRef(...))->ShareExternalPointer(...)` to `torch::from_blob(...)`.
Removed the explicit setting of `device CPU`, as `at::TensorOptions` defaults to `device CPU`.
Also renamed local variables, removing the `input` prefix to make them shorter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25973
Differential Revision: D17356837
Pulled By: IvanKobzarev
fbshipit-source-id: 679e099b8aebd787dbf8ed422dae07a81243e18f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25625
We want to fold the quantize op for weights/bias into the module to avoid quantizing weights on the fly.
Test Plan:
python test/test_jit.py
Imported from OSS
Differential Revision: D17208889
fbshipit-source-id: 1854b8953b065855d210bc1166533c08ca264354
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26153
I suspect that our multithreaded test system causes issues with dyndep if two places try to call InitOpsLibrary concurrently. So perhaps we just guard this with a lock. This is just a guess-fix, as it is impossible to repro.
Test Plan: sandcastle
Reviewed By: bddppq
Differential Revision: D17361310
fbshipit-source-id: 596634a2098b18881abbd26a5a727a5ba0d03b6e
Summary:
Because of 'return NotImplemented', __contains__ returns True when the element is not a number,
since bool(NotImplemented) == True.
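A minimal illustration of why the old behavior was wrong (the class below only mimics the problematic pattern, it is not the Tensor code):
```python
class Box:
    def __contains__(self, item):
        return NotImplemented   # mimics the old behavior

print("anything" in Box())      # True, because bool(NotImplemented) is True
```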
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24156
Differential Revision: D16829895
Pulled By: zou3519
fbshipit-source-id: 9d3d58025b2b78b33a26fdfcfa6029d0d049f11f
Summary:
local build is slow... test in CI...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26083
Differential Revision: D17346949
Pulled By: ailzhang
fbshipit-source-id: f552d1a4be55ad4e2bd915af7c5a2c1b6667c446
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25610
They don't do anything anymore, since this isn't the end-user interface.
Test Plan: Imported from OSS
Differential Revision: D17172495
Pulled By: gchanan
fbshipit-source-id: a380d970f0836ed85eb9ac2aa42eb73655d775aa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26159
The snippets for working with Homebrew were duplicated across binary
builds, macOS builds, and iOS builds. In #25336, the CircleCI
configuration version was updated to version 2.1, which supports
parameterized commands. This means we no longer have to use YAML
tricks to duplicate stanzas and instead can natively define a series
of reusable steps.
Motivation for doing this is that the macOS binary builds were still
using the slow `brew update` instead of `git fetch` (see #25988).
[test macos]
[test wheel]
Test Plan: Imported from OSS
Differential Revision: D17366538
Pulled By: pietern
fbshipit-source-id: 194c0f37c1dc999705f3ba97fdabf4ff18728d93
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24352
Enable chainable schedulers as requested in #13022 by implementing the changes mentioned below from [comment](https://github.com/pytorch/pytorch/pull/21800#issuecomment-513370208).
* Changing the behavior of schedulers to the chainable formula when available
* Using the closed form whenever epoch is different from None until the next release with a deprecation warning
* Making `get_computed_values` the supported way of obtaining the last computed learning rate by the scheduler (see [comment](https://github.com/pytorch/pytorch/pull/21800#issuecomment-513940729) for new syntax)
* Returning a deprecation warning when invoking the undocumented get_lr function (see [comment](https://github.com/pytorch/pytorch/pull/21800#discussion_r294305485)) referring to `get_computed_values`, and deprecating it in the next release.
* `CosineAnnealingWarmRestart` still takes an epoch parameter as it is the only one with a mechanic relying on fractional epoch
* `MultiplicativeLR` consumes a function providing the multiplicative factor at each epoch. It mimics `LambdaLR` in its syntax.
# #20527
### Before
The user calls scheduler with a constant epoch either across loops or in the same loop.
```
import torch.optim as optim
from torch import nn
conv = nn.Conv2d(3,3,3)
optimizer = optim.Adam(conv.parameters())
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, 2)
# Scheduler with sometimes-constant epoch number
for epoch in [0, 0, 1, 1, 2, 2, 3, 3]:
    lr_scheduler.step(epoch)
    print(optimizer.param_groups[0]['lr'])
```
### After
If the user wants to step only when the epoch number changes
```
import torch.optim as optim
from torch import nn
conv = nn.Conv2d(3,3,3)
optimizer = optim.Adam(conv.parameters())
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, 2)
last_epoch = -1
for epoch in [0, 0, 1, 1, 2, 2, 3, 3]:
    # Check if epoch number has changed manually
    if epoch - last_epoch > 0:
        lr_scheduler.step()
        last_epoch = epoch
    print(epoch, lr_scheduler.get_computed_values())
```
# #22107
### Before
```
import torch
from torchvision.models import resnet18
net = resnet18()
optimizer = torch.optim.SGD(net.parameters(), 0.1)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[3, 6, 9], gamma=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 3, gamma=0.1)
for i in range(10):
    # Scheduler computes and returns new learning rate, leading to unexpected behavior
    print(i, scheduler.get_lr())
    scheduler.step()
```
### After
```
import torch
from torchvision.models import resnet18
net = resnet18()
optimizer = torch.optim.SGD(net.parameters(), 0.1)
lr_scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[3, 6, 9], gamma=0.1)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 3, gamma=0.1)
for i in range(10):
    # Returns last computed learning rate by scheduler
    print(i, lr_scheduler.get_computed_values())
    lr_scheduler.step()
```
Test Plan: Imported from OSS
Differential Revision: D17349760
Pulled By: vincentqb
fbshipit-source-id: 0a6ac01e2a6b45000bc6f9df732033dd81f0d89f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26106
Previously, in the named tensors build, an operator is marked as
non-traceable if ANY of its overloads are named tensor overloads. This
breaks the tracer for things like torch.full (has a names= overload for
named tensor) and tensor.sum (has a Dimname overload for named tensor).
This PR fixes the problem by putting the "no tracer support" logic into
the location where the tracer attempts to construct a graph by adding a
Dimname/DimnameList argument to a node.
Test Plan:
- new test in test_jit.py to check if torch.full is traceable
- new test in test_namedtensor.py to check what happens when someone
tries to trace a function that uses named tensor APIs.
- [namedtensor ci]
Differential Revision: D17353452
Pulled By: zou3519
fbshipit-source-id: b0b843c8357ffe54baee6e8df86db914f0b1ece4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25843
`tensor.align_to(*names)` permutes the dimensions of `tensor` and adds
additional 1-sized dimensions such that the output tensor has dimensions
in the same order as `names`. All dimensions of `tensor` must be
present in `names`, in addition, this function requires that all dims of
`tensor` be named.
`tensor.align_as(other)` is equivalent to
`tensor.align_to(*other.names)`.
I'm planning on changing `torch.align_tensors(*tensors)` to align closer
to these semantics because there didn't seem to be a clear use case for the old
semantics that preserve unnamed dimensions. That will come in a future
change.
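A minimal sketch of both calls, assuming a named tensor build (all dims of the input are named, as required; shapes and names are illustrative):
```python
import torch

x = torch.randn(2, 3, names=('N', 'C'))
y = x.align_to('N', 'C', 'H', 'W')   # inserts size-1 'H' and 'W' dims
print(y.shape, y.names)              # torch.Size([2, 3, 1, 1]) ('N', 'C', 'H', 'W')

other = torch.randn(2, 3, 4, 5, names=('N', 'C', 'H', 'W'))
z = x.align_as(other)                # same as x.align_to(*other.names)
print(z.shape)                       # torch.Size([2, 3, 1, 1])
```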
Test Plan: - new tests [namedtensor ci]
Differential Revision: D17255549
Pulled By: zou3519
fbshipit-source-id: 1e437ad81e9359b4d5bd0e7e64c3a1be441fc3e3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25842
`tensor.refine_names(*names)` takes `tensor` and attempts to name its
dimensions `names` out-of-place. If a dimension `i` already had a name,
then it cannot be changed (so tensor.names[i] must equal names[i]);
if the original dimension did not have a name, then the new name
(names[i]) can be anything.
`tensor.refine_names(*names)` also accepts a glob '*' that greedily selects
names from `tensor`. Here are some examples:
- `Tensor[None].refine_names('N') -> Tensor[N]`
- `Tensor[N].refine_names('N') -> Tensor[N]`
- `Tensor[N].refine_names('D') -> Error!`
- `Tensor[N].refine_names(None) -> Error!`
- `Tensor[None, None].refine_names('*', 'D') -> Tensor[None, D]`
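A minimal sketch of these rules (assumes a named-tensor build; the glob form from the last example is omitted here):
```
import torch

x = torch.randn(3, 3)           # names are (None, None)
y = x.refine_names('N', 'C')    # OK: unnamed dims may take any new name
z = y.refine_names('N', 'C')    # OK: existing names match
# y.refine_names('N', 'D')      # would raise: dim named 'C' cannot be renamed to 'D'
```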
Test Plan: - new tests [namedtensor ci]
Differential Revision: D17255548
Pulled By: zou3519
fbshipit-source-id: fdbdb3a12f24fbe37ce1e53ed09dc8a42589d928
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25624
First fuse the split op into aten::linear, and then fuse
`dequant - aten::linear - quant` into quantized linear op
Test Plan:
python test/test_jit.py 'TestJit.quant_fusion'
Imported from OSS
Differential Revision: D17208891
fbshipit-source-id: 864b19fabab2e8e6f8f8ad35eb3dbbf2d5fdb8c4
Summary:
To give better signal to the user, we will now always create the TensorBoard test classes and just disable the tests if TensorBoard is not installed.
cc lanpa sanekmelnikov natalialunova pietern
[test macos]
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26005
Reviewed By: sanekmelnikov
Differential Revision: D17352430
Pulled By: orionr
fbshipit-source-id: 87a592064f4768ffded76a3d666a8e508a1ef164
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25623
Port over fuse_linear pass from pytorch/tvm project, we'll need this
in backend specific quantization pass to match aten::linear and swap
it with quantized linear
Test Plan:
python test/test_jit.py 'TestJit.test_fuse_linear'
Imported from OSS
Differential Revision: D17208890
fbshipit-source-id: f4ff3889ae4525797d3b986f46ae37e50ea49116
Summary:
This PR adds `unregister_module` to `nn::Module` and an `erase` function to `OrderedDict`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26088
Differential Revision: D17360058
Pulled By: yf225
fbshipit-source-id: f1f375b4751317da85b8da1458e092fe2405ceec
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25505
Support for quantizing all the methods called by the forward method, including
child module methods and other methods in the current module.
It relies on module-level constant prop; we need to figure out a way to do constant prop
for these methods as well. We can either do constant prop at the module level or in the
quantization function, but this will need some discussion.
Test Plan:
python test/test_jit.py 'TestJit.insert_quant_dequant'
python test/test_quantizer.py
Imported from OSS
Differential Revision: D17208887
fbshipit-source-id: 21749457b21b00a6edada290c26324e2fb210b10
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25504
Skip inserting duplicate observers for values observed
in forward method of a child module or other methods in
the current module.
Test Plan:
python test/test_jit.py -- 'TestJit.insert_observers'
python test/test_jit.py -- 'TestJit.insert_observers_child_qconfig'
python test/test_jit.py -- 'TestJit.insert_observers_skip_values'
Imported from OSS
Differential Revision: D17208888
fbshipit-source-id: e04f1c22ab1c4f410933a17a3ef31acf5f217323
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26113
After https://github.com/pytorch/pytorch/pull/16914, passing in an
argument such as "build_deps" (i.e. python setup.py build_deps develop) is
invalid since it gets picked up as an invalid argument.
ghstack-source-id: 90003508
Test Plan:
Before, this script would execute "python setup.py build_deps
develop", which errored. Now it executes "python setup.py develop" without an
error. Verified by successfully running the script on devgpu. In setup.py,
there is already a `RUN_BUILD_DEPS = True` flag.
Differential Revision: D17350359
fbshipit-source-id: 91278c3e9d9f7c7ed8dea62380f18ba5887ab081
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25984
Link static libtorch libraries into pytorch.so (API library for android)
with "-Wl,--gc-sections" flag to remove unused symbols in libtorch.
Test Plan:
- full gradle CI with stacked PR;
- will check final artifacts.tgz size change;
Differential Revision: D17312859
Pulled By: ljk53
fbshipit-source-id: 99584d15922867a7b3c3d661ba238a6f99f43db5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25897
It doesn't hurt to set all variables unconditionally.
And we can create a link to the lib directory instead of to specific files - this
way it's easier to switch between dynamic/static library names.
Test Plan:
- check android gradle CI;
- use stack diff to check all 4 architectures on PR;
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25897
Differential Revision: D17307240
Pulled By: ljk53
fbshipit-source-id: c975085ddda852ef7da1c29935c2f6a28d797e5a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25607
Since we don't generate these as end-user bindings, and we no longer reorder based on this property, we can just get rid of the property.
Test Plan: Imported from OSS
Differential Revision: D17172500
Pulled By: gchanan
fbshipit-source-id: f84fd8bb2b13598501897f56871b21339585d844
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25606
This just complicates the codegen for no benefit.
Test Plan: Imported from OSS
Differential Revision: D17172498
Pulled By: gchanan
fbshipit-source-id: d2f50e45400ac0336792422518e03dbae3a1bedc
Summary:
This basically works as a simple filter, as you suggested, ZolotukhinM
`export PYTORCH_JIT_LOG_LEVEL=guard_elimination` will print all `GRAPH_DUMP` and `GRAPH_UPDATE` statements.
`export PYTORCH_JIT_LOG_LEVEL=>guard_elimination:>alias_analysis` will print all `GRAPH_DUMP`, `GRAPH_UPDATE` **and** `GRAPH_DEBUG` statements in `guard_elimination.cpp` **and** in `alias_analysis.cpp`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25895
Differential Revision: D17309090
Pulled By: Krovatkin
fbshipit-source-id: 8fa9e67cc9af566b084d66cc15223633fda08444
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26080
Will be used in c2 ctr_mbl_feed model to PyTorch conversion
Test Plan: Unit test
Reviewed By: yinghai
Differential Revision: D17337604
fbshipit-source-id: a90d9f5dc38301608d1562c6f2418e7f4616e753
Summary:
cc: gchanan zou3519
I will look into why this is failing spuriously.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26108
Differential Revision: D17348399
Pulled By: zou3519
fbshipit-source-id: aed4ccfc3f106692d4e32acc029740309570b0c3
Summary:
While this isn't ideal, as it might print out the same source every time a function is run, it's still easier to tweak Python code to reduce loop counts than to insert `std::cout` and recompile C++ code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25868
Differential Revision: D17318386
Pulled By: Krovatkin
fbshipit-source-id: 928ba6543204042924ab41a724635594709630de
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26050
Throws a warning once when someone attempts to attach names to a tensor.
This is guaranteed to happen at the callsite `set_named_tensor_meta`.
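For illustration, a small sketch of a call that attaches names and therefore hits this warning (assumes a named-tensor build):
```
import torch

t = torch.zeros(2, 3, names=('N', 'C'))    # first use warns once about the experimental API
u = torch.zeros(2, 3, names=('N', 'C'))    # no further warning
```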
Test Plan: - run tests [namedtensor ci]
Differential Revision: D17331634
Pulled By: zou3519
fbshipit-source-id: 44f5e5c95acd9c7ba543c1210a3b1314aab348f0
Summary:
Enable one unit test that passes now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25956
Differential Revision: D17298150
Pulled By: bddppq
fbshipit-source-id: 8763e71ad7ef80be915fe93a3471b29f27f3f0a4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25734
[pytorch] Dynamic registration of RPC backends
Allow non-pg rpc backends to be plugged in as a backend.
ghstack-source-id: 89938296
Differential Revision: D17183789
fbshipit-source-id: 885fed12d80b82b60f9a125f78302a161e708089
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25680
Add a runtime flag to choose between FBGEMM and QNNPACK when compiled with both.
The flag can be set by using torch.backends.quantized.engine = torch.fbgemm/torch.qnnpack or ctx::setPreferredQuantizedEngine(at::QEngine)
ghstack-source-id: 89935643
Test Plan: Verified torch.backends.quantized.engine works
Differential Revision: D17198233
fbshipit-source-id: e5449d06f4136385e0e6d18bd4237f8654a61672
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25976
As recommended in https://github.com/pytorch/pytorch/pull/25877/files#r322956051:
> We should move more of these toward using BytesIO. Using files in tests is generally considered bad practice because it introduces syscalls and dependencies on the execution environment, and thus can cause test flakiness/instability.
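For reference, a minimal sketch of the in-memory pattern using the standard library's `io.BytesIO` instead of a temporary file:
```
import io
import torch

buf = io.BytesIO()
torch.save(torch.randn(2, 2), buf)   # serialize into memory, no temp file needed
buf.seek(0)
restored = torch.load(buf)
```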
ghstack-source-id: 89929947
Test Plan: CI
Differential Revision: D17310441
fbshipit-source-id: ba97cce4224225df45ff44062f1bc8ebefb25922
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26075
att, remove the verbose argument to reduce noise in the logs
Test Plan:
ci
Imported from OSS
Differential Revision: D17335935
fbshipit-source-id: 2e4289e838bf4489dcad8d5533353eebcff0d481
Summary:
This change adds a new prepack and run function for FC and Convolution operators in QNNPACK.
The new functions added are `PackBMatrix`, `qnnpackLinear`, `PrePackConvWeights` and `qnnpackConv`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25862
Test Plan:
QNNPACK unit tests
fully-connected-test
convolution-test
Differential Revision: D17299260
Pulled By: supriyar
fbshipit-source-id: fdc4e2d5f1232675acd153f3efb9d17ed8628a54
Summary:
Just a tiny fix to make debugging easier (output errors to stderr and include in the exception message)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25809
Reviewed By: zrphercule
Differential Revision: D17329957
Pulled By: houseroad
fbshipit-source-id: 0d73dd9f62c735fbc5096e6a7c0e5f58e4cd90ae
Summary:
This PR adds Average Pool module to C++ front-end.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25800
Differential Revision: D17318094
Pulled By: yf225
fbshipit-source-id: c914c0e802bbe5f1d1f0a21a669c28bc956899db
Summary:
Now that backward reuses forward streams, calls to backward no longer need to be explicitly synced (in the great majority of cases). This is an opportunity to enable the _do_cuda_non_default_stream flag, which this PR does for test_cuda.py and test_distributions.py, where the flag was previously defined but set to false.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25989
Test Plan: Test changes the entire test suite, so the test suite is the test plan.
Differential Revision: D17329233
Pulled By: mruberry
fbshipit-source-id: 52f65b5ed53de26e35e6d022658d7fac22609f6a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25948
Previously, test/test_namedtensor.py is skipped if pytorch was not
compiled with BUILD_NAMEDTENSOR. Now, we skip test/test_namedtensor.py
if pytorch was not compiled with BUILD_NAMEDTENSOR or if
TEST_NAMEDTENSOR is not set.
This is done in preparation for turning on BUILD_NAMEDTENSOR=1 permanently;
at that point we will use TEST_NAMEDTENSOR to differentiate between the
named tensor ci and the regular ci.
Test Plan:
- [namedtensor ci] (and check that the named tensor tests are actually
running).
Differential Revision: D17300132
Pulled By: zou3519
fbshipit-source-id: 928f71f4d50445680b6ae1aa54b8857bc92e4d08
Summary:
Changelog:
- De-duplicate the code in tests for torch.solve, torch.cholesky_solve, torch.triangular_solve
- Skip tests explicitly if requirements aren't met for e.g., if NumPy / SciPy aren't available in the environment
- Add generic helpers for these tests in test/common_utils.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25733
Test Plan:
- All tests should pass to confirm that the change is not erroneous
Clears one point specified in the discussion in https://github.com/pytorch/pytorch/issues/24333.
Differential Revision: D17315330
Pulled By: zou3519
fbshipit-source-id: c72a793e89af7e2cdb163521816d56747fd70a0e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25938
It doesn't matter whether or not we expose these for namedtensor /
non-namedtensor builds.
Test Plan: - [namedtensor ci]
Differential Revision: D17291249
Pulled By: zou3519
fbshipit-source-id: a5aac77469e28198f63967396e2bdb1ec15bad97
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25993
These imports fail the test suite if they're not installed, even if we
don't end up testing tensorboard.
[test macos]
Test Plan: Imported from OSS
Differential Revision: D17318588
Pulled By: pietern
fbshipit-source-id: febad497ecb3fd292317f68fc2439acd893ccd67
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25988
Running `brew update` used to take over 6 minutes. Now it completes in
about 30 seconds.
Test Plan: Imported from OSS
Differential Revision: D17318585
Pulled By: pietern
fbshipit-source-id: 75956aebc887cb29dbc2bc7efbf823243f18ab01
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25928
Improved error message
ghstack-source-id: 89854172
Test Plan:
If given an input of the wrong dimension, the message was previously:
```
[QConv2D] each dimension of output tensor should be greater than 0
```
The message is now:
```
Given groups=1, weight of size 20, 5, 5, 1, expected input (NHWC) 10, 1, 32, 32 to have 1 channels, but got 32 channels instead
```
Reviewed By: jianyuh
Differential Revision: D17287290
fbshipit-source-id: d91573d6d69f2a5e0e615ffbd47a0bd233636a0b
Summary:
These unit tests pass after landing all the warp size awareness patches.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25963
Differential Revision: D17319124
Pulled By: bddppq
fbshipit-source-id: 22f5d5f1ca9c67e66a7ccf983b2d2f889a74e729
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25977
Call add_subdirectory() explicitly before NNPACK/QNNPACK with
EXCLUDE_FROM_ALL property so that pthreadpool target won't be installed
by default for libtorch mobile build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25977
Test Plan: Imported from OSS
Differential Revision: D17312083
Pulled By: ljk53
fbshipit-source-id: 79851d0aa9402c5b9287ef4bbd8d7fd3a341497d
Summary:
This PR adds the torch.backends.mkldnn.enabled flag discussed in https://github.com/pytorch/pytorch/issues/25186, which can be used to disable MKL-DNN at runtime, analogous to torch.backends.cudnn.enabled.
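A minimal usage sketch, assuming the flag mirrors `torch.backends.cudnn.enabled`:
```
import torch

torch.backends.mkldnn.enabled = False   # fall back to non-MKL-DNN kernels at runtime
```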
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25459
Differential Revision: D17258926
Pulled By: ezyang
fbshipit-source-id: e179ad364cc608fdaa7d0f37e2e762ceb5eda598
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25934
att
Test Plan:
python test/test_quantized_nn_mods.py
Imported from OSS
Differential Revision: D17318270
fbshipit-source-id: afb39f79e01e4d36a55dd17648c25e0743de1d42
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25950
I feel that is a more natural order
Test Plan:
python test/test_quantizer.py
Imported from OSS
Differential Revision: D17294963
fbshipit-source-id: ed8ffdfe788a5e81966bda856e8d046ab68ee229
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25597
We now take advantage of the new bitset representation TensorTypeSet to store "Variable-ness" of a tensor directly in the dispatch key. We introduce a new thread local TensorTypeSet "excluded" and replace the previous thread local boolean with it; we no longer have to query `is_variable()` to do dispatch (I didn't delete `is_variable`, because there are still a lot of uses of it). The key change is in `dispatchTypeId`.
Knock-on effects:
* Because Variable is now a TensorTypeId, I can eliminate the out-of-line registration `registerVariableOp` for variables; instead, make the registrar take a TensorTypeId (instead of a Backend) and you just register under the Variable key.
* Tensors aren't really ever created with Variable information initialized correctly at the start; instead, a tensor "becomes" a Variable because we set its `autograd_meta_`. These setters now correctly setup invariants on the dispatch type set. The new invariant is that if `autograd_meta_ != nullptr`, then `type_set().has(TensorTypeId::VariableTensorId)`.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D17265919
Pulled By: ezyang
fbshipit-source-id: a90a7ed14f5cb1086137483ae3d0646fcd4c42d0
Summary:
yf225 This is the L1Loss module. I don't think that `_Loss` and `_WeightedLoss` as base Python classes do anything. The first one sets the reduction type and also takes in the `reduce` parameter, which is deprecated. The second one only registers the `weight` parameter. I don't think that we should keep this structure. What do you think?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25902
Differential Revision: D17307045
Pulled By: yf225
fbshipit-source-id: ad3eda2ee8dcf4465054b376c1be89b39d11532f
Summary:
Besides common understanding, the only occurrence of calc_digamma is in UnaryOpsKernel.cpp, which clearly sees the float version of calc_digamma as returning float type (and the double version of calc_digamma as returning double type).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25488
Reviewed By: ezyang
Differential Revision: D17172379
Pulled By: VitalyFedyunin
fbshipit-source-id: 56facd45564cff019d572138c0d541a0bdded5c8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25905
Now that we can detect and recover from failures in NCCL we should
allow processes that are started at different times (and perhaps have
had previous NCCL process group instances), to eventually be part of
the same process group. Keeping track of group names in global
variables prevents that, because the processes will be out of sync.
This commit removes the global group name maps and defers
responsibility of isolating access to the same store from multiple
process groups to the store itself. Users can use `c10d::PrefixStore`
to derive new store instances whose keyspace is scoped to some
prefix. Functionally, this is identical to keeping a global map and
using a group name, but also gives more flexibility to the front-end
API to reset state and have processes that have started at different
times to join the same process group.
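A rough sketch of the idea from the Python side (assumes the `TCPStore`/`PrefixStore` bindings; host, port, and prefixes are illustrative):
```
import torch.distributed as dist

# One underlying store shared by several logical groups.
store = dist.TCPStore("127.0.0.1", 29500, 2, True)   # host, port, world_size, is_master

# Scope each process group to its own key space instead of relying on a global group name.
store_a = dist.PrefixStore("group_a", store)
store_b = dist.PrefixStore("group_b", store)
```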
ghstack-source-id: 89804865
Test Plan: Tests pass.
Differential Revision: D17281416
fbshipit-source-id: eab3b48463a9b0ef24aedeca76e2bb970b9f33ef
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25911
The check is practically equivalent to checking for equivalence with
POLLIN (because the constant is a single bit and poll(2) is asked to
check for POLLIN). On macOS, if a client disconnects, POLLHUP will be
set as well, and the check fails. Instead of performing the check and
letting it fail, we can simply run the `query` function and catch
exceptions, in case we see EOF.
Test Plan: Imported from OSS
Differential Revision: D17313301
Pulled By: pietern
fbshipit-source-id: 00c5a69043f70848ef632d53f8e046dc69e15650
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25867
The test can fail if this is run as a stress test. Increase the
threshold to significantly decrease the probability of failure.
ghstack-source-id: 89743661
Test Plan: Tests pass.
Differential Revision: D17266101
fbshipit-source-id: af514eff305783e4a970ac30c3ebdb02fbdcf4c5
Summary:
This PR addresses issue https://github.com/pytorch/pytorch/issues/7601.
Currently models that use streams explicitly in forward have to do a lot of extra work to make backwards respect those streams. This PR extends the (recently added) input tracing (see TypeAndShape) to record the devices and streams of inputs. The autograd engine then uses this metadata to enact the expected stream parallelism without extra work from the user.
For example, a model with forward declared like (original example courtesy of ngimel):
```
def forward(self, x):
    x0 = x.clone()
    torch._C._cuda_setStream(self.stream1._cdata)
    y0 = self.fc1(x0)
    self.event1.record(stream=torch.cuda.current_stream())
    torch._C._cuda_setStream(self.stream2._cdata)
    y1 = self.fc2(x)
    self.event2.record(stream=torch.cuda.current_stream())
    self.stream2.wait_event(self.event1)
    return y0 + y1
```
currently will backward on a single stream. With this change the kernels will go on the streams they are assigned in forward and both forward and backward will (for appropriate sizes) run the fc1 and fc2 kernels simultaneously.
The crux of this change is, as mentioned, an expansion of the TypeAndShape tracing and a relatively simple change to the autograd engine to use cuda events for stream synchronization. To make this efficient I also added a new AutoGPUAndStream class, exposed getting and setting streams on devices, and removed InputBuffer's AutoGPU (it's now redundant). While making these modifications I also fixed AutoGPU to check before setting the GPU when it's destroyed and to use THCudaCheck instead of its custom error handler. These changes mean that an often excessive cudaSetDevice() is not being called when inputs are added to a buffer.
In addition to allowing users to easily set and use streams that are respected in both forward and backward, this change may encourage modules to do the same and the expanded tracing might allow further optimizations in the autograd engine. (apaszke, for example, now after initial enumeration we know the number of devices that will be used by a graph task, which might help provide a sense of the "level of parallelism" we should expect.)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8354
Test Plan: Two tests were added specifically for this behavior.
Differential Revision: D17275980
Pulled By: mruberry
fbshipit-source-id: 92bd50ac782ffa973b159fcbbadb7a083802e45d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25954
Add torch.nn.LSTM to the default dynamic quantization mappings. We will now dynamically quantize LSTM by default when applying the quantize_dynamic API.
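A minimal sketch of the default path (layer sizes are illustrative):
```
import torch
import torch.nn as nn

model = nn.LSTM(input_size=16, hidden_size=32, num_layers=1)
# With LSTM in the default mapping, no explicit qconfig_spec is required.
quantized = torch.quantization.quantize_dynamic(model)
```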
ghstack-source-id: 89839673
Test Plan: CI
Differential Revision: D17294958
fbshipit-source-id: 824aceef821276b3e28c52ce3bebafaf9b0a0833
Summary:
Currently we compute the common dtype for TensorIterator based on all inputs and outputs. This can be a problem when the dtype of the outputs should be different from the dtype of the inputs (for example, torch.eq).
We also have a `dont_compute_common_dtype` method that allows us to avoid computing a common dtype across all inputs and outputs.
This PR adds the ability to compute the common dtype based only on the inputs, via `compute_common_dtype_only_for_inputs`. It also provides a simple method `input_dtype(int arg=0)` that makes it possible to dispatch based on an input's dtype:
```
AT_DISPATCH_ALL_TYPES(iter.input_dtype(), ...
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25593
Differential Revision: D17286352
Pulled By: ifedan
fbshipit-source-id: a94fb608acd2763120992fe85b8dfd02ff21f9ba
Summary:
Add support for nn.ModuleDict in script. This is needed to support torchvision.
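A minimal sketch of a scriptable module using `nn.ModuleDict` (assumes string-literal indexing is the supported access pattern in script):
```
import torch
import torch.nn as nn

class Heads(nn.Module):
    def __init__(self):
        super(Heads, self).__init__()
        self.acts = nn.ModuleDict({'relu': nn.ReLU(), 'tanh': nn.Tanh()})

    def forward(self, x):
        # Submodules are looked up with constant string keys in script.
        return self.acts['relu'](x)

scripted = torch.jit.script(Heads())
print(scripted(torch.randn(2, 2)))
```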
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25715
Differential Revision: D17301826
Pulled By: eellison
fbshipit-source-id: 541b5477e980f519a8c3bbb1be91dac227f6d00f
Summary:
This PR simplifies header inclusion in `test/cpp/api/modules.cpp`, so that when we add a new `torch::nn` module and add the test in `modules.cpp`, we can check that the new module's header is included in `torch/torch.h`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25921
Differential Revision: D17303220
Pulled By: yf225
fbshipit-source-id: 327db0ff2f075d52e7b594b3dffc5a59441e0931
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25958
Should have cleaned up the remaining protobuf dependencies before landing PR #25896.
Test Plan: - CI build;
Reviewed By: dreiss
Differential Revision: D17296949
Pulled By: ljk53
fbshipit-source-id: 20c444e63900c7fa054db3cc757d3f18614af630
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25424
Test Plan
- new tests [namedtensor ci]
Test Plan: Imported from OSS
Differential Revision: D17120399
Pulled By: zou3519
fbshipit-source-id: 93d7944f2ec4c5a7256f505323b879af706131df
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25742
The Python RPC handler is currently a namespace plus global variables; this change makes it a singleton class, which guarantees a deterministic order of variable destruction. With the namespace + global variable approach, we hit a crash at process exit because the global variables have dependencies and are not destructed in the expected order.
ghstack-source-id: 89809889
Test Plan: unit test passed
Differential Revision: D17097999
fbshipit-source-id: 5a5d003925dba1a7ea1caf3b7c28ff9e24c94a21
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25896
Similar change as PR #25822.
Test Plan:
- Updated CI to use the new script.
- Will check pytorch android CI output to make sure it builds libtorch
instead of libcaffe2.
Reviewed By: dreiss
Differential Revision: D17279722
Pulled By: ljk53
fbshipit-source-id: 93abcef0dfb93df197fabff29e53d71db5674255
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25894
NNPack/QNNPack both depend on a third-party library "pthreadpool". There
are two versions of "pthreadpool" implementation, one is the default
implementation under third-party/pthreadpool, the other is caffe2 custom
implementation under caffe2/utils/threadpool. Both implementations share
the same interface (as defined by pthreadpool headers).
Usually only one version of pthreadpool should be linked into libtorch.
If QNNPACK_CUSTOM_THREADPOOL/NNPACK_CUSTOM_THREADPOOL are set to true,
then QNNPack/NNPack will not link third-party/pthreadpool - they will
expect the caller (libtorch) to link correct version of pthreadpool;
otherwise they will bring in the default pthreadpool implementation.
Looks like libtorch cmake already sets both macros to true in
Dependencies.cmake and External/nnpack.cmake. And currently libtorch
mobile build includes the caffe2/utils/threadpool pthreadpool
implementation. So it shouldn't try to explicitly link default
pthreadpool target in aten/CMake in this AT_NNPACK_ENABLED section.
Test Plan:
- Before this diff, libtorch.so links libpthreadpool.a:
```
LINK_LIBRARIES = lib/libc10.so lib/libqnnpack.a lib/libnnpack.a
lib/libcpuinfo.a -llog -ldl -lm lib/libnnpack.a lib/libpthreadpool.a
lib/libcpuinfo.a lib/libclog.a -llog -latomic -lm
```
- After this diff, libtorch.so no longer links libpthreadpool.a:
```
LINK_LIBRARIES = lib/libc10.so lib/libqnnpack.a lib/libnnpack.a
lib/libcpuinfo.a -llog -ldl -lm lib/libnnpack.a lib/libcpuinfo.a
lib/libclog.a -llog -latomic -lm
```
- Tried the following combinations to make sure things work as expected:
* remove caffe2/utils/threadpool, remove libpthreadpool: link error;
* keep caffe2/utils/threadpool, remove libpthreadpool: no link error;
* remove caffe2/utils/threadpool, add back libpthreadpool: no link error;
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25894
Reviewed By: dreiss
Differential Revision: D17279723
Pulled By: ljk53
fbshipit-source-id: ae5aa7ca7283a276ecf1e2140bad0a6af3efdb3a
Summary:
Also change documentation to reflect both the CUDA and ROCm facts.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25937
Differential Revision: D17291453
Pulled By: bddppq
fbshipit-source-id: ee1d7a34f3ad6c05a8f1564d4f9e516e497f2199
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25740
Previously we had `omit_method_bodies`, `omit_attr_values` and
`omit_param_values`. They were called the same in the python bindings
and it was hard to remember their proper spelling. This PR changes them
to `code`, `attrs`, and `params`, which are much easier to remember. It
also flips their meaning - now they enable printing instead of disabling
it. I also changed the default values to 'print all' from 'print
nothing', as that's the most usual way of using it.
Test Plan: Imported from OSS
Differential Revision: D17217517
Pulled By: ZolotukhinM
fbshipit-source-id: fa56e478a732ffd685d885f11c9da0457cd03d16
Summary:
Use the new C10_WARP_SIZE macro to make the sparse coalesce kernel warp size aware.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25918
Differential Revision: D17286442
Pulled By: bddppq
fbshipit-source-id: a079f012c32e5786b49b2a6973019d847ee11897
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25881
Add Dropout to blacklist to avoid the error in eager mode quantization.
ghstack-source-id: 89759536
Test Plan: Test locally in python notebook.
Reviewed By: jianyuh
Differential Revision: D17270826
fbshipit-source-id: bcf43483976740564d7f407838f25c2dbb67b016
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25308
Instead of storing a single TensorTypeId in a Tensor, we store a bitset of tensor type IDs in a Tensor, TensorTypeSet. This class comes with some unit tests. This is in preparation for making Variable a TensorTypeId. In order to help flush out places where this makes a semantic difference, we rename `Tensor::type_id()` to `Tensor::type_set()` and smoke out all of the locations where this was semantically meaningful.
Because the new tensor type set is 64-bits, this increases the size of Tensor by a word.
Listing of semantic changes:
* Many TensorImpl related constructors just propagate TensorTypeId to a parent constructor. These are pretty simple to adjust.
* Backend extensions are now in the business of explicitly constructing a TensorTypeSet and then passing it in. This is probably OK for now but when Variable drops, these dispatch IDs may get immediately overwritten to have Variable set.
* `sparseTensorSetToDeviceType` and similar functions previously did an equality test with TensorTypeId, to determine what an appropriate device type is. This equality is now replaced with a set inclusion test. This is valid, under the assumption that we don't ever have weird sets like "this tensor is simultaneously a sparse CPU tensor and a sparse CUDA tensor", which will be true in the short term plan of adding Variable to the dispatch ID.
* `impl::dispatchTypeId` was generally introduced for cases where we legitimately need to convert from `TensorTypeSet -> TensorTypeId` in a dispatch related manner. At the moment, the implementation is trivial, but they will soon be adjusted to handle TLS. I've tried to make these call sites as forwards compatible as possible:
* `checked_tensor_unwrap` and co now use `dispatchTypeId`. When Variable is added to the type set, these will always be called in a context where the Variable type ID is disabled, so we will get the correct underlying tensor type ID.
* Uses of `Backend` in dispatch are now replaced with `TensorTypeSet`. The general heuristic here for whether or not to accept a `TensorTypeId` or `TensorTypeSet` is that we want to make the generated code as simple as possible. It is easier to retrieve a `TensorTypeSet`, so that's a more appropriate API in these cases.
* In some cases, I could not conveniently switch an implementation to the new semantics, because it was blocked on some other refactor. In this case, I introduced `legacyExtractTypeId`, which gives what would be a BC-compatible `TensorTypeSet` to `TensorTypeId` implementation that will continue to report the same values it would have prior to this change. This is **different** from `dispatchTypeId`, because this function does NOT respect TLS; it always ignores Variable type IDs.
* c10 dispatcher tests, which are oblivious to Variable dispatch, use this BC function (actually, they use `extractTypeId`, an overload for Tensor).
* The implementation of `new_*` methods heavily relies on tensor type ID, I chose not to unwind this. PR to refactor this at https://github.com/pytorch/pytorch/pull/25475
* Slicing also relies on tensor type ID, see `torch/csrc/autograd/python_variable_indexing.cpp` (though in some cases in this file, I was able to replace use of tensor type ID with TensorOptions)
* In some cases, there is an equality test on tensor type ID which would be better done by testing "tensor axes". In those cases, I replaced those equality tests with more equality tests.
* Example: `torch/csrc/nn/type_checks.h`
* There is a total punt in `torch/csrc/tensor/python_tensor.cpp` where "instance of" checking is done via dispatch ids. In general, the Variable-ness of a tensor doesn't participate in instanceof testing. It's not entirely clear what to do here.
* Instead of storing `Backend` in `VariableInfo`, we now just store Layout.
c10 dispatcher test updates were done with:
```
:%s/\([^ ]\+\)\.type_id()/extractTypeId(\1)/g
:%s/\([^( ]\+\)->type_id()/extractTypeId(*\1)/g
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25308
Differential Revision: D17092791
Test Plan: sandcastle and ossci
Reviewed By: bwasti
Pulled By: ezyang
fbshipit-source-id: 22207d14fe62dd31ee19cc5011af22e3d9aabb5b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25816
On Android we will release a small set of native APIs designed for mobile use
cases. All of needed libtorch c++ APIs are called from inside this JNI bridge:
android/pytorch_android/src/main/cpp/pytorch_jni.cpp
With NO_EXPORT set for android static library build, it will hide all
original TORCH, CAFFE2, TH/ATen APIs, which will allow linker to strip
out unused ones from mobile library when producing DSO.
If people choose to directly build libtorch DSO then it will still keep
all c++ APIs as the mobile API layer is not part of libtorch build (yet).
Test Plan:
- build libtorch statically and link into demo app;
- confirm that linker can strip out unused APIs;
Differential Revision: D17247237
Pulled By: ljk53
fbshipit-source-id: de668216b5f2130da0d6988937f98770de571c7a
Summary:
This is the first of a series of changes to reduce build size by cutting
autograd functions from mobile build.
When INTERN_DISABLE_AUTOGRAD is set:
* On CMake side we exclude Functions.h/cpp, VariableType*.h/cpp,
VariableTypeManual.cpp from the build process. Still keep variable_factories.h
as we rely on it to create variables instead of tensors.
* In source code we gate a couple autograd references (in autograd/variable.cpp)
with C10_MOBILE (technically we should use a dedicated c macro but its
maintenance cost is higher than cmake macro as we have several build systems
to change).
* Pass --disable-autograd flag to codegen script, which will stop generating
Functions/VariableType code. And for variable_factories.h it will stop
generating tracing code.
Edit: in this diff we will keep Functions.h/cpp to avoid changing source code.
Why do we need this change if it's already not calling VariableType and autograd
stuff with USE_STATIC_DISPATCH=ON for mobile?
It's trying to reduce static library size for iOS build, for which it's
relatively harder to strip size with linker approach.
Why do we need to make an involved change to the codegen script?
There isn't a global config system in codegen - autograd/env.py provides similar
functionality, but it says not to add anything there.
Test Plan:
- will check CI;
- test mobile build in sample app;
Differential Revision: D17202733
Pulled By: ljk53
fbshipit-source-id: 5701c6639b39ce58aba9bf5489a08d30d1dcd299
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25815
Don't need call these global registerers when USE_STATIC_DISPATCH is
set as they will keep all aten functions at link time.
Should solely rely on jit/generated/register_aten_ops* to keep "interface"
aten functions (which are directly called from JIT), and rely on
STATIC_DISPATCH + linker to keep all other aten functions that are
transitively needed by the "interface" functions.
Test Plan:
- build and run in the demo app;
- with stacked diff to shrink registered "interface" functions, linker
can strip out unused aten implementations;
Differential Revision: D17247236
Pulled By: ljk53
fbshipit-source-id: 1feb5fbb8b9cfa057b9ba8bf3f2967f40980c917
Summary:
These are test failures due to `-Werror` in `test/cpp_extensions/setup.py` that look like:
```
$ python test/run_test.py -i cpp_extensions
Test executor: ['/home/rgommers/anaconda3/envs/pytorch-gcc91/bin/python']
Running test_cpp_extensions ... [2019-08-29 02:19:03.421117]
running install
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.6
creating build/lib.linux-x86_64-3.6/torch_test_cpp_extension
copying torch_test_cpp_extension/__init__.py -> build/lib.linux-x86_64-3.6/torch_test_cpp_extension
running build_ext
building 'torch_test_cpp_extension.cpp' extension
creating build/temp.linux-x86_64-3.6
gcc -pthread -B /home/rgommers/anaconda3/envs/pytorch-gcc91/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/rgommers/code/pytorch/torch/include -I/home/rgommers/code/pytorch/torch/include/torch/csrc/api/include -I/home/rgommers/code/pytorch/torch/include/TH -I/home/rgommers/code/pytorch/torch/include/THC -I/home/rgommers/anaconda3/envs/pytorch-gcc91/include/python3.6m -c extension.cpp -o build/temp.linux-x86_64-3.6/extension.o -g -Werror -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=cpp -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from /home/rgommers/code/pytorch/torch/include/c10/core/MemoryFormat.h:5,
from /home/rgommers/code/pytorch/torch/include/ATen/core/Tensor.h:5,
from /home/rgommers/code/pytorch/torch/include/ATen/Tensor.h:2,
from /home/rgommers/code/pytorch/torch/include/ATen/Context.h:4,
from /home/rgommers/code/pytorch/torch/include/ATen/ATen.h:5,
from /home/rgommers/code/pytorch/torch/include/torch/csrc/api/include/torch/types.h:3,
from /home/rgommers/code/pytorch/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
from /home/rgommers/code/pytorch/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
from /home/rgommers/code/pytorch/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
from /home/rgommers/code/pytorch/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
from /home/rgommers/code/pytorch/torch/include/torch/csrc/api/include/torch/data.h:3,
from /home/rgommers/code/pytorch/torch/include/torch/csrc/api/include/torch/all.h:4,
from /home/rgommers/code/pytorch/torch/include/torch/extension.h:4,
from extension.cpp:1:
/home/rgommers/code/pytorch/torch/include/c10/util/ArrayRef.h: In instantiation of ‘constexpr c10::ArrayRef<T>::ArrayRef(const std::initializer_list<_Tp>&) [with T = long int]’:
/home/rgommers/code/pytorch/torch/include/c10/core/TensorImpl.h:1464:34: required from here
/home/rgommers/code/pytorch/torch/include/c10/util/ArrayRef.h:103:39: error: initializing ‘c10::ArrayRef<long int>::Data’ from ‘std::initializer_list<long int>::begin’ does not extend the lifetime of the underlying array [-Werror=init-list-lifetime]
103 | : Data(Vec.begin() == Vec.end() ? static_cast<T*>(nullptr) : Vec.begin()),
/home/rgommers/code/pytorch/torch/include/c10/util/ArrayRef.h: In instantiation of ‘constexpr c10::ArrayRef<T>::ArrayRef(const std::initializer_list<_Tp>&) [with T = unsigned char]’:
/home/rgommers/code/pytorch/torch/include/ATen/NativeFunctions.h:47:1: required from here
/home/rgommers/code/pytorch/torch/include/c10/util/ArrayRef.h:103:39: error: initializing ‘c10::ArrayRef<unsigned char>::Data’ from ‘std::initializer_list<unsigned char>::begin’ does not extend the lifetime of the underlying array [-Werror=init-list-lifetime]
/home/rgommers/code/pytorch/torch/include/c10/util/ArrayRef.h: In instantiation of ‘constexpr c10::ArrayRef<T>::ArrayRef(const std::initializer_list<_Tp>&) [with T = signed char]’:
/home/rgommers/code/pytorch/torch/include/ATen/NativeFunctions.h:47:1: required from here
/home/rgommers/code/pytorch/torch/include/c10/util/ArrayRef.h:103:39: error: initializing ‘c10::ArrayRef<signed char>::Data’ from ‘std::initializer_list<signed char>::begin’ does not extend the lifetime of the underlying array [-Werror=init-list-lifetime]
/home/rgommers/code/pytorch/torch/include/c10/util/ArrayRef.h: In instantiation of ‘constexpr c10::ArrayRef<T>::ArrayRef(const std::initializer_list<_Tp>&) [with T = short int]’:
/home/rgommers/code/pytorch/torch/include/ATen/NativeFunctions.h:47:1: required from here
/home/rgommers/code/pytorch/torch/include/c10/util/ArrayRef.h:103:39: error: initializing ‘c10::ArrayRef<short int>::Data’ from ‘std::initializer_list<short int>::begin’ does not extend the lifetime of the underlying array [-Werror=init-list-lifetime]
/home/rgommers/code/pytorch/torch/include/c10/util/ArrayRef.h: In instantiation of ‘constexpr c10::ArrayRef<T>::ArrayRef(const std::initializer_list<_Tp>&) [with T = int]’:
/home/rgommers/code/pytorch/torch/include/ATen/NativeFunctions.h:47:1: required from here
/home/rgommers/code/pytorch/torch/include/c10/util/ArrayRef.h:103:39: error: initializing ‘c10::ArrayRef<int>::Data’ from ‘std::initializer_list<int>::begin’ does not extend the lifetime of the underlying array [-Werror=init-list-lifetime]
/home/rgommers/code/pytorch/torch/include/c10/util/ArrayRef.h: In instantiation of ‘constexpr c10::ArrayRef<T>::ArrayRef(const std::initializer_list<_Tp>&) [with T = float]’:
/home/rgommers/code/pytorch/torch/include/ATen/NativeFunctions.h:47:1: required from here
/home/rgommers/code/pytorch/torch/include/c10/util/ArrayRef.h:103:39: error: initializing ‘c10::ArrayRef<float>::Data’ from ‘std::initializer_list<float>::begin’ does not extend the lifetime of the underlying array [-Werror=init-list-lifetime]
/home/rgommers/code/pytorch/torch/include/c10/util/ArrayRef.h: In instantiation of ‘constexpr c10::ArrayRef<T>::ArrayRef(const std::initializer_list<_Tp>&) [with T = double]’:
/home/rgommers/code/pytorch/torch/include/ATen/NativeFunctions.h:47:1: required from here
/home/rgommers/code/pytorch/torch/include/c10/util/ArrayRef.h:103:39: error: initializing ‘c10::ArrayRef<double>::Data’ from ‘std::initializer_list<double>::begin’ does not extend the lifetime of the underlying array [-Werror=init-list-lifetime]
/home/rgommers/code/pytorch/torch/include/c10/util/ArrayRef.h: In instantiation of ‘constexpr c10::ArrayRef<T>::ArrayRef(const std::initializer_list<_Tp>&) [with T = bool]’:
/home/rgommers/code/pytorch/torch/include/ATen/NativeFunctions.h:47:1: required from here
/home/rgommers/code/pytorch/torch/include/c10/util/ArrayRef.h:103:39: error: initializing ‘c10::ArrayRef<bool>::Data’ from ‘std::initializer_list<bool>::begin’ does not extend the lifetime of the underlying array [-Werror=init-list-lifetime]
/home/rgommers/code/pytorch/torch/include/c10/util/ArrayRef.h: In instantiation of ‘constexpr c10::ArrayRef<T>::ArrayRef(const std::initializer_list<_Tp>&) [with T = c10::Half]’:
/home/rgommers/code/pytorch/torch/include/ATen/NativeFunctions.h:47:1: required from here
/home/rgommers/code/pytorch/torch/include/c10/util/ArrayRef.h:103:39: error: initializing ‘c10::ArrayRef<c10::Half>::Data’ from ‘std::initializer_list<c10::Half>::begin’ does not extend the lifetime of the underlying array [-Werror=init-list-lifetime]
/home/rgommers/code/pytorch/torch/include/c10/util/ArrayRef.h: In instantiation of ‘constexpr c10::ArrayRef<T>::ArrayRef(const std::initializer_list<_Tp>&) [with T = c10::BFloat16]’:
/home/rgommers/code/pytorch/torch/include/ATen/NativeFunctions.h:47:1: required from here
/home/rgommers/code/pytorch/torch/include/c10/util/ArrayRef.h:103:39: error: initializing ‘c10::ArrayRef<c10::BFloat16>::Data’ from ‘std::initializer_list<c10::BFloat16>::begin’ does not extend the lifetime of the underlying array [-Werror=init-list-lifetime]
cc1plus: all warnings being treated as errors
error: command 'gcc' failed with exit status 1
Traceback (most recent call last):
File "test/run_test.py", line 438, in <module>
main()
File "test/run_test.py", line 430, in main
raise RuntimeError(message)
RuntimeError: test_cpp_extensions failed!
```
The warnings look valid: the code isn't guaranteed to work (although in practice it does seem to). Using `std::begin` keeps the underlying array for the `initializer_list` from going out of scope.
Note that the same warning is reported in https://github.com/pytorch/vision/issues/1173#issuecomment-517308733 (Cc ShahriarSS)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25384
Differential Revision: D17113146
Pulled By: ezyang
fbshipit-source-id: 477c414481fb3664a8cb92728f4111e6317b309e
Summary:
This best preserves accuracy, while erfinvf() should be used for half and float.
This is also consistent with the implementation before the migration: https://github.com/pytorch/pytorch/issues/24943
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25337
Differential Revision: D17102333
Pulled By: zou3519
fbshipit-source-id: 5178cff534cf5f10d86ab04d4b6c1779ffedf49e
Summary:
Currently we have different checks for multinomial method on CPU and CUDA. This PR will make them consistent.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25595
Differential Revision: D17236163
Pulled By: ifedan
fbshipit-source-id: 7718173bdaf216e8eb636c2a5b9c5939b975325b
Summary:
What dist_check.py does is largely just determining whether we should
set "USE_IBVERBS" to ON or OFF when the user sets "USE_GLOO_IBVERBS"
to ON. But this is unnecessary, because this complicated determination
will always be overridden by gloo:
2101e02cea/cmake/Dependencies.cmake (L19-L28)
Since dist_check.py becomes irrelevant, this commit also simplifies the
setting of `USE_DISTRIBUTED` (by removing its explicit setting in Python scripts), and deprecate `USE_GLOO_IBVERBS` in favor
of `USE_IBVERBS`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25879
Differential Revision: D17282395
Pulled By: pietern
fbshipit-source-id: a10735f50728d89c3d81fd57bcd26764e7f84dd1
Summary:
Introduce a C10_WARP_SIZE define in Macros.h
For kernels that had ifdef-ing of WARP_SIZE for ROCm vs CUDA, use said macro. This is no functional change - we merely refactor to unify on one WARP_SIZE definition.
I hope we can encourage use of this macro over more WARP_SIZE definitions being sprinkled across the code base (or numerically hard-coded).
Some kernels remain that have their own WARP_SIZE definitions but did not satisfy above condition. They will be fixed in follow-up PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25884
Differential Revision: D17276662
Pulled By: bddppq
fbshipit-source-id: cef8e77a74ae2e5de10df816ea80b25cb2bab713
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25782
Enable variable size embedding for the dot processor. We split the embedding matrix into multiple towers based on the embedding size, perform the dot product in a loop over each of the towers, and finally concatenate all the dot product outputs.
Test Plan:
buck test //caffe2/caffe2/fb/dper/layer_models/tests/split_1:
https://our.intern.facebook.com/intern/testinfra/testrun/3659174703037560
Specific unit tests --
buck test //caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test -- test_per_feature_emb_dim
https://our.intern.facebook.com/intern/testinfra/testrun/3377699726358808
Reviewed By: chenshouyuan
Differential Revision: D16690811
fbshipit-source-id: 8f5bce5aa5b272f5f795d4ac32bba814cc55210b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25263
This adds an API that returns True in script and False in eager mode, which together with `ignore` allows guarding of not-yet-supported JIT features. Bikeshedding requested, please.
cc zou3519
```
def foo():
    if not torch.jit.is_scripting():
        return torch.linear(...)
    else:
        return addmm(...)
```
Test Plan: Imported from OSS
Differential Revision: D17272443
Pulled By: eellison
fbshipit-source-id: de0f769c7eaae91de0007b98969183df93a91f42
Summary:
Changelog:
- Simplify generation of singular matrices to just constructing a constant matrix instead of a random singular matrix using random_square_matrix_of_rank, which is susceptible to numerical issues
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25773
Test Plan:
- test_det_logdet_slogdet_batched should pass
Fixes https://github.com/pytorch/pytorch/issues/25172
cc: branfosj hartb
Apologies for the delay.
Differential Revision: D17261059
Pulled By: soumith
fbshipit-source-id: 8f991e2cb8c0e9dccad363d4785075213088e58a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25869
The c10 code for tracing was not disabling tracing when calling the op like it should have. This caused really weird errors where we were recording tensors for ops called within a given c10 op implementation, and making tracing fail
Test Plan: Imported from OSS
Differential Revision: D17275748
Pulled By: jamesr66a
fbshipit-source-id: b4e89ae5a954a1f476c9e5b8bf405bdc621f0323
Summary:
This PR makes the following improvements to C++ API parity test harness:
1. Remove `options_args` since we can get the list of options from the Python module constructor args.
2. Add test for mapping `int` or `tuple` in Python module constructor args to `ExpandingArray` in C++ module options.
3. Use regex to split up e.g. `(1, {2, 3}, 4)` into `['1', '{2, 3}', '4']` for `cpp_default_constructor_args` (see the regex sketch below).
4. Add options arg accessor tests in `_test_torch_nn_module_ctor_args`.
We will be able to merge https://github.com/pytorch/pytorch/pull/24160 and https://github.com/pytorch/pytorch/pull/24860 after these improvements.
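One way such a split could be implemented (a sketch only; the regex actually used in the harness may differ):
```
import re

s = '(1, {2, 3}, 4)'
inner = s.strip('()')
# Keep {...} groups intact while splitting on the remaining commas.
parts = re.findall(r'\{[^}]*\}|[^,\s]+', inner)
print(parts)   # ['1', '{2, 3}', '4']
```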
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25828
Differential Revision: D17266197
Pulled By: yf225
fbshipit-source-id: 96d0d4a2fcc4b47cd1782d4df2c9bac107dec3f9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25847
### Summary
The Podspec file for iOS OSS release. This podspec contains the C++ header files and a static library that supports three architectures.
Please ignore the link for `s.source` for now, as I'm still working on the CI nightly build. This is a temporary link for testing purpose.
### Note
Previously I had a CocoaPods release proposal - https://github.com/pytorch/pytorch/pull/25543 - which contains two podspec files. However, for the time being, we haven't decided whether we want to release the Objective-C API wrapper or not. Please review and refer to this one if you have questions.
Test Plan: Imported from OSS
Differential Revision: D17262459
Pulled By: xta0
fbshipit-source-id: 4cc60787a41beab14cf9b1c0e9ab62b8b14603c5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25822
### Summary
Since protobuf has been removed from mobile, `build_host_protoc.sh` can be removed from `build_ios.sh` as well. However, the old caffe2 mobile build still depends on it; therefore, I introduced the `BUILD_PYTORCH_MOBILE` flag to gate the build.
- iOS device build
```
BUILD_PYTORCH_MOBILE=1 IOS_ARCH=arm64 ./scripts/build_ios.sh
BUILD_PYTORCH_MOBILE=1 IOS_ARCH=armv7s ./scripts/build_ios.sh
```
- iOS simulator build
```
BUILD_PYTORCH_MOBILE=1 IOS_PLATFORM=SIMULATOR ./scripts/build_ios.sh
```
### Test Plan
All device and simulator builds run successfully
Test Plan: Imported from OSS
Differential Revision: D17264469
Pulled By: xta0
fbshipit-source-id: f8994bbefec31b74044eaf01214ae6df797816c3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25840
### Summary:
The CI jobs for iOS builds are missing; this PR creates a workflow which will run two PR jobs:
- pytorch_ios_10_2_1_x86_64_build
- pytorch_ios_10_2_1_arm64_build
### Note:
Those two jobs will not store any artifacts nor upload any binary files, which will be done in the next PR.
Test Plan:
- The jobs can be triggered by any PR.
- The jobs can be run successfully.
Differential Revision: D17255504
Pulled By: xta0
fbshipit-source-id: 5c56e85c7ccf6339a3e0ffd11eedd925f137adc8
Summary:
Enabled torch.nn.functional.log_softmax and torch.nn.CrossEntropyLoss for the bfloat16 data type.
In order to do that, the following dependencies have to be enabled:
- RNE (round to nearest even)
- AccumulateType
- bfloat16 arithmetic operator overload
Also, we implement full std::numeric_limits support for the bfloat16 data type.
Background on the dependencies:
- RNE vs truncate
From a torch.nn.CrossEntropyLoss test with input_size=(128, 1000):
RNE result:
float output: tensor(7.3981, dtype=torch.float32, grad_fn=<NllLossBackward>)
bfloat16 output: tensor(7.3125, dtype=torch.bfloat16, grad_fn=<NllLossBackward>)
truncate result:
float output: tensor(7.3981, dtype=torch.float32, grad_fn=<NllLossBackward>)
bfloat16 output: tensor(5.8750, dtype=torch.bfloat16, grad_fn=<NllLossBackward>)
- scalar_t vs AccumulateType (AccumulateType of bfloat16 is float)
AccumulateType is essential to keep accuracy, especially for reduction-related operations.
We have verified this with both local cases and a real topology. It turns out that a bfloat16 accumulator would cause a huge relative error (even more than 50%) when the number of elements is large.
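A minimal sketch of the newly enabled path (CPU; the shapes match the test above, but this is a sketch rather than the actual test):
```
import torch
import torch.nn.functional as F

logits = torch.randn(128, 1000).to(torch.bfloat16)
target = torch.randint(0, 1000, (128,))
loss = F.cross_entropy(logits, target)   # now supported for bfloat16 inputs on CPU
print(loss)
```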
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24457
Differential Revision: D17113018
Pulled By: ezyang
fbshipit-source-id: 8d61297ca118f9b5c6730a01efcf3a3704d2f206
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25793
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D17259049
Pulled By: ezyang
fbshipit-source-id: 03bf2f28bfd584250ae8feddf4933522ea331b0b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24242
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D17162837
Pulled By: ezyang
fbshipit-source-id: 7bfa92eb151d13fd60cb525475056b363d1254f9
Summary:
__ldg is only available for CC 3.5 and above; add a default implementation for the CC 3.0 platform.
This PR, along with jcooky's PR ecdf4564d4, makes the pytorch master HEAD build and run properly for the CC 3.0 platform (such as the Retina MacBook Pro of late 2013).
I tested the mnist example from pytorch/examples with the wheel built; the test accuracy ends at 99% after 10 epochs on a GT 750M (CC 3.0). The CC 3.0 platform decreases training time to about 1/5 of its CPU counterpart.
```
(pytorch) SamuelFdeMBP:mnist sfeng$ pip list | grep torch
pytorch-sphinx-theme 0.0.24 /Users/sfeng/GH/pytorch_110/docs/src/pytorch-sphinx-theme
torch 1.3.0a0+a332583
torchvision 0.5.0a0+0bd7080
(pytorch) SamuelFdeMBP:mnist sfeng$ date && time python main.py && date
日 9 8 07:17:38 CST 2019
/Users/sfeng/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/cuda/__init__.py:132: UserWarning:
Found GPU0 GeForce GT 750M which is of cuda capability 3.0.
PyTorch no longer supports this GPU because it is too old.
The minimum cuda capability that we support is 3.5.
warnings.warn(old_gpu_warn % (d, name, major, capability[1]))
Train Epoch: 1 [0/60000 (0%)] Loss: 2.300039
......
Train Epoch: 10 [59520/60000 (99%)] Loss: 0.007440
Test set: Average loss: 0.0322, Accuracy: 9895/10000 (99%)
real 2m39.962s
user 4m13.625s
sys 0m9.672s
日 9 8 07:20:18 CST 2019
(pytorch) SamuelFdeMBP:mnist sfeng$ date && time python main.py --no-cuda && date
日 9 8 07:20:40 CST 2019
Train Epoch: 1 [0/60000 (0%)] Loss: 2.300039
Train Epoch: 1 [640/60000 (1%)] Loss: 2.213470
Train Epoch: 1 [1280/60000 (2%)] Loss: 2.170460
......
Train Epoch: 10 [58880/60000 (98%)] Loss: 0.005681
Train Epoch: 10 [59520/60000 (99%)] Loss: 0.007686
Test set: Average loss: 0.0319, Accuracy: 9894/10000 (99%)
real 12m6.604s
user 75m53.129s
sys 3m41.744s
日 9 8 07:32:47 CST 2019
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25825
Differential Revision: D17252176
Pulled By: soumith
fbshipit-source-id: 70bf84ae6380be86b56344b161a52fb06a53a1b2
Summary:
Fixes ~5000 lines of warnings like:
```
In file included from ../aten/src/TH/TH.h:4,
from ../torch/csrc/Storage.cpp:11:
../torch/csrc/Storage.h:6:39: warning: cast between incompatible function types from ‘PyObject* (*)(THPStorage*)’ {aka ‘_object* (*)(THPStorage*)’} to ‘getter’ {aka ‘_object* (*)(_object*, void*)’} [-Wcast-function-type]
6 | #define THPStorage_(NAME) TH_CONCAT_4(THP,Real,Storage_,NAME)
| ^~~
caffe2/aten/src/TH/THGeneral.h:154:37: note: in definition of macro ‘TH_CONCAT_4_EXPAND’
154 | #define TH_CONCAT_4_EXPAND(x,y,z,w) x ## y ## z ## w
| ^
../torch/csrc/Storage.h:6:27: note: in expansion of macro ‘TH_CONCAT_4’
6 | #define THPStorage_(NAME) TH_CONCAT_4(THP,Real,Storage_,NAME)
| ^~~~~~~~~~~
../torch/csrc/generic/Storage.cpp:299:22: note: in expansion of macro ‘THPStorage_’
299 | {"device", (getter)THPStorage_(device), nullptr, nullptr, nullptr},
| ^~~~~~~~~~~
../torch/csrc/Storage.h:6:39: warning: cast between incompatible function types from ‘PyObject* (*)(THPStorage*)’ {aka ‘_object* (*)(THPStorage*)’} to ‘getter’ {aka ‘_object* (*)(_object*, void*)’} [-Wcast-function-type]
6 | #define THPStorage_(NAME) TH_CONCAT_4(THP,Real,Storage_,NAME)
| ^~~
caffe2/aten/src/TH/THGeneral.h:154:37: note: in definition of macro ‘TH_CONCAT_4_EXPAND’
154 | #define TH_CONCAT_4_EXPAND(x,y,z,w) x ## y ## z ## w
| ^
../torch/csrc/Storage.h:6:27: note: in expansion of macro ‘TH_CONCAT_4’
6 | #define THPStorage_(NAME) TH_CONCAT_4(THP,Real,Storage_,NAME)
| ^~~~~~~~~~~
```
This issue and the fix is very similar to how CPython fixed it, see https://bugs.python.org/issue33012.
There's still more of these warnings left, but this fixes the majority of them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25483
Differential Revision: D17149824
Pulled By: ezyang
fbshipit-source-id: 353560a4f76070fa7482608e9532b60205d16798
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25707
The retry logic handled ECONNREFUSED to deal with the client being
started before the server. It didn't yet deal with the server being
started but having its listen backlog exhausted. This may happen when
starting many processes that all try to connect at the same time.
The server implementation uses blocking I/O to read and write entire
messages, so it may take a bit longer to call `accept(2)` on new
connections compared to a fully event driven approach.
This commit both increases the default listen backlog on the server
side and implements retries on ECONNRESET after `connect(2)`.
Test Plan: Imported from OSS
Differential Revision: D17226958
Pulled By: pietern
fbshipit-source-id: 877a7758b29286e06039f31b5c900de094aa3100
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25778
I don't know how this ever compiled; it was caught by an internal test.
Do we not set DEBUG when compiling in debug mode in OSS?
Test Plan
- [namedtensor ci]
Test Plan: Imported from OSS
Differential Revision: D17228393
Pulled By: zou3519
fbshipit-source-id: 441ad716a369ee99be4723318cf78e394f98becf
Summary:
When the given input size is larger than expected, `weight_sizes` has length `k` but only its first `weight_dim` entries are valid, which produces a confusing error message:
```
RuntimeError: Expected 4-dimensional input for 4-dimensional
weight 256 5 3 3 3987964488216321853 94670871813000,
but got 6-dimensional input of size [1, 61, 1, 5, 64, 64] instead
```
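For illustration, a minimal sketch of the kind of call that hits this path (assumed shapes, not the original reporter's model):
```
import torch

# A Conv2d weight is 4-dimensional (256, 5, 3, 3); feeding a 6-d input takes
# the error path that previously printed uninitialized numbers after the shape.
conv = torch.nn.Conv2d(in_channels=5, out_channels=256, kernel_size=3)
x = torch.randn(1, 61, 1, 5, 64, 64)  # 6-dimensional input
conv(x)  # raises: Expected 4-dimensional input for 4-dimensional weight ...
```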
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25146
Differential Revision: D17233651
Pulled By: soumith
fbshipit-source-id: c6ddfa45e854f9b95ca253052f8bc358e35fd9d4
Summary:
The motivation for this move, and our long-term commitment to maintaining and integrating this code into ATen is described in the issue below:
https://github.com/pytorch/pytorch/issues/25621
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25500
Test Plan:
QNNPack unit tests, as follows:
OSS:
x86:
mkdir build; cd build; cmake ..; make all -j16 && make test
All 26 unit tests pass, both when built with ADD_DEFINITIONS(-DPYTORCH_QNNPACK_RUNTIME_QUANTIZATION=0) and ADD_DEFINITIONS(-DPYTORCH_QNNPACK_RUNTIME_QUANTIZATION=1)
ARM:
Make sure you have an android device available to adb either through one world or directly connected.
To compile and push do
$> adb shell mkdir /data/qnnpack && ./scripts/build-android-arm64.sh && adb push ./build/android/arm64-v8a/*-test /data/qnnpack
To execute tests, first $> adb shell to login into the device, then run all the tests by
$> for t in $(ls /data/qnnpack); do /data/qnnpack/$t; done
Repeat the exact same process with ADD_DEFINITIONS(-DPYTORCH_QNNPACK_RUNTIME_QUANTIZATION=0), and ADD_DEFINITIONS(-DPYTORCH_QNNPACK_RUNTIME_QUANTIZATION=1)
Repeat the exact same process with ./scripts/build-android-armv7.sh for AARCH32.
Reviewed By: ljk53
Differential Revision: D17194732
Pulled By: AshkanAliabadi
fbshipit-source-id: 9e627338ebd63aa917a36b717618c0643ccf40c8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25802
Test script
```
import torch
def foo(x, y):
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    return x
scripted = torch.jit.script(foo)
scripted.save('foo.zip')
loaded = torch.jit.load('foo.zip')
loaded(torch.rand(3, 4), torch.rand(4, 5))
```
Before this change
```
RuntimeError: The size of tensor a (4) must match the size of tensor b (5) at non-singleton dimension 1
The above operation failed in interpreter, with the following stack trace:
at code/__torch__.py:7:9
op_version_set = 1
class PlaceholderModule(Module):
__parameters__ = []
def forward(self: __torch__.PlaceholderModule,
x: Tensor,
y: Tensor) -> Tensor:
x0 = torch.add(x, y, alpha=1)
~~~~~~~~~ <--- HERE
x1 = torch.add(x0, y, alpha=1)
x2 = torch.add(x1, y, alpha=1)
x3 = torch.add(x2, y, alpha=1)
x4 = torch.add(x3, y, alpha=1)
x5 = torch.add(x4, y, alpha=1)
x6 = torch.add(x5, y, alpha=1)
x7 = torch.add(x6, y, alpha=1)
x8 = torch.add(x7, y, alpha=1)
x9 = torch.add(x8, y, alpha=1)Compiled from code at /home/jamesreed/print_test.py:5:8
def foo(x, y):
x = x + y
~~~~~ <--- HERE
x = x + y
x = x + y
x = x + y
x = x + y
x = x + y
x = x + y
x = x + y
x = x + y
x = x + y
```
After this change
```
RuntimeError: The size of tensor a (4) must match the size of tensor b (5) at non-singleton dimension 1
The above operation failed in interpreter, with the following stack trace:
at code/__torch__.py:7:9
op_version_set = 1
class PlaceholderModule(Module):
__parameters__ = []
def forward(self: __torch__.PlaceholderModule,
x: Tensor,
y: Tensor) -> Tensor:
x0 = torch.add(x, y, alpha=1)
~~~~~~~~~ <--- HERE
x1 = torch.add(x0, y, alpha=1)
x2 = torch.add(x1, y, alpha=1)
x3 = torch.add(x2, y, alpha=1)
x4 = torch.add(x3, y, alpha=1)
x5 = torch.add(x4, y, alpha=1)
x6 = torch.add(x5, y, alpha=1)
x7 = torch.add(x6, y, alpha=1)
x8 = torch.add(x7, y, alpha=1)
x9 = torch.add(x8, y, alpha=1)
Compiled from code at /home/jamesreed/print_test.py:5:8
def foo(x, y):
x = x + y
~~~~~ <--- HERE
x = x + y
x = x + y
x = x + y
x = x + y
x = x + y
x = x + y
x = x + y
x = x + y
x = x + y
```
Test Plan: Imported from OSS
Differential Revision: D17250599
Pulled By: jamesr66a
fbshipit-source-id: 56266dcbf2c2287dc8ced7b9463ed42ef5f1167c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25812
as title
Test Plan:
```
[huaminli@devvm2388.ftw3 ~/fbsource/fbcode] buck run mode/dev-nosan caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --operators None --iterations 3
```
last few lines of the output P109238440
Reviewed By: mingzhe09088
Differential Revision: D17246792
fbshipit-source-id: d93ee5f404164d32210968997c6ea63b82058d2a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25728
Two quick fixes:
1) windows doesn't seem to like std::locale, so that got removed.
2) at::empty should call the non-named-tensor overload if the tensor
doesn't have names to avoid re-dispatching. In the long term we'll merge
the at::empty names and no-names overloads.
Test Plan
- [namedtensor ci], but the windows thing isn't easy to test without
running BUILD_NAMEDTENSOR=1 on windows.
Test Plan: Imported from OSS
Differential Revision: D17212059
Pulled By: zou3519
fbshipit-source-id: 58da5ab96d53c4844237ca10fa1b2de4b1052a0c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25672
There are three overloads:
1) flatten(tensor, int start_dim, int end_dim, Dimname out_dim)
2) flatten(tensor, Dimname start_dim, Dimname end_dim, Dimname out_dim)
3) flatten(tensor, DimnameList dims, Dimname out_dim)
`flatten` joins all the dimensions between start_dim and end_dim into
one dimension. The name of the output dimension is specified by
`out_dim`.
In the case where flatten takes a list of `dims` to flatten, all the
dimensions in `dims` must be in consecutive order.
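A short sketch of how these overloads are meant to be used (the named tensor API is experimental, so treat the exact calls as illustrative):
```
import torch

x = torch.randn(2, 3, 4, names=('N', 'C', 'H'))
y = x.flatten(['C', 'H'], 'features')  # overload 3: flatten a list of named dims
z = x.flatten('C', 'H', 'features')    # overload 2: flatten a range of named dims
print(y.names)  # ('N', 'features')
```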
Test Plan: - new tests [namedtensor ci]
Differential Revision: D17192656
Pulled By: zou3519
fbshipit-source-id: 55d2b23358bd77cbef299f66701a8da8cd194f4f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25731
I didn't notice this before, but the QuantizeAvx2 routine was requantizing only a single vector of 8 floats into 1/4 of a 256-bit int8 register. This switches it to use a specialization that goes from 4 float vectors into a whole int8 vector, borrowed from C2
Test Plan: Imported from OSS
Differential Revision: D17214413
Pulled By: jamesr66a
fbshipit-source-id: 1d6fc556e43739e9a4b0dba5df2332beb1b3795b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25203
device_option propagation is completely broken in Caffe2 when pass-through
operators are used. As an example, the Gather operator doesn't have a gradient
and passes through its inputs, which results in incorrect detection of the
components for sparse parameter aggregation (the component will be empty instead of
the real device).
This diff is trying to fix this issue.
Test Plan:
net_transform is finally working with Gather + FloatToHalf transformed model
instead of failing because of incorrect number of components.
Reviewed By: dzhulgakov
Differential Revision: D16936041
fbshipit-source-id: 916551b933469f04e32ddf86ec4b2c07f76c9176
Summary:
# Problem
ProcessGroupAgent used in test_rpc has SIGSEGV on exiting.
# Solution
It was because the Python module was unexpectedly destructed twice.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25458
Test Plan: Run prototype tests on top of this diff.
Differential Revision: D17127093
Pulled By: xush6528
fbshipit-source-id: 4b86cd8465e8cca6fce1c163e78160a2386fa9c3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25667
Relax scale and zero-point for activations to ensure that fbgemm implementations of conv and linear do not saturate due to 16 bit intermediate accumulation.
Add test to verify precision of numerics of quantized model with updated observer. This test catches errors in
handling layouts for quantized ops in addition to saturation/quantization errors.
ghstack-source-id: 89587942
Test Plan:
buck test caffe2/test:quantized -- 'test_float_quant_compare \(test_quantized_models\.ModelNumerics\)' --print-passing-details
Passes when SQNR > 35 dB
buck test caffe2/test:quantization -- 'test_minmax_observer \(test_quantization\.ObserverTest\)' --print-passing-details
Passes with additional coverage for observer changes
Differential Revision: D17140498
fbshipit-source-id: 42c58e726bb0b0f51890590ee2525428f9a8d24e
Summary:
Expose the necessary functions to Python, and add round-trip tests for the
function schema str() and parsing functions.
We iterate over all the registered function schemas and get the string,
then parse the string. We compare the schema generated from parsing with
the original one, and make sure they are equal.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23208
ghstack-source-id: 89638026
Test Plan: buck test //caffe2/test:function_schema
Reviewed By: zrphercule
Differential Revision: D16435471
fbshipit-source-id: 6961ab096335eb88a96b132575996c24090fd4c0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25781
To prepare for removing the BUILD_NAMEDTENSOR flag, I am attempting to
remove BUILD_NAMEDTENSOR out of header areas.
Test Plan:
- [namedtensor ci]
- Tested building locally with USE_STATIC_DISPATCH=1. Previously, in
https://github.com/pytorch/pytorch/pull/25721, this change had caused a
dependency cycle while building with that on.
Differential Revision: D17229490
Pulled By: zou3519
fbshipit-source-id: 22fbd5e2770374ab321c13542fa321a2bf7d3101
Summary:
`torch.nn` modules in Python save their kwarg options directly as module object attributes, while `torch::nn` modules in C++ save their options inside the `options` field of the module object. This PR tries to map between these two (by using the newly added `options_args` list to discover options arguments in the Python module), to make sure options equivalence is properly checked.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25784
Differential Revision: D17238609
Pulled By: yf225
fbshipit-source-id: 2febd277ddcbe3ab458ac3feaaf93e4c94bb5b98
Summary:
This fixes the empty graph problem present since PyTorch 1.2.
To prevent such regressions, we have to make the test stricter.
There are 3 levels of verification:
lv 1. make sure that the graph is saved to some event file. <-- currently here
lv 2. make sure the file can be read by tensorboard.
lv 3. make sure the graph in tensorboard is human-friendly.
I think (3) must involve a human.
(2) is possible, but it will be useless if we want to go for lv 3 directly.
cc orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25599
Reviewed By: sanekmelnikov
Differential Revision: D17229276
Pulled By: orionr
fbshipit-source-id: b39f2f1805ee0b3a456b2c69d97e6e3622f5220e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25780
support "trainer:0", "server:1" format
Test Plan:
# Unit tests
```
buck test mode/dev-nosan caffe2/test:rpc
```
Differential Revision: D17228907
fbshipit-source-id: a6e759f4364548454ab0f2907707e738997bbf38
Summary:
Change the doc of torch.where: the parameters are x and y instead of input and other.
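For reference, a minimal use of the documented signature (condition followed by the two value tensors):
```
import torch

x = torch.randn(4)
y = torch.zeros(4)
# Elementwise select: take from x where the condition holds, otherwise from y.
out = torch.where(x > 0, x, y)
```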
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25554
Differential Revision: D17227193
Pulled By: soumith
fbshipit-source-id: 96d8a6f60ae8e788648247320ae715d0058de2b4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25711
This function renames the dimensions of a tensor out-of-place. Because
of that, I think `tensor.renamed(...)` is a clearer name: `view_names`
has the connotation that we can use names to `view` our tensors with a
"different shape", but what this function really does is let us rename a
tensor no matter the previous names.
`tensor.names_`, the in-place version of this, is unchanged for now.
However, we might delete this or not advertise it if it has no use case
and also because its naming is a little inconsistent with `tensor.renamed`.
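A rough sketch of the intended semantics; the method name and exact call shown here are assumptions based on this description, since the named tensor API was still in flux:
```
import torch

# Assumed usage of the out-of-place rename described above (experimental API;
# the method name later changed before release).
t = torch.zeros(2, 3, names=('N', 'C'))
t2 = t.renamed('batch', 'channels')  # new names, regardless of the old ones
# t.names is unchanged; t2.names == ('batch', 'channels')
```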
Test Plan: - [namedtensor ci]
Differential Revision: D17206515
Pulled By: zou3519
fbshipit-source-id: 67053951fcc8130c84566b5ebbdce35ef619c90d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25513
Randomized tests are flaky; this PR derandomizes some of them.
Test Plan:
python test/test_fake_quant.py
python test/test_quantized_nn_mods.py
Imported from OSS
Differential Revision: D17221273
fbshipit-source-id: f6978704ba0139071c26f443e923955a2f849832
Summary:
Undefined preprocessor macros being evaluated cause
errors on some compilers/configs. There is an ungated define in caffe2
which is inconsistent with the rest of the file and should be
fixed anyway because it's causing issues in ovrsource.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25729
Test Plan: contbuilds
Differential Revision: D17211552
Pulled By: akrieger
fbshipit-source-id: 499b123894b255f37ff68079c4ba3650b1599a5c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25650
This PR removes protobuf dependencies from mobile build altogether:
- caffe2/proto: protobuf files, including caffe2.proto and torch.proto;
- caffe2 components that depend on caffe2.proto, including most part of
caffe2/core, caffe2/utils;
- libprotobuf / libprotobuf-lite dependencies;
- protobuf compiler;
- some utils class, e.g.: netdef_converter.cpp;
- introduce a macro to disable third_party/onnx which depends on protobuf;
Test Plan:
- builds;
- link with demo app to make sure it can load and run a model in pickle format;
Differential Revision: D17183548
Pulled By: ljk53
fbshipit-source-id: fe60b48674f29c4a9b58fd1cf8ece44191491531
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25428
Added bias as an optional param to the quantized_linear_prepack function.
Bias is quantized during runtime using input scale and weight scale.
ghstack-source-id: 89601399
Test Plan: python test/run_test.py --exclude nn --verbose --bring-to-front quantization quantized quantized_tensor quantized_nn_mods quantizer
Differential Revision: D17121304
fbshipit-source-id: 8adb0e55e4aed0a5430aaa2c8639c8ad1639c85a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25755
PR #25721 breaks mobile CI (with USE_STATIC_DISPATCH=1) due to circular
header dependency.
Move 'ATen/core/Tensor.h' back into '#ifdef BUILD_NAMEDTENSOR' to work
around the CI issue.
Test Plan: - build android library locally
Differential Revision: D17223997
Pulled By: ljk53
fbshipit-source-id: d8b5fd26e332953f1b592758fc76947ea2af94dc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25759
In #25260, USE_DISTRIBUTED was defaulted to OFF for Windows and macOS
only. The Android builds didn't run for the PR and started to fail
when it was merged to master. It turns out the mobile builds
explicitly disable USE_DISTRIBUTED but only after the USE_DISTRIBUTED
option, and derivative dependent options were defined. The result
being that USE_GLOO was enabled while USE_DISTRIBUTED was disabled.
This commit ensures that USE_DISTRIBUTED defaults to OFF unless the
build is for a supported platform.
ghstack-source-id: 89613698
Test Plan: N/A
Differential Revision: D17224842
fbshipit-source-id: 459039b79ad5240e81dfa3caf486858d6e77ba4b
Summary:
FindCUDNN.cmake and cuda.cmake already handle the detection. This commit deletes `tools/setup_helpers/cudnn.py` as it is no longer needed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25482
Differential Revision: D17226408
Pulled By: ezyang
fbshipit-source-id: abd9cd0244cabea1f5d9f93f828d632d77c8dd5e
Summary: To test the int8 ads models on CPU and accelerators with the ads replayer, we need to load the PREPACKING_INIT_NET_TYPE in the int8 model to initialize the int8 w_packed blobs.
Test Plan:
Ads replayer test.
P74811059
Reviewed By: zrphercule
Differential Revision: D16518888
fbshipit-source-id: cee212710ad37d9e491c970b25b2fe484373e5e4
Summary:
It doesn't seem to be used anywhere once we get down to CMake, either in this repo or in any submodules.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25720
Differential Revision: D17225088
Pulled By: pietern
fbshipit-source-id: a24b080e6346a203b345e2b834fe095e3b9aece0
Summary:
Adds a '-m' flag to torch.distributed.launch that allows users to launch python modules using launch instead of specifying the full file path.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24910
Differential Revision: D17221653
Pulled By: pietern
fbshipit-source-id: 5c6453ed266fd121103b11caab303e3f9404227d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25670
This is part of the effort to get rid of protobuf dependency for
libtorch mobile build.
embedding_lookup_idx.cc is used by ATen/EmbeddingBag.cpp. It indirectly
includes caffe2.pb.h but doesn't really need it. Clean up the headers to
unblock no-protobuf mobile build.
The broader problem is that many common headers in pytorch/caffe2 directly
or indirectly include caffe2.pb.h. After landing the stack of changes to
remove protobuf from the OSS libtorch mobile build, it's going to constrain
how ATen and other parts of pytorch use caffe2 components: it will break
OSS mobile CI if a PR introduces a dependency on a caffe2 file that
indirectly includes caffe2.pb.h. We will need to tease out caffe2.pb.h
dependencies like in this diff, or do a refactor to replace protobuf
generated types.
Chatted with gchanan and ezyang to confirm that there is no plan to
add more dependencies to caffe2 components from ATen in near future,
so this should be fine.
Test Plan: - build locally with stacked diffs
Differential Revision: D17191913
Pulled By: ljk53
fbshipit-source-id: 1248fe6424060a8bedcf20e73942b7500ae5e815
Summary:
So we can iterate over the operator registry and check backward compatibility.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23207
ghstack-source-id: 89570438
Test Plan: ci and the round trip tests added in the last diff
Reviewed By: zrphercule
Differential Revision: D16434335
fbshipit-source-id: 86a66d746a1f122a8aafe39e936606d6ba7ef362
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25442
This makes tensor keys in dicts work in serialization by comparing the
tensor keys' TensorImpl addresses directly. Given that we just want to
ensure the ordering is stable when iterating, it should be good enough;
we will need careful consideration if we want to stick with Python 3.7
insertion order.
Test Plan: Imported from OSS
Differential Revision: D17216377
fbshipit-source-id: 80df17dc2fa9eddd73a66e3979d7f8d7934660c0
Summary:
Improve handling of mixed-type tensor operations.
This PR affects the arithmetic (add, sub, mul, and div) operators implemented via TensorIterator (so dense but not sparse tensor ops).
For these operators, we will now promote to reasonable types where possible, following the rules defined in https://github.com/pytorch/pytorch/issues/9515, and error in cases where the cast would require floating point -> integral or non-boolean to boolean downcasts.
The details of the promotion rules are described here:
https://github.com/nairbv/pytorch/blob/promote_types_strict/docs/source/tensor_attributes.rst
Some specific backwards incompatible examples:
* now `int_tensor * float` will result in a float tensor, whereas previously the floating point operand was first cast to an int. Previously `torch.tensor(10) * 1.9` => `tensor(10)` because the 1.9 was downcast to `1`. Now the result will be the more intuitive `tensor(19)`
* Now `int_tensor *= float` will error, since the floating point result of this operation can't be cast into the in-place integral type result.
See more examples/detail in the original issue (https://github.com/pytorch/pytorch/issues/9515), in the above linked tensor_attributes.rst doc, or in the test_type_promotion.py tests added in this PR:
https://github.com/nairbv/pytorch/blob/promote_types_strict/test/test_type_promotion.py
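A couple of minimal illustrations of the new rules, assuming default dtypes (int64 tensor, Python float scalar):
```
import torch

a = torch.tensor(10)   # int64 tensor
print(a * 1.9)         # promoted to a floating point result under the new rules

b = torch.ones(3, dtype=torch.int32)
# b *= 1.5             # now raises: the float result cannot be cast back into
                       # the in-place int32 output

# Mixed dense tensor dtypes promote to the wider type:
print(torch.ones(3, dtype=torch.int32) + torch.ones(3, dtype=torch.float64))  # float64
```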
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22273
Reviewed By: gchanan
Differential Revision: D16582230
Pulled By: nairbv
fbshipit-source-id: 4029cca891908cdbf4253e4513c617bba7306cb3
Summary:
As of ROCm 2.6, we support hiprtc - the HIP runtime compilation API. Enable the jit fusion feature depending on the existence of such an API. This entails
* new hipification rules for API_RTC
* add hiprtc APIs to the shim loader
* update cmake infrastructure to find the hiprtc library (it is part of the HIP package)
* enabling of unit tests in the jit_fuser test set
* special casing in resource strings for HIP - the typedefs CUDA requires would be redundant
* for now disable the occupancy calculation we do not support yet and hard-code
Thanks to t-vi for working with me on getting this integration done!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22872
Differential Revision: D17207425
Pulled By: bddppq
fbshipit-source-id: 93409f3051ad0ea06afacc2239fd6c402152debe
Summary:
Add magic method for `class_type[index]`. Since the compiler has custom logic for indexing this was not included with the other magic methods.
Fix for https://github.com/pytorch/pytorch/issues/25637
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25664
Differential Revision: D17214996
Pulled By: eellison
fbshipit-source-id: bf77f70851f6c3487147da710cc996624492a0c8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25721
Context: I am starting to work on removing the BUILD_NAMEDTENSOR flag.
Here is the approach:
- Move the macro out of header areas
- Include a new `enable_namedtensor.h` header that does a `#ifndef
BUILD_NAMEDTENSOR #define BUILD_NAMEDTENSOR`.
- Include `enable_namedtensor.h` where necessary. This only really needs
to happen in two files (c10/TensorImpl.h, ATen/Dimname.h).
- Incrementally delete usages of the BUILD_NAMEDTENSOR macro later.
The alternative is to straight up delete all instances of
BUILD_NAMEDTENSOR. This alternative could be disruptive, lead to merge
conflicts, and isn't incremental.
Along with the above, some work needs to be done on feature flagging
named tensors, and merging the namedtensor CI with the regular CI, and
communicating with devs. This work will too be done incrementally.
Test Plan
- [namedtensor ci]
Test Plan: Imported from OSS
Differential Revision: D17210913
Pulled By: zou3519
fbshipit-source-id: c73f128b976bb90212639e8f2a3ad2a6a52b8e0c
Summary:
All of the code examples should now run as unit tests, save for those
that require interaction (i.e. show `pdb` usage) and those that use
CUDA.
`save` had to be moved before `load` in `jit/__init__.py` so `load`
could use the file generated by `save`
](https://our.intern.facebook.com/intern/diff/17192417/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25668
Pulled By: driazati
Differential Revision: D17192417
fbshipit-source-id: 931b310ae0c3d2cc6affeabccae5296f53fe42bc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25628
We found that base_lr is negative in learning_rate_functors.h, so we use fabs(base_lr) for the cyclical learning rate multiplier
computation.
Test Plan: Canary: f135306794
Reviewed By: chenshouyuan
Differential Revision: D17167635
fbshipit-source-id: e7fb55835f9fc07712edd63e81f1cf355e05b9f4
Summary:
old (a)
new (a! -> b)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23206
ghstack-source-id: 89570435
Test Plan: cont build and the round trip tests in the last diff
Reviewed By: zrphercule
Differential Revision: D16433909
fbshipit-source-id: b5b018e839935cccbb1fb446070afd1cb9379bb1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25725
After landing #25260 the macOS wheel builds started to fail. It turns
out that if not specified, the setup helpers default USE_DISTRIBUTED
to true on all platforms except Windows.
This commit updates that such that USE_DISTRIBUTED only defaults to
true on Linux. More work is needed to enable it by default on macOS.
[test wheel]
ghstack-source-id: 89571701
Test Plan: N/A
Differential Revision: D17211695
fbshipit-source-id: 185db2e3425e45e6b76bd09d70a84e57327ca20f
Summary:
Before https://github.com/pytorch/pytorch/issues/24879, `bitwise_not` called into `at::bitwise_not_out`, which goes through a device dispatch. But after the PR it's dispatched directly to `at::native::bitwise_not_out`, which only has cpu and cuda impls. Skipping the `at::` dispatch indeed broke XLA, but XLA didn't have unary tests. We didn't notice it until a test was added in https://github.com/pytorch/xla/pull/986. :P
Trying to fix the breakage in this PR to save a revert.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25689
Differential Revision: D17201071
Pulled By: ailzhang
fbshipit-source-id: 0ca560a14a2ec6141f3795479c6dcb460e3805b5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25169
See #23110 for RRef design details. This commit only implements
RRef as return value for builtin operators, and RRef will communicate
between a user and the owner. More specifically, an RRef is first
created on the `dist.remote` caller, which is a user of the RRef.
The RRef user then sends a notification to the owner to report the
fork, and the owner uses a shared_ptr to keep the RRef alive. When
the user RRef is destructed on the caller, another notification is
sent to the owner, and the owner can then drop its RRef as well.
Test Plan: Imported from OSS
Differential Revision: D17048343
Pulled By: mrshenli
fbshipit-source-id: 9dd3b3d0e4fd214c76fecdbed746a6d3029b3efd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25696
Move the flag from CI to CMake so it's less magic and can be reused by
iOS build as well.
Test Plan: - will check CI
Differential Revision: D17202734
Pulled By: ljk53
fbshipit-source-id: da4f150cbcf2bb5624def386ce3699eff2a7446f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25695
Rename codegen variables to better reflect its semantics.
As we are going to change other parts of codegen for mobile build, e.g.
autograd, it would be more clear to use more specific names instead of
calling everything 'mobile'.
Test Plan: - will check CI
Differential Revision: D17202732
Pulled By: ljk53
fbshipit-source-id: b2953c0914f25f9a1de00be89a09a6372cc5b614
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25503
Previously we only inserted observers for forward methods; this PR
extends the support to all methods. It will insert
duplicate observers right now; we'll remove them in the next PR.
Test Plan:
python test/test_jit.py -- 'TestJit.insert_observers'
Imported from OSS
Differential Revision: D17208886
fbshipit-source-id: 04084c8f42c56cb66a11968987a15752f532ac04
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25262
Preserve the type of ignore'd functions on serialization. Currently we compile an ignore'd function with its annotated type when first compiling, but do not preserve that type. This is important for being able to compile models with not-yet-supported features in JIT.
```
@torch.jit.ignore
def unsupported(x):
    return x

def foo():
    if not torch.jit._is_scripting():
        return torch.linear(...)
    else:
        return unsupported(...)
```
Test Plan: Imported from OSS
Reviewed By: driazati
Differential Revision: D17199043
Pulled By: eellison
fbshipit-source-id: 1196fd94c207b9fbee1087e4b2ef7d4656a6647f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25678
As an effort to unify fbgemm and qnnpack at the dispatcher level, we need to have a generic name for the quantized backend ops.
Currently FBGEMM is guarded by the USE_FBGEMM macro and QNNPACK uses USE_QNNPACK.
ghstack-source-id: 89518961
Test Plan: buck test caffe2/test:quantized
Differential Revision: D17194364
fbshipit-source-id: 5960aedff6b8cb89eb3872c39b74caf54c0fbf20
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25673
We recently moved new_empty into ATen. new_empty doesn't support named
tensors (in fact, it was hackily supporting named tensors before). This
fixes the named tensor test by changing all uses of `new_empty` to
`empty`.
Named tensor support for `new_empty` will come eventually, but it might
be a little tricky.
Test Plan: - [namedtensor ci]
Differential Revision: D17206043
Pulled By: zou3519
fbshipit-source-id: 1697bd1d63e7cb344f3d459a29af0fcb9696ea49
Summary:
In facebookincubator/gloo#212, a libuv based Gloo transport was introduced,
which allows us to use Gloo on macOS (and later perhaps also Windows). This
commit updates CMake code to enable building with USE_DISTRIBUTED=1 on macOS.
A few notes:
* The Caffe2 ops are not compiled, for they depend on `gloo::transport::tcp`.
* The process group implementation uses `gloo::transport::tcp` on Linux (because of `epoll(2)`) and `gloo::transport::uv` on macOS.
* The TCP store works but sometimes crashes on process termination.
* The distributed tests are not yet run.
* The nightly builds don't use `USE_DISTRIBUTED=1`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25260
Reviewed By: mrshenli
Differential Revision: D17202381
Pulled By: pietern
fbshipit-source-id: ca80a82e78a05b4154271d2fb0ed31c8d9f26a7c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25671
To decouple string_utils.h from types.h and protobuf headers.
Logically, GetDimFromOrderString seems to be more similar to
StringToStorageOrder compared to other string_utils functions.
Test Plan: - Will check all internal/external CI jobs.
Reviewed By: yinghai
Differential Revision: D17191912
Pulled By: ljk53
fbshipit-source-id: fe555feef27bfd74c92b6297c12fb668252ca9ff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25649
Continue the work of PR #25493 to remove dependencies on generated
protobuf headers from jit/import.cpp.
Instead of adding intrusive #if/#else to gate the legacy functions,
move them into a separate file. Keep the ScriptModuleDeserializer
structure as otherwise it will require a lot of interface changes.
There is not much state to copy from ScriptModuleDeserializer as it only
extracts extra_files before calling into LEGACY_deserialize. There is
no state to copy back into ScriptModuleDeserializer either as it directly
returns script::Module.
Test Plan:
- builds;
- with stacked PR to remove protobuf from cmake;
- load and run ResNet-18 in model.json format with non-mobile build;
- load and run ResNet-18 in pickle format with mobile build;
Differential Revision: D17183549
Pulled By: ljk53
fbshipit-source-id: 2947b95659cd16046d9595fb118d22acc179b3ad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25651
Most of the binaries are not useful/compilable for mobile. Consolidate the gating
logic and move it to the beginning of the file.
Test Plan: - make sure BUILD_BINARY=ON works for both mobile and non-mobile builds;
Differential Revision: D17183550
Pulled By: ljk53
fbshipit-source-id: a8179f4e80999271bf43b5d97798abc713c59843
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25686
From the new runs, we found some ops for which we can increase the shape size to reduce the variance.
Test Plan:
```
[huaminli@devvm2388.ftw3 ~/fbsource/fbcode] buck run mode/dev-nosan caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --operators None --iterations 3
```
last few lines of the output P108624830
Reviewed By: mingzhe09088
Differential Revision: D17199623
fbshipit-source-id: a9277509f6d3e6503d3086b3b02f87eebd953239
Summary:
This PR adds Python/C++ API parity tracker at `test/cpp_api_parity/parity-tracker.md`, which currently shows parity status for `torch.nn` modules.
A good amount of line changes here is moving `new_criterion_tests` from `test_nn.py` to `common_nn.py`, so that it can be used in `test_cpp_api_parity.py`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25289
Differential Revision: D17188085
Pulled By: yf225
fbshipit-source-id: 33d12fb1a4de2d9147ed09380973f361a3981fdf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25596
Giving up on trying to limit this to just a py2 dependency
Test Plan: Imported from OSS
Differential Revision: D17171063
Pulled By: jamesr66a
fbshipit-source-id: 5df35fd128f3051dd9c6709f7d45323fedc12e65
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25598
att
Test Plan:
CI
Imported from OSS
Differential Revision: D17192467
fbshipit-source-id: 9ee93b02cc293bb71ed114534d92eedda3ddee88
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25475
I got sucked into this rabbit hole when I was trying to understand
what I should do with TensorTypeId occurrences in
torch/csrc/utils/tensor_new.cpp. I eventually concluded that all of my problems
were because Tensor.new_empty was hand implemented and not actually a native
function. So I made it a native function.
There are a bunch of other new_* functions which should get this
treatment, but I'm sending out this PR just to show how it can
be done.
The general recipe:
1. Implement a concept of TensorOptions merging (TensorOptions::merge_in).
This represents the notion of taking a tensor, but "overriding" some
of its values with specific overrides. One subtlety here is how
devices get merged; see the comments for what our existing behavior is,
and how I preserve it.
2. Implement new_empty as a native function, using options merging.
3. Add another special case to Python binding generation to treat new_*
similar to *_like (i.e., handle TensorOptions correctly). The logic
here is probably wrong, actually; we should codegen TensorOptions
correctly no matter what happens, but new_empty follows the same
pattern as empty_like so I opted not to touch this code too much.
4. Delete the now defunct manual binding code.
5. Delete manual type annotations that are no longer necessary since
we're going through native.
I didn't handle memory format correctly here. I don't know if this function
should accept memory format; prior memory format patches didn't add support
for memory format to new_like. If we had put memory format in TensorOptions
this wouldn't have been a question.
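From the Python side, the "options merging" amounts to the familiar new_* behavior: the result inherits dtype/device from `self` unless explicitly overridden. A minimal sketch:
```
import torch

base = torch.zeros(2, 2, dtype=torch.float64)
a = base.new_empty(3)                     # inherits dtype (and device) from base
b = base.new_empty(3, dtype=torch.int32)  # an explicit option overrides the inherited one
print(a.dtype, b.dtype)                   # torch.float64 torch.int32
```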
ghstack-source-id: 89294185
Test Plan: sandcastle & ossci
Differential Revision: D17133000
fbshipit-source-id: 00f4e98bd5174f6fd54e8aba2910ea91824771d9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25338
As an effort to unify fbgemm and qnnpack at the dispatcher level, we need to have a generic name for the quantized backend ops.
Currently FBGEMM is guarded by the USE_FBGEMM macro and QNNPACK uses USE_QNNPACK.
TBD: Use compile time macro or run_time to switch between fbgemm and qnnpack.
ghstack-source-id: 89454244
Test Plan: buck test caffe2/test:quantized
Differential Revision: D17097735
fbshipit-source-id: 447112a7a421387724d3e29b8fd8412dfb1c373a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25361
Previously we had a different None object for each type T so that
unwrap optional could still recover the type T from it. After a few
months of having this conversion behavior, it has become clear that
only the unwrap optional operators cause this problem. Furthermore, it
is beneficial to have NoneType <: Optional[T] because this is how IValues
work (in particular the None IValue is not tagged). This patch makes the
necessary changes to do this. In particular it special cases unwrap optional
in export so that it annotates the None to make sure we can recover the type.
This also changes how matching and evaluating type values work so that we
can consider None matchable to type Optional[T], even though we cannot
derive T from that match.
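A small TorchScript sketch of the subtyping this enables: a plain None can flow into an Optional[T] parameter, and unwrapping still goes through the usual refinement.
```
import torch
from typing import Optional

@torch.jit.script
def maybe_add(x: torch.Tensor, y: Optional[torch.Tensor]) -> torch.Tensor:
    # None is typed as NoneType, which is now a subtype of Optional[Tensor].
    if y is None:
        return x
    return x + y  # y is refined to Tensor here

print(maybe_add(torch.ones(2), None))
```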
Test Plan: Imported from OSS
Differential Revision: D17103072
Pulled By: zdevito
fbshipit-source-id: 37678ed3e5ce54f2eb3ee4dff2734a39f0bee028
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25545
This re-uses the infrastructure from ATen/native/cpu, which compiles kernels multiple times for different instruction sets and dispatches dynamically based on the CPU's capability flags at runtime. This ensures we use the most optimal quantized kernel for the given machine
Test Plan: Imported from OSS
Differential Revision: D17166369
Pulled By: jamesr66a
fbshipit-source-id: 8c8393f99365e1408819bbaf254c1b5734a34b70
Summary:
In-tree changes to pytorch to support complex numbers are being submitted here.
Out-of-tree support for complex numbers is here: [pytorch-cpu-strided-complex extension](https://gitlab.com/pytorch-complex/pytorch-cpu-strided-complex)
Note: These changes do not support AVX/SSE operations on complex tensors.
Changes so far:
- [x] Added complex support of torch.empty.
- [x] Added complex support of CopyKernels
- [x] Added complex support of BinaryOp kernels
Once these changes are applied the rest of the kernels are pretty easy.
ezyang
I have fixed the issues in the original [PR: 25373](https://github.com/pytorch/pytorch/pull/25373).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25534
Differential Revision: D17188390
Pulled By: ezyang
fbshipit-source-id: ade9fb00b2caa89b0f66a4de70a662b62db13a8c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25411
We provide a full example in Transformer.py in comments section.
Test Plan: N/A
Reviewed By: zhangguanheng66
Differential Revision: D17116514
fbshipit-source-id: b8fd331bef7a626e52f3347c88adba21b1f43ec5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25440
See the deleted comments for what this PR is all about.
Test Plan: Imported from OSS
Differential Revision: D17125690
Pulled By: suo
fbshipit-source-id: a4a2f541a3e161f9c15b51df475130e7bf683cf8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25617
This was causing some build issues if you included c10 but not torch
Test Plan: Imported from OSS
Differential Revision: D17173352
Pulled By: suo
fbshipit-source-id: 8b6f65b6cdefea716598dec2909bbeb511f881b5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25652
The clang-tidy driver script generates a chunk whitelist per file so
that it only shows errors for lines that were actually changed. If a
change removes the chunk, the count is equal to 0. If the chunk happens
to be at the start of the file, and the start position is equal to 0,
clang-tidy fails to run. This change filters out those chunks.
Test Plan: Imported from OSS
Differential Revision: D17184188
Pulled By: pietern
fbshipit-source-id: b6c2d9dca4d52cd6bf4b186603545312726fb00b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25604
In this initial version:
- autograd ignores all names.
- tensor.grad is unnamed, unless the user manually assigns to it.
- if a grad tensor has any names, perhaps the user was hoping for some
alignment-checking behavior that named tensor offers for other ops. We
raise a warning in this case.
Future: do some more extensive checking to see if this actually works in
all cases.
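A quick illustration of the intended behavior (experimental named tensor API, so exact op support may vary):
```
import torch

# Names flow through the forward ops, but autograd ignores them.
x = torch.randn(3, names=('N',), requires_grad=True)
x.sigmoid().sum().backward()
print(x.grad.names)  # (None,) -- the gradient comes back unnamed
```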
Test Plan:
- [namedtensor ci]
- Check a warning is raised if a grad tensor has names.
- Check tensor.grad field is unnamed.
- Check that we can perform backward on an op that doesn't explicitly
support names in backward. `sigmoid` is one such op.
Differential Revision: D17171788
Pulled By: zou3519
fbshipit-source-id: 64837fde94d8269610b6d3539ac025516dbe1df4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23888
This is an alternative to https://github.com/pytorch/pytorch/pull/23684.
Instead of splitting a bunch of headers into declaration and definition, we change tensor includes to only include the tensor declaration when the tensor definition isn't needed.
ghstack-source-id: 89357687
Test Plan: waitforsandcastle
Differential Revision: D16673569
fbshipit-source-id: fa1d92809b05de7910a8c2dc2f55abe071ca63bf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25620
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25602
Enable rocThrust with hipCUB and rocPRIM for ROCm. They are the ROCm implementations of the thrust and cub APIs and replace the older hip-thrust and cub-hip packages going forward. ROCm 2.5 is the first release to contain the new packages as an option, as of 2.6 they will be the only available option.
Add hipification rules to correctly hipify thrust::cuda to thrust::hip and cub:: to hipcub:: going forward. Add hipification rules to hipify specific cub headers to the general hipcub header.
Infrastructure work to correctly find, include and link against the new packages. Add the macro definition to choose the HIP backend to Thrust.
Since include chains are now a little different from CUDA's Thrust, add includes for functionality used where applicable.
Skip four tests that fail with the new rocThrust for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21864
Reviewed By: xw285cornell
Differential Revision: D16940768
Pulled By: bddppq
fbshipit-source-id: 3dba8a8f1763dd23d89eb0dd26d1db109973dbe5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25569
Test Plan
- new tests [namedtensor ci]
Test Plan: Imported from OSS
Differential Revision: D17159121
Pulled By: zou3519
fbshipit-source-id: c68bdb543155488aa3634f908bd576e5c30c8d77
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25564
There are a number of ops that get called while printing tensors
depending on how large the tensors are. This PR makes it so that before
we attempt to format tensor data for printing, we drop the names of the
tensor (if there are any). This is easier than supporting named tensors
for all of those ops (which should happen eventually).
Test Plan: - new test [namedtensor ci]
Differential Revision: D17158872
Pulled By: zou3519
fbshipit-source-id: 282023837645b8cb16a4d93896a843dd598fc738
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25157
Add the dynamic quantized LSTM module.
TODO (separate PRs):
- Serialization.
- Bias can be Null.
ghstack-source-id: 89443731
Test Plan:
buck test mode/dev caffe2/test:quantization -- 'test_quantized_rnn \(test_quantization\.PostTrainingDynamicQuantTest\)' --print-passing-details
```
[jianyuhuang@devvm2816.prn3.facebook.com: ~/fbsource/fbcode/caffe2/test] $ buck test mode/dev caffe2/test:quantization -- 'test_quantized_rnn \(test_q
uantization\.PostTrainingDynamicQuantTest\)' --print-passing-details
Action graph will be rebuilt because files have been added or removed.
Parsing buck files: finished in 1.4 sec
Building: finished in 4.0 sec (100%) 8122/8122 jobs, 2 updated
Total time: 5.5 sec
Trace available for this run at /tmp/testpilot.20190902-164918.1275502.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision b61bc0e3b71033578eddfe0a28b0739bc685663f fbpkg 3b1c1aed1c534c0cb161a981eca6e2f0 at Sun Sep 1 20:58:52 2019 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/690/t.par
Discovering tests
Running 1 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/2251799823877227
✓ caffe2/test:quantization - test_quantized_rnn (test_quantization.PostTrainingDynamicQuantTest) 1.048 1/1 (passed)
Test output:
> test_quantized_rnn (test_quantization.PostTrainingDynamicQuantTest) ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 1.049s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/2251799823877227
Summary (total time 5.53s):
PASS: 1
FAIL: 0
SKIP: 0
FATAL: 0
TIMEOUT: 0
OMIT: 0
```
Differential Revision: D16955662
fbshipit-source-id: 61cf1a74913105fa02e44b3941813eabac0006b5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25563
Before, for binary ops, name inference occurred after shape checks. This
defeats the purposes for names because the names are supposed to tell
the user that i.e. their tensors are misaligned or that they are adding
incompatible tensors.
This PR changes TensorIterator so that names are computed before shape checks and
propagated after the binary ops are finished. In order to support this,
this PR makes the following changes:
- adds a `names_` field to TensorIterator, similar to `shape_`. This is
necessary to hold the output names, that are computed in
`compute_names`, until they are used in `propagate_names_to_outputs()`.
Test Plan: Imported from OSS
Differential Revision: D17158869
Pulled By: zou3519
fbshipit-source-id: 0caa90f7a93e4d9bdb2549cd330cc3abd2258868
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25568
Test Plan
- new test [namedtensor ci]
Test Plan: Imported from OSS
Differential Revision: D17159069
Pulled By: zou3519
fbshipit-source-id: fbc185ea5865b128508451096b742ac18e467670
Summary:
Changelog:
- We had 65535 as a common magic number used as a batch size limit in several linalg routines. This PR explicitly assigns it to a named variable to minimize possible errors
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25524
Test Plan:
- All existing tests should pass to confirm that the modification is correct
This is a follow-up of the suggestion in https://github.com/pytorch/pytorch/issues/24438.
Differential Revision: D17171842
Pulled By: zou3519
fbshipit-source-id: a9ed5000f47614b8aa792c577f30b30475e0ac4b
Summary:
`self` isn't necessary for `index` backward; we only need the shape of
`self`. Changing derivatives.yaml to use `zeros_like(self)` triggers a
codepath in the codegen to only save the shape.
Fixes https://github.com/pytorch/pytorch/issues/24853.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25594
Test Plan:
- I added a new test that is adapted from the code in
https://github.com/pytorch/pytorch/issues/24853. I'm not sure what a
more minimal example would look like because the bug is hard to trigger
because of how autograd handles differentiable views.
Differential Revision: D17168645
Pulled By: zou3519
fbshipit-source-id: 11f270fed7370730984a93e4316dd937baa351a7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25566
masked_select returns a tensor with None names. However, it broadcasts
its inputs so we need to perform a check that they are broadcastable.
Test Plan: - new tests [namedtensor ci]
Differential Revision: D17159071
Pulled By: zou3519
fbshipit-source-id: ad201f3f73bc54163ede1ba3d906d2409ebef475
Summary:
Changelog:
- Iterate over mini batches of 262140 matrices (maximum)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24438
Test Plan:
- Added slow tests to test the behavior in test_torch and test_cuda
Fixes https://github.com/pytorch/pytorch/issues/24403
Differential Revision: D17175603
Pulled By: soumith
fbshipit-source-id: 1abb0a1e92494cf43ef4ba9efb54a919cd18bfef
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25507
It doesn't seem to be used.
Test Plan: Imported from OSS
Differential Revision: D17163584
Pulled By: gchanan
fbshipit-source-id: 7409cc06bf84863bd14aea060c755d0f162d2aec
Summary:
Changelog:
- Enable broadcasting of RHS and LHS tensors for lu_solve. This means that you can now have RHS with size `3 x 2` and LHS with size `4 x 3 x 3` for instance
- Remove deprecated behavior of having 2D tensors for RHS. Now all tensors have to have a last dimension which equals the number of right hand sides
- Modified docs
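A sketch of the new broadcasting behavior, using the shapes from the changelog above:
```
import torch

A = torch.randn(4, 3, 3)           # batch of 4 LHS matrices
b = torch.randn(3, 2)              # a single RHS, broadcast across the batch
LU, pivots = torch.lu(A)
x = torch.lu_solve(b, LU, pivots)  # result has shape (4, 3, 2)
```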
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24333
Test Plan: - Add tests for new behavior in test_torch.py with a port to test_cuda.py
Differential Revision: D17165463
Pulled By: zou3519
fbshipit-source-id: cda5d5496ddb29ed0182bab250b5d90f8f454aa6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25575
For both scatter and gather, only the source and destination rank,
respectively, need to supply a list of tensors. The `scatter_list` and
`gather_list` arguments were mandatory, however, and this has resulted
in some confusion. This commit makes both the `scatter_list` and
`gather_list`, and the `src` and `dst` arguments optional.
Closes #25463.
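A sketch of the relaxed API, assuming a process group has already been initialized; only the source rank passes a scatter_list now:
```
import torch
import torch.distributed as dist

# Assumes dist.init_process_group(...) has already been called.
out = torch.zeros(5)
if dist.get_rank() == 0:
    scatter_list = [torch.full((5,), float(i)) for i in range(dist.get_world_size())]
    dist.scatter(out, scatter_list, src=0)
else:
    dist.scatter(out, src=0)  # scatter_list can now be omitted on non-source ranks
```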
Test Plan: Imported from OSS
Differential Revision: D17164253
fbshipit-source-id: a16bc208c87a1c96163c1a86d4a7ca8634a26f95
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25431
I put the name propagation logic in a central place, `make_reduction`,
that creates a TensorIterator for the reduction. This lets us implement
name inference rules for mean, std, var, std_mean, and var_mean.
Test Plan
- new tests [namedtensor ci]
Test Plan: Imported from OSS
Differential Revision: D17123577
Pulled By: zou3519
fbshipit-source-id: 2d47080a40da0c4bcabbb3df71ffa8fbeb7a14c6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25405
This PR adds schemas to native_functions.yaml, core/Tensor.h, and
core/TensorMethods.h for Dimname/DimnameList overloads for the following
functions:
- min, max, max_values, min_values
- mean, meadian
- logsumexp, std, var, norm
The actual implementations will come in a later PR. I am accumulating
all the addtional schemas and changes to core/{Tensor|TensorMethods}.h
in this PR so that there is only one point of failure for potential
merge conflicts.
Test Plan: - Check that all pytorch builds still build. [namedtensor ci]
Differential Revision: D17116333
Pulled By: zou3519
fbshipit-source-id: fd666d60109a311767169261afbec0fd85cc00c8
Summary:
Adds links to torchaudio and torchtext to docs index. We should eventually evolve this to bring the audio and text docs builds in like torchvision.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24245
Differential Revision: D17163539
Pulled By: soumith
fbshipit-source-id: 5754bdf7579208e291e53970b40f73ef119b758f
Summary:
This should work both on VS and Ninja.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25556
Differential Revision: D17162045
Pulled By: ezyang
fbshipit-source-id: 18c3d62e9ba93bf603f3a5310087fac77be4a974
Summary:
Addresses https://github.com/pytorch/pytorch/issues/25427; see the issue discussion for more context.
Message conversion to unicode is a potential source of flakiness; passing it in as a kwarg instead of to `prec` is both clearer and more resilient to being broken in the future.
cc mrshenli
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25557
Differential Revision: D17160343
Pulled By: pietern
fbshipit-source-id: af071fecc04c7e0a6658694dc0d76472193f8e78
Summary:
`test_allreduce_coalesced_checks` is skipped if no GPU/not compiled with `CUDA` support. This PR moves the checks involving `.cuda()` to their own tests, since the others are still valid with or without CUDA.
cc pietern mrshenli
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25555
Differential Revision: D17160337
Pulled By: pietern
fbshipit-source-id: 4c5e6db44d2728ca43784b85131e890d3d003bcd
Summary:
I think...
I'm having issues building the site, but it appears to get rid of the error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25544
Differential Revision: D17157327
Pulled By: ezyang
fbshipit-source-id: 170235c52008ca78ff0d8740b2d7f5b67397b614
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25509
Trying to reduce the number of build parameters to simplify the config.
This one is purely derived from the build environment, so we can have
the CI scripts just compute it.
Test Plan: Imported from OSS
Differential Revision: D17143343
Pulled By: suo
fbshipit-source-id: 7837607b7b18a9233fd8657dc9c63539c0194110
Summary:
Fixes https://github.com/pytorch/pytorch/issues/25304.
The possible cause for the failure could have been the fact that `at::empty` was creating a tensor with very small values or 0, which led to `cumdist` not summing to a positive number.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25539
Differential Revision: D17156212
Pulled By: ezyang
fbshipit-source-id: ee8039e576bf76a2266aeb7e9537337d635e0f8f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25382
The formatted code swapped the inclusion order around in
ProcessGroupNCCLTest.cpp, causing a compilation failure in
`ATen/cuda/CUDAMultiStreamGuard.h`.
To fix this, this commit also includes a fix to the include list in
`ATen/cuda/CUDAMultiStreamGuard.h`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25382
Test Plan: Imported from OSS
Differential Revision: D17152634
Pulled By: pietern
fbshipit-source-id: c7b74d65a10dce5d602a98dc23fe2810235f932d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24876
This contains very basic functionality of adding 'send' autograd
function to our autograd graph. The purpose of this change is to validate the
basic structure proposed here makes sense. Once this makes sense, we can build
upon this to address more complicated scenarios. At a high level we've added
the following functionality:
1) Define a very simple 'SendRpcBackwards' autograd function.
2) Attach this function to appropriate tensors when we call an RPC.
3) Store the send function in our distributed autograd context.
ghstack-source-id: 89359708
Test Plan: unit tests.
Differential Revision: D16903255
fbshipit-source-id: 6c04794a8e58b199795404225fd9da0c1440460e
Summary:
This PR excises the last of SymbolicVariable. There should be no change in functionality. One new test for addmm fusion was added. A case where the peephole optimizer might convert a scalar argument remains untested.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25077
Test Plan: Refactors existing code so mostly covered by current tests. One test for addmm fusion was added.
Differential Revision: D17145334
Pulled By: mruberry
fbshipit-source-id: 6b68faf764f9ee8398b55c43110228ed9faf81eb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25441
1) There was a bug in https://github.com/pytorch/pytorch/pull/25012, where the
tests which needed to be skipped for return code checking was incorrect.
2) Added proper setup and teardown for the nccl_error tests.
3) Ensure AssertionError is not ignored for tests that skip return code
checking.
ghstack-source-id: 89317660
Test Plan: unit tests
Differential Revision: D17125824
fbshipit-source-id: 317ec39942b93e40ab847246b3a5129919ba2ac4
Summary:
Gradle tasks for publishing to bintray and jcenter, mavencentral; snapshot buidls go to oss.sonatype.org
Those gradle changes adds tasks:
bintrayUpload - publishing on bintray, in 'facebook' org
uploadArchives - uploading to maven repos
Gradle tasks are copied from facebook open sourced libraries like https://github.com/facebook/litho, https://github.com/facebookincubator/spectrum
To do the publishing we need to provide the following properties somehow (e.g. in ~/.gradle/gradle.properties):
```
signing.keyId=
signing.password=
signing.secretKeyRingFile=
bintrayUsername=
bintrayApiKey=
bintrayGpgPassword=
SONATYPE_NEXUS_USERNAME=
SONATYPE_NEXUS_PASSWORD=
```
android/libs/fbjni is a submodule; to be able to add publishing tasks to it (it needs to be published as a separate Maven dependency) I created `android/libs/fbjni_local`, which has only a `build.gradle` with the release tasks.
The pytorch_android dependency on ':fbjni' changed from implementation to api, as implementation is treated as a 'private' dependency (translated to scope=runtime in the Maven POM file), while api works like 'compile'.
Testing:
it's already published on bintray with version 0.0.4 and can be used in gradle files as
```
repositories {
maven {
url "https://dl.bintray.com/facebook/maven"
}
}
dependencies {
implementation 'com.facebook:pytorch_android:0.0.4'
implementation 'com.facebook:pytorch_android_torchvision:0.0.4'
}
```
It was published in the com.facebook group.
I requested a sync to JCenter from Bintray; that usually takes 2-3 days.
Versioning added version suffixes to the aar output files, and the CircleCI Android jobs started failing because they expected plain pytorch_android.aar and pytorch_android_torchvision.aar, without any version.
To avoid this, I changed the CircleCI Android jobs to zip the *.aar files and publish them as a single artifact named artifacts.zip. I will add kostmo to check this part; if the CircleCI jobs finish ok, everything works :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25351
Reviewed By: kostmo
Differential Revision: D17135886
Pulled By: IvanKobzarev
fbshipit-source-id: 64eebac670bbccaaafa1b04eeab15760dd5ecdf9
Summary: It's failing in the FB internal build because we don't enable that op.
Test Plan: buck test //xplat/caffe2:caffe2_testAndroid
Reviewed By: supriyar
Differential Revision: D17139694
fbshipit-source-id: 8091b71ff826466f3e2e1b4d6f87b9b50d1def20
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25449
Currently Variable and Tensor are still not 100% merged. There are
various places in ATen/TH codebase where it asserts input type to be
Variable/Tensor.
Usually, when the input type is Variable, it dispatches function calls to the
corresponding generated VariableType methods, which convert the input from
Variable to Tensor with "unpack()" before calling into LegacyTHFunctions
and then convert the result back from Tensor to Variable with "as_variable()".
However, when USE_STATIC_DISPATCH mode is enabled, it no longer dispatches function
calls to VariableType methods. This way, Variable inputs remain Variable
instances when they reach LegacyTHFunctions and fail the "checked_tensor_unwrap"
asserts. A couple of other asserts fail for similar reasons.
There are several options to address this problem with USE_STATIC_DISPATCH:
1. Wait until Variable and Tensor are fully merged as planned in https://github.com/pytorch/pytorch/issues/23032;
2. Create Tensors instead of Variables upfront on caller side (JIT);
3. Fix downstream asserts in ATen/TH to tolerant Variable inputs when AutoGrad is disabled;
Option 1 will still take some time; Option 2 was tried before and caused
a lot of problems; Option 3 needs to be conducted case by case, as it can be
dangerous to remove asserts before the 100% merge happens.
After digging into it a bit more, it turns out NonVariableTypeMode not only controls how
dispatch happens but also controls the TensorImpl.is_variable() result. So the
problem can be addressed by:
1. Set AutoNonVariableTypeMode mode right before calling forward();
2. Make sure all inputs/params are created as Variable, e.g.:
A. should use torch::ones() to create test input tensor instead of at::ones();
B. should not set AutoNonVariableTypeMode before torch::jit::load() call;
This diff applies these changes to the speed benchmark to demonstrate how it works.
Test Plan:
- Build speed benchmark binary for Android:
```
./scripts/build_android.sh \
-DBUILD_BINARY=ON \
-DBUILD_CAFFE2_MOBILE=OFF \
-DUSE_STATIC_DISPATCH=ON \
-DCMAKE_PREFIX_PATH=$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())') \
-DPYTHON_EXECUTABLE=$(python -c 'import sys; print(sys.executable)')
```
- Push binaries and model to Android device:
```
adb push build_android/bin/speed_benchmark_torch /data/local/tmp
adb push resnet.pb /data/local/tmp
```
- Run inference on device:
```
/data/local/tmp # ./speed_benchmark_torch --model=resnet.pb \
--input_dims="1,3,224,224" --input_type=float --print_output=true
```
Differential Revision: D17128567
Pulled By: ljk53
fbshipit-source-id: 58cc49ff35d21fefc906172cc3271f984eeb29f0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25485
I recently enabled the binary build macro for Android CI in PR #25368 as I started
adding new binaries for Android. But it seems fragile, e.g. PR #25230 failed the
android-armv8 CI but passed armv7/x86-32/x86-64. Currently only x86-32 runs
for PRs, so the armv8 failure was not caught before landing.
A similar problem might happen for other PRs, so I think we should just
disable it for now to avoid breaking master CI. The Android binaries are
for local testing purposes anyway. We can re-enable it when it becomes
more stable.
Test Plan:
- will check CI;
Imported from OSS
Differential Revision: D17137006
fbshipit-source-id: 2b7901f79e83c77ff82c14a0da3500b9416314b6
Summary:
`-Wimplicit-fallthrough` is enabled for recent GCC versions, and there are about 1000 lines of warnings in the build output with GCC 9.1, like:
```
/home/rgommers/code/pytorch/aten/src/THCUNN/FeatureLPPooling.cu: In function ‘bool runFeatureLPPoolingUpdateOutput(THCState*, const THCDeviceTensor<T, 4>&, THCDeviceTensor<T, 4>&, float, int, int) [with T = c10::Half]’:
/home/rgommers/code/pytorch/aten/src/THCUNN/FeatureLPPooling.cu:474:10: warning: this statement may fall through [-Wimplicit-fallthrough=]
474 | L2_WIDTH_CASE(2);
| ^~~~~~
/home/rgommers/code/pytorch/aten/src/THCUNN/FeatureLPPooling.cu:475:1: note: here
475 | L2_WIDTH_CASE(3);
| ^
...
/home/rgommers/code/pytorch/aten/src/THCUNN/FeatureLPPooling.cu:639:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
639 | LP_WIDTH_CASE(15);
| ^~~~~~
/home/rgommers/code/pytorch/aten/src/THCUNN/FeatureLPPooling.cu:640:1: note: here
640 | LP_WIDTH_CASE(16);
| ^
```
Fix by ending each case statement with `break;`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25451
Differential Revision: D17131254
Pulled By: ezyang
fbshipit-source-id: 55b513620438cbbf86052f22d799d790b0633fa2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25467
Use Layout/Device more directly in these cases.
ghstack-source-id: 89289651
Test Plan: sandcastle and ossci
Differential Revision: D17131883
fbshipit-source-id: ab3c6d1c879b7f26f20a2378364c852dc37508fc
Summary:
This doesn't really add much functionality, since the inputs to `tuple()` for which we can statically infer the output size are pretty much just tuples. It does improve the error message, though.
Fix for https://github.com/pytorch/pytorch/issues/24000
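A minimal sketch of the restriction, assuming TorchScript still rejects `tuple()` over a runtime-sized list at script time (exact error text is not guaranteed):
```python
import torch
from typing import List

def to_tuple(xs: List[int]):
    # The output size of tuple() cannot be inferred statically here,
    # so scripting this function is expected to fail, now with a
    # clearer error message.
    return tuple(xs)

try:
    torch.jit.script(to_tuple)
except Exception as e:
    print("scripting rejected:", e)
```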
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25474
Differential Revision: D17133800
Pulled By: eellison
fbshipit-source-id: 41a052895e6ed24a384ec6f8aef0a6769ac094e6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25446
Parameterize the CircleCI config. So now instead of ~1zillion job specs, there are only a handful, like `pytorch_linux_build` and such. The workflow definition feeds in the appropriate parameters that actually control job behavior.
[Diff](https://gist.github.com/suo/12a48efd36948fc71bdb5c719682a64c) of the `circleci config process` output shows that the actual jobs generated are identical, except for some empty env vars being set.
Differential Revision: D17133395
Test Plan: Imported from OSS
Pulled By: suo
fbshipit-source-id: e6d79268b05c91d5079670992bdf4a99e6dc2807
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25354
It doesn't seem to be used anymore.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D17101577
Pulled By: gchanan
fbshipit-source-id: b7c00de8c05bff1336d2012fd7b6f97709391e17
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25477
We want to increase `in_c, out_c` so that the metrics reported back are more stable.
Test Plan:
```[huaminli@devvm2388.ftw3 ~/fbsource/fbcode] buck run mode/dev-nosan caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --operators None --iterations 3
```
runs fine on my devserver, last couple lines of output P107448746
Reviewed By: mingzhe09088
Differential Revision: D17133043
fbshipit-source-id: 0b989a530cbfe3d608471a30ae4bbda10e5216ea
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25352
It doesn't appear to be necessary anymore; assuming this works I'll kill the codegen in a follow-up PR.
Test Plan: Imported from OSS
Differential Revision: D17101573
Pulled By: gchanan
fbshipit-source-id: bd3d1724ee5c659185a161b1e291e30af52f0a8a
Summary:
PR to compare shapes of `outputs` and `grad_outputs` in `torch.autograd.grad()`.
> grad_outputs should be a sequence of length matching output containing the pre-computed gradients w.r.t. each of the outputs.
resolve https://github.com/pytorch/pytorch/issues/17893
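A minimal sketch of the check this adds (shapes are illustrative):
```python
import torch

x = torch.randn(3, requires_grad=True)
y = x * 2  # output has shape (3,)

# Matching grad_outputs: works as before (retain_graph so we can reuse y below).
(g,) = torch.autograd.grad(y, x, grad_outputs=torch.ones_like(y), retain_graph=True)
print(g)  # tensor([2., 2., 2.])

# A grad_outputs entry whose shape does not match the corresponding output
# is now rejected instead of being silently accepted.
try:
    torch.autograd.grad(y, x, grad_outputs=torch.ones(1))
except RuntimeError as e:
    print("shape mismatch:", e)
```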
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25349
Differential Revision: D17119931
Pulled By: CamiWilliams
fbshipit-source-id: 86c9089e240ca0cea5f4ea8ec7bcff95f9d8cf53
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24331
Currently our logs are something like 40M a pop. Turning off warnings and turning on verbose makefiles (to see the compile commands) reduces this to more like 8M. We could probably reduce log size more but verbose makefile is really useful and we'll keep it turned on for Windows.
Some findings:
1. Setting `CMAKE_VERBOSE_MAKEFILE` inside CMakelists.txt itself as suggested in https://github.com/ninja-build/ninja/issues/900#issuecomment-417917630 does not work on Windows. Setting `-DCMAKE_VERBOSE_MAKEFILE=1` does work (and we respect this environment variable.)
2. The high (`/W3`) warning level on MSVC is due to cmake inserting this in the default flags. On recent versions of cmake, CMP0092 can be used to disable this flag in the default set. The string replace trick sort of works, but the standard snippet you'll find on the internet won't disable the flag from nvcc. I inspected the CUDA cmake code and verified it does respect CMP0092
3. `EHsc` is also in the default flags; this one cannot be suppressed via a policy. The string replace trick seems to work...
4. ... however, it seems nvcc implicitly inserts an `/EHs` after `-Xcompiler` specified flags, which means that if we add `/EHa` to our set of flags, you'll get a warning from nvcc. So we probably have to figure out how to exclude EHa from the nvcc flags set (EHs does seem to work fine.)
5. To suppress warnings in nvcc, you must BOTH pass `-w` and `-Xcompiler /w`. Individually these are not enough.
The patch applies these things; it also fixes a bug where nvcc verbose command printing doesn't work with `-GNinja`.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D17131746
Pulled By: ezyang
fbshipit-source-id: fb142f8677072a5430664b28155373088f074c4b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25089
Previously, when the tracer encountered a scripted function (or method), it
inlined the function into the graph. Now, we emit a CallFunction or
CallMethod node instead.
Test Plan: Imported from OSS
Reviewed By: zdevito
Differential Revision: D16987936
Pulled By: suo
fbshipit-source-id: a3e38a4621f3504909ec0542865dc7e381c243d6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25425
1. Properly invalidate memory locations when we change the points-to
set.
2. Don't build a new indexToElementMap in toString(), just use
`MemoryDag::fromIndex`
3. Fix transitive wildcard assignment
Test Plan: Imported from OSS
Differential Revision: D17126402
Pulled By: suo
fbshipit-source-id: cbd99027d2e78fd333dbf030172d3b7ac4df8349
Summary:
This is a fix for a potential ONNX export issue with SyncBatchNorm where irrespective of the value of momentum, the value for momentum in ONNX BN node is always 0. The details are captured in https://github.com/pytorch/pytorch/issues/18525.
The fix in this PR for `SyncBatchNorm` is very similar to the fix that went in https://github.com/pytorch/pytorch/pull/18764 for `BatchNorm` (I think this site was just missed).
Please note that there are no ONNX test points added for this, because SyncBatchNorm works exclusively with tensors on GPU and the ONNX test passes are CPU only. If there's a way to add a test point, please let me know.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24995
Differential Revision: D17085570
Pulled By: dzhulgakov
fbshipit-source-id: 162d428673c269efca4360fb103854b7319ec204
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25408
Change the exception to a warning so that an observer can be called with no data and still provide a scale and zero-point.
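A hedged sketch of the new behavior, assuming the `torch.quantization.MinMaxObserver` API (names may differ across releases):
```python
import warnings
import torch
from torch.quantization import MinMaxObserver

obs = MinMaxObserver()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # No tensors observed yet: previously this raised an exception,
    # now it warns and returns default qparams.
    scale, zero_point = obs.calculate_qparams()
print(scale, zero_point, len(caught) > 0)
```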
ghstack-source-id: 89267768
Test Plan:
buck test mode/dev caffe2/test:quantization -- 'test_minmax_observer'
buck test mode/dev caffe2/test:quantization -- 'test_observer_scriptable'
Differential Revision: D17116524
fbshipit-source-id: db4d76e882b57f23161dced846df3a0760194a41
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25229
The binaries don't build when BUILD_CAFFE2_MOBILE=OFF (libtorch mode)
in which case we don't include caffe2/predictor which is needed by
predictor_verifier.cc.
Add BUILD_BINARY=ON to libtorch android CI script to make sure binaries
can be compiled for libtorch android as we will add speed benchmark
binary for it.
Test Plan:
- Verified BUILD_BINARY=ON works with BUILD_CAFFE2_MOBILE=OFF and ON.
- Will check CI builds.
Differential Revision: D17067217
Pulled By: ljk53
fbshipit-source-id: 2a28139d9d25ff738be7b49b24849c9d300ef9a9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24195
It is not efficient to use a string destination name in every
send. Moreover, when we add RRef later, RpcAgent will frequently check
RRef ownership. It will also be slow if we have to go through string
comparison every time. This commit assigns each RpcAgent a unique
integer ID. In the Python send API, applications can provide either
destination name or id. If it is a string name, it will be converted to
id by calling the get_id(workerName) API.
Test Plan: Imported from OSS
Differential Revision: D16770241
Pulled By: mrshenli
fbshipit-source-id: fa56128a77a02a402dc6682474bc301dc1b7f43d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25370
Removes some checking code that is copied from insert_observers pass
Test Plan:
python test/test_jit.py 'TestJit.test_insert_quant_dequant'
Imported from OSS
Differential Revision: D17106633
fbshipit-source-id: 3c39be89dbf58dc6ffd63e1ee1283eba65243ea6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25281
We want to skip inserting observers for the Tensors that are between two
ops that will be fused, e.g. Conv -> ReLU. This PR just adds this pattern,
but new patterns can easily be added in the future.
Test Plan:
python test test/test_jit.py -- 'TestJit.test_insert_observers_skip_values'
Imported from OSS
Differential Revision: D17106037
fbshipit-source-id: 49697f4d9598a461edc62a2b4148495764a99574
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25276
We add the per channel quantization support for the quantized linear operator, based on the recent added per channel quantization APIs in https://github.com/pytorch/pytorch/pull/24935 and https://github.com/pytorch/pytorch/pull/24934.
ghstack-source-id: 89267515
Test Plan:
buck test mode/dev caffe2/test:quantized -- 'test_qlinear_unpack \(test_quantized\.TestQuantizedLinear\)' --print-passing-details
```
[jianyuhuang@devvm6560.prn2.facebook.com: ~/fbsource/fbcode/caffe2/test] $ buck test mode/dev caffe2/test:quantized -- 'test_qlinear_unpack \(test_quantized\.TestQuantizedLinear\)' --print-passing-details
Action graph will be rebuilt because files have been added or removed.
Parsing buck files: finished in 1.3 sec
Building: finished in 5.7 sec (100%) 8114/8114 jobs, 0 updated
Total time: 7.0 sec
Trace available for this run at /tmp/testpilot.20190827-141824.842847.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision c4cde854bae419be71282b0f92bf2d57a9203003 fbpkg f45bf410f1694a6882727cf03961702b at Mon Aug 26 22:10:29 2019 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/686/t.par
Discovering tests
Running 1 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/5629499540372523
✓ caffe2/test:quantized - test_qlinear_unpack (test_quantized.TestQuantizedLinear) 0.996 1/1 (passed)
Test output:
> test_qlinear_unpack (test_quantized.TestQuantizedLinear) ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 0.997s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/5629499540372523
Summary (total time 5.05s):
PASS: 1
FAIL: 0
SKIP: 0
FATAL: 0
TIMEOUT: 0
OMIT: 0
```
buck test mode/dev caffe2/test:quantized -- 'test_qlinear \(test_quantized\.TestQuantizedLinear\)' --print-passing-details
```
[jianyuhuang@devvm6560.prn2.facebook.com: ~/fbsource/fbcode/caffe2/test] $ buck test mode/dev caffe2/test:quantized -- 'test_qlinear \(test_quantized\.TestQuantizedLinear\)' --print-passing-details
Action graph will be rebuilt because files have been added or removed.
Parsing buck files: finished in 0.9 sec
Building: finished in 6.4 sec (100%) 8114/8114 jobs, 2 updated
Total time: 7.3 sec
Trace available for this run at /tmp/testpilot.20190827-141631.836596.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision c4cde854bae419be71282b0f92bf2d57a9203003 fbpkg f45bf410f1694a6882727cf03961702b at Mon Aug 26 22:10:29 2019 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/686/t.par
Discovering tests
Running 1 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/1125900049005601
✓ caffe2/test:quantized - test_qlinear (test_quantized.TestQuantizedLinear) 2.893 1/1 (passed)
Test output:
> test_qlinear (test_quantized.TestQuantizedLinear) ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 2.893s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/1125900049005601
Summary (total time 6.78s):
PASS: 1
FAIL: 0
SKIP: 0
FATAL: 0
TIMEOUT: 0
OMIT: 0
```
buck test mode/dev caffe2/test:quantized -- 'test_qlinear \(test_quantized\.TestDynamicQuantizedLinear\)' --print-passing-details
```
[jianyuhuang@devvm6560.prn2.facebook.com: ~/fbsource/fbcode/caffe2/test] $ buck test mode/dev caffe2/test:quantized -- 'test_qlinear \(test_quantized\.TestDynamicQuantizedLinear\)' --print-passing-details
Action graph will be rebuilt because files have been added or removed.
Parsing buck files: finished in 1.7 sec
Building: finished in 4.9 sec (100%) 8118/8118 jobs, 2 updated
Total time: 6.6 sec
Trace available for this run at /tmp/testpilot.20190829-153630.613647.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision f39465ac7f6b26840c8cbd0ae5e367fb8a60ec24 fbpkg cf4e6efcd2fa4642b6f8c26a9bd98d67 at Tue Aug 27 21:58:47 2019 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/687/t.par
Discovering tests
Running 1 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/4222124657066806
✓ caffe2/test:quantized - test_qlinear (test_quantized.TestDynamicQuantizedLinear) 3.377 1/1 (passed)
Test output:
> test_qlinear (test_quantized.TestDynamicQuantizedLinear) ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 3.378s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/4222124657066806
Summary (total time 8.18s):
PASS: 1
FAIL: 0
SKIP: 0
FATAL: 0
TIMEOUT: 0
OMIT: 0
```
Differential Revision: D17057818
fbshipit-source-id: 9ad8b9120fd0d9933ca81c132da61b53e2c91b9e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25432
Test fails without width argument (it was dropped from hypothesis).
Temporarily skipping until fixed.
ghstack-source-id: 89260995
Test Plan: N/A
Differential Revision: D17123571
fbshipit-source-id: 2fc934a005959a300c6a962d8507cf0aaa137be5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25400
Bring in fixes for clamp operator and tests
Test Plan: CI
Reviewed By: dreiss
Differential Revision: D17100464
fbshipit-source-id: b071a8266dbdef19aa7d58a66c43bfa97d59ce02
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25332
This method references a deprecated class, so we now delete it.
This deletion was somewhat involved. Pre-existing use sites of
toType:
- Tensor::cpu()/cuda()/hip()
- native::type_as
- SummaryOps: toType(CPU(kDouble)) translated into to(kDouble) as weights
is an input argument and therefore assumed to be on CPU already. Similar
for CUDA.
- TensorTransformations: toType(CUDA(kLong)) translated into cuda(), as
the inputs are actually already the correct dtype, and this translation is just to move them to CUDA
- Adjusted native_test to take TensorOptions instead of
DeprecatedTypeProperties, killing toType along the way in favor of to
- Some tests for toType with UndefinedType which I just deleted
- CopyBackwards stores TensorOptions now instead of
DeprecatedTypeProperties
ghstack-source-id: 89177526
Test Plan: sandcastle and ossci
Differential Revision: D17096824
fbshipit-source-id: 964e5a073b9d37594e911d8bca98c9eab5766826
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16382
Adding an Int8TransposeOp that inherits from TransposeOp.
Small refactoring to normal TransposeOp to move main logic into a TransposeImpl
function.
Test Plan: int8_test.cc
Reviewed By: supriyar
Differential Revision: D13822715
fbshipit-source-id: a4d61bdf8e4e1d3f2e30b86d325810ed44c21635
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25345
Test Plan
- New tests [namedtensor ci]
Test Plan: Imported from OSS
Differential Revision: D17101486
Pulled By: zou3519
fbshipit-source-id: 58e803b042056ee6abab8551517f74078f2b81d5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25334
1) There was a bug in https://github.com/pytorch/pytorch/pull/25012, where the
set of tests that needed to be skipped for return code checking was incorrect.
2) Added proper setup and teardown for the nccl_error tests.
3) Ensure AssertionError is not ignored for tests that skip return code
checking.
Test Plan: unit tests
Differential Revision: D17003555
fbshipit-source-id: 0e0429367fb6dae251b74e9f8b2baa67a48a0d22
Summary:
Introducing CircleCI jobs for pytorch_android Gradle builds; the ultimate goal at the moment is to run:
```
gradle assembleRelease -p ~/workspace/android/pytorch_android assembleRelease
```
To assemble the Android Gradle build (aar) we need the libtorch-android shared library with headers for 4 Android ABIs, so pytorch_android_gradle_build requires 4 jobs:
```
- pytorch_android_gradle_build:
requires:
- pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build
- pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_64_build
- pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v7a_build
- pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v8a_build
```
All jobs use the same base docker_image; we differentiate them by committing docker images with different android_abi suffixes (as is done now for xla and namedtensor), in `&pytorch_linux_build_defaults`:
```
if [[ ${BUILD_ENVIRONMENT} == *"namedtensor"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-namedtensor
elif [[ ${BUILD_ENVIRONMENT} == *"xla"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-xla
elif [[ ${BUILD_ENVIRONMENT} == *"-x86"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-android-x86
elif [[ ${BUILD_ENVIRONMENT} == *"-arm-v7a"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-android-arm-v7a
elif [[ ${BUILD_ENVIRONMENT} == *"-arm-v8a"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-android-arm-v8a
elif [[ ${BUILD_ENVIRONMENT} == *"-x86_64"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-android-x86_64
else
export COMMIT_DOCKER_IMAGE=$output_image
fi
```
The pytorch_android_gradle_build job copies the headers and the libtorch.so / libc10.so results from the libtorch Android docker images, first to the workspace and then to the android_abi=x86 docker image, where it runs the final Gradle build by calling `.circleci/scripts/build_android_gradle.sh`.
For PR jobs we have only the `pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build` libtorch Android build => it has a separate Gradle build `pytorch_android_gradle_build-x86_32` that does not do the docker copying.
It calls the same `.circleci/scripts/build_android_gradle.sh`, which has only-x86_32 logic gated on BUILD_ENVIRONMENT:
`[[ "${BUILD_ENVIRONMENT}" == *-gradle-build-only-x86_32* ]]`
It also has filtering to run only for PRs, as other runs will have the full build. The filter checks `-z "${CIRCLE_PULL_REQUEST:-}"`:
```
- run:
name: filter_run_only_on_pr
no_output_timeout: "5m"
command: |
echo "CIRCLE_PULL_REQUEST: ${CIRCLE_PULL_REQUEST:-}"
if [ -z "${CIRCLE_PULL_REQUEST:-}" ]; then
circleci step halt
fi
```
Updating docker images to the version with gradle, android_sdk, and openjdk; the Jenkins job that built them: https://ci.pytorch.org/jenkins/job/pytorch-docker-master/339/
pytorch_android_gradle_build successful run: https://circleci.com/gh/pytorch/pytorch/2604797#artifacts/containers/0
pytorch_android_gradle_build-x86_32 successful run: https://circleci.com/gh/pytorch/pytorch/2608945#artifacts/containers/0
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25286
Reviewed By: kostmo
Differential Revision: D17115861
Pulled By: IvanKobzarev
fbshipit-source-id: bc88fd38b38ed0d0170d719fffa375772bdea142
Summary:
Here is a PR adding ```ModuleList``` to ```modules.h``` so that it can be used by including ```torch/torch.h```.
yf225 edit: Closes https://github.com/pytorch/pytorch/issues/25293.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25346
Differential Revision: D17115013
Pulled By: yf225
fbshipit-source-id: 38a1848b9a8272fa411865dfc83b76d10c5789a0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25336
1. Remove versions from workflows
2. Escape heredoc `<<` used in shells
3. Replace "." with "_" in binary job names (we already do the same for other jobs)
4. (Bonus) fix `should_run_job.py` so that commits with `[ci]` don't accidentally skip all jobs
Let's see if it works
Test Plan: Imported from OSS
Differential Revision: D17114619
Pulled By: suo
fbshipit-source-id: 722606ad862af565cd0ba4bb539daeb9d8f5da71
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25339
This is to get rid of backend-specific dispatch in modules; this autograd function is no longer backend specific so
doesn't need to be in a backend specific location.
Test Plan: Imported from OSS
Differential Revision: D17101576
Pulled By: gchanan
fbshipit-source-id: f4f0bd3ecc2d4dbd8cdfedbaabcadb8c603d2507
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25326
And also uses self._backend, which I'm trying to kill or at least drastically reduce.
Test Plan: Imported from OSS
Differential Revision: D17097303
Pulled By: gchanan
fbshipit-source-id: f55d7df2a668425978499d4a4338b23ba6cf1b90
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25323
They don't seem to be used anymore.
Test Plan: Imported from OSS
Differential Revision: D17097302
Pulled By: gchanan
fbshipit-source-id: dc1133e32586818a9b2e2b7560d259d36c7b36f6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25280
`ATen/core/Tensor.h` and `ATen/core/TensorMethods.h` both depend on
Dimname.h and NamedTensor.h. Therefore `Dimname.h` and `NamedTensor.h`
should really be in `ATen/core`. It's not a problem right now because
this dependency chain (core files cannot depend on non-core files) isn't
enforced in our OSS builds, but it is necessary to resolve this before
removing the BUILD_NAMEDTENSOR flag.
Test Plan
- [namedtensor ci]
Test Plan: Imported from OSS
Differential Revision: D17087195
Pulled By: zou3519
fbshipit-source-id: f06e4268d91fabadb04b41d5b78fb0e530f030fd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25375
Either MSVC or the Windows headers have a PURE macro defined and will replace
any occurrences of the PURE token in code with an empty string. Replace
AliasAnalysisKind::PURE with AliasAnalysisKind::PURE_FUNCTION.
Note: this is bc breaking.
ghstack-source-id: 89202222
Test Plan: unit tests
Differential Revision: D17107743
fbshipit-source-id: 899a20651ba32d50691956b5424b351586c21cec
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25252
Our model going forward for extensions will be that you will have to
get an allocation of an ID in our system. This is how things work
in practice today; we're just simplifying our underlying registration
since there is no need to have distributed registration.
There are some codemods in this diff:
```
codemod --extensions cpp,h,cc,cuh,py,in --exclude-paths=c10/core/TensorTypeId.h '([A-Za-z]+?)TensorId\(\)' 'TensorTypeId::\1TensorId'
codemod --extensions cpp,h,cc,cuh,py,in 'TensorTypeIds::undefined\(\)' 'TensorTypeId::UndefinedTensorId'
codemod --extensions cpp 'TensorType1\(\)' 'TensorTypeId::CPUTensorId'
codemod --extensions cpp 'TensorType2\(\)' 'TensorTypeId::CUDATensorId'
codemod --extensions cpp 'TensorType3\(\)' 'TensorTypeId::XLATensorId'
codemod --extensions cpp 'TensorType1' 'CPUTensorId'
codemod --extensions cpp 'TensorType2' 'CUDATensorId'
codemod --extensions cpp 'TensorType3' 'XLATensorId'
```
The main hand-written changes are in c10/core/TensorTypeId.h
Other manual fixes:
- aten/src/ATen/core/op_registration/op_registration.cpp - stop using
std::string operator+
- aten/src/ATen/function_wrapper.py - handle a hardcoded TypeId() that
wasn't caught by codemod
- torch/csrc/tensor/python_tensor.h - fix now incorrect forward declaration
of TensorTypeId
- aten/src/ATen/core/op_registration/ - remove out-of-line registration
Differential Revision: D17072001
Test Plan: ossci and sandcastle
Pulled By: ezyang
fbshipit-source-id: c641515fd0604c045c54fbb1d6b1b950f45e89d1
Summary:
…default value
Addresses https://github.com/pytorch/pytorch/issues/24962. A valid (and the default) value for the `device` parameter in the `cuda` method is `None`. The type signature was causing invalid linter errors in PyCharm. Verified the fix in the latest PyCharm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25018
Differential Revision: D17098520
Pulled By: VitalyFedyunin
fbshipit-source-id: d83eb9976f09c75b4a033cb49c81d972e3fd37c1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25322
As far as I can tell, none of these are actually used anymore.
Test Plan: Imported from OSS
Differential Revision: D17097301
Pulled By: gchanan
fbshipit-source-id: 649ee0fd549f6e2a875faef7c32b19c70bb969b6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25282
For now it will be used in quantization, but it can be used as a
standalone pass too.
Couple of things are not finished at this moment:
- Batchnorm.eps value is hardcoded. This is bad and wrong, but we cannot
access fields listed in __constants__ from IR now. Once we fix this, we
should remove the hardcoded value.
- We do not remove Batchnorm submodules from the parent module even when
they were merged into a Conv. Once we figure out API for removing
attributes and modules, we should fix this.
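For reference, the folding this pass performs on the IR corresponds to the standard eval-mode arithmetic sketched below (this is only the math, not the JIT pass itself; names are illustrative):
```python
import torch

def fold_conv_bn(conv_w, conv_b, bn_mean, bn_var, bn_gamma, bn_beta, eps=1e-5):
    # BN(Conv(x)) == Conv'(x) with the rescaled weight/bias below (eval mode,
    # running statistics): scale the conv weight by gamma / sqrt(var + eps)
    # and fold the mean/beta terms into the bias.
    inv_std = torch.rsqrt(bn_var + eps)
    w_folded = conv_w * (bn_gamma * inv_std).reshape(-1, 1, 1, 1)
    b_folded = (conv_b - bn_mean) * inv_std * bn_gamma + bn_beta
    return w_folded, b_folded
```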
Test Plan: Imported from OSS
Differential Revision: D17086611
Pulled By: ZolotukhinM
fbshipit-source-id: d58a947a3b2205d8f3629d693b70b9ad2b5a9102
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25069
This PR changes the API of insert_observers to use qconfig_dict,
full functionality support will come in later PRs
Test Plan:
```
python test/test_quantizer.py
python test/test_jit.py
```
Imported from OSS
Differential Revision: D17001135
fbshipit-source-id: 16df6fa521fcc0c9e268a375be8e1a630e77011a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25151
The prim::GetAttr operator depends on node. However, in lite interpreter there will be no node dependency. Promote the operator to a first-class instruction.
Test Plan: Imported from OSS
Differential Revision: D17076412
fbshipit-source-id: 8de20978445bb598634c5462e66e4459dcd567be
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25148
Instructions will be used in lite interpreter as well. Pull it out of interpreter.cpp, so that the lite interpreter doesn't have to compile with interpreter.cpp.
Test Plan: Imported from OSS
Differential Revision: D17076413
fbshipit-source-id: 99b3d8d27a96823a4a4dde6b2337ee44635e34cb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25265
This ensures that the output strides match the input strides. Previously, we would degenerate down to slow scalar code because the call to _empty_affine_quantize would produce a tensor with different strides than the operands. When this mismatch occurs, TensorIterator uses the scalar code. This fixes that
Benchmark script:
```
import torch, time
x = torch.rand(1, 56, 56, 256)
y = torch.rand(1, 56, 56, 256)
qX = torch.quantize_linear(x, 0.1, 128, torch.quint8)
qY = torch.quantize_linear(y, 0.1, 128, torch.quint8)
s = time.time()
for i in range(1000):
x + y
print('float contig', time.time() - s)
s = time.time()
for i in range(1000):
torch.ops.quantized.add(qX, qY, 0.5, 1)
print('quantized contig', time.time() - s)
x = torch.rand(1, 56, 56, 256)
y = torch.rand(1, 56, 56, 256)
qX = torch.quantize_linear(x, 0.1, 128, torch.quint8).permute([0, 3, 1, 2])
qY = torch.quantize_linear(y, 0.1, 128, torch.quint8).permute([0, 3, 1, 2])
x = x.permute([0, 3, 1, 2])
y = y.permute([0, 3, 1, 2])
s = time.time()
for i in range(1000):
x + y
print('float strided', time.time() - s)
s = time.time()
for i in range(1000):
torch.ops.quantized.add(qX, qY, 0.5, 1)
print('quantized strided', time.time() - s)
```
Before this change
```
$ OMP_NUM_THREADS=1 python cmp.py
float contig 0.4625673294067383
quantized contig 1.8083674907684326
float strided 0.46366071701049805
quantized strided *8.30056643486023*
```
After this change
```
$ OMP_NUM_THREADS=1 python cmp.py
float contig 0.48703694343566895
quantized contig 2.0587124824523926
float strided 0.4711723327636719
quantized strided *2.0382332801818848*
```
Test Plan: Imported from OSS
Differential Revision: D17077811
Pulled By: jamesr66a
fbshipit-source-id: 25f52743081162122dfc9eb4bc39185d4cc4ba3b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25052
Previously we would not inline nested functions; now we do.
Test Plan: Imported from OSS
Differential Revision: D16973848
Pulled By: suo
fbshipit-source-id: 94aa0b6f84a2577a663f4e219f930d2c6396d585
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24875
As per https://github.com/pytorch/pytorch/issues/23110, each autograd pass
would be assigned a unique autograd_context_id. In this change we introduce a
DistAutogradContainer per worker which holds information for each autograd pass
currently running.
DistAutogradContainer has a map from the autograd_context_id to
DistAutogradContext (which holds all the relevant information for the autograd
pass). DistAutogradContext currently only stores the autograd_context_id and
more information would be added to it later as we build out the rest of the
framework.
The autograd_context_id is a 64 bit globally unique integer where the first 16
bits are the worker_id and next 48 bits are auto-incrementing for uniqueness.
Sample python code on how this would be used for distributed autograd:
```
import torch.distributed.autograd as dist_autograd
worker_id = 0
dist_autograd.init(worker_id)
with dist_autograd.context() as context_id:
# forward pass...
# backward pass...
# optimizer step...
```
ghstack-source-id: 89119248
Test Plan: unit tests.
Differential Revision: D16356694
fbshipit-source-id: d1a8678da0c2af611758dbb5d624d554212330ce
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25136
Previously we were calling unifyType to match typevars at callsites.
unifyType actually does merging (e.g. to handle control flow joins)
so its effect at callsites was bivariance, allowing typevar bindings
to widen as new concrete types were encountered in arguments.
Fixes issue #24856
Strip refinements when doing invariant matching on type vars.
Previous change (bivariance to invariance) makes type matching
sensitive to the addition of type refinements. Use unshapedType
to avoid considering refinements when doing matching.
Test Plan: Imported from OSS
Differential Revision: D17078081
Pulled By: bhosmer
fbshipit-source-id: 54476469679af698cfe9bd020a39de31271f52cc
Summary:
Hi,
I noticed that after v1.2.0 the implementation of the LBFGS optimizer changed. In the new implementation, the termination condition changed from the sum of the gradients to the max value in the gradients (see: b15d91490a/torch/optim/lbfgs.py (L313)). But the default tolerance_grad parameter has not been changed (which is too large for the max of the gradients), so this results in many of my old scripts either not optimizing at all or optimizing for only one or two steps.
So I'm opening this pull request to suggest changing tolerance_grad to a smaller value.
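Until a smaller default lands, the workaround is to pass a tighter tolerance explicitly; a minimal sketch:
```python
import torch

param = torch.randn(10, requires_grad=True)
# tolerance_grad is now compared against max(|grad|) rather than the sum,
# so the old default is too loose; pass a smaller value explicitly.
opt = torch.optim.LBFGS([param], tolerance_grad=1e-7)

def closure():
    opt.zero_grad()
    loss = (param ** 2).sum()
    loss.backward()
    return loss

opt.step(closure)
```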
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25240
Differential Revision: D17102713
Pulled By: vincentqb
fbshipit-source-id: d46acacdca1c319c1db669f75da3405a7db4a7cb
Summary:
Don't throw in constant propagation, since the op we're running may not be reached. Previously we would only catch `C10::Error`; however, it's hard to guarantee that the entire codebase doesn't throw any other types of errors, and some errors map nicely to Python errors, like `std::index_error` to IndexError.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25270
Differential Revision: D17102545
Pulled By: eellison
fbshipit-source-id: 9fd485821743ad882e5c6fc912ca47b0b001b0e9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25259
Switching to tensorboard instead of tensorflow
Test Plan: went through instructions in [fbsource/fbcode/caffe2/caffe2/contrib/tensorboard/tensorboard.md] to make sure everything is working (using/not using tensorboard/tensorflow)
Reviewed By: orionr
Differential Revision: D17059111
fbshipit-source-id: aaa26dec840fb517b3bc7dc988f3a8c54566d356
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25178
Previously, we were using torch/csrc/utils/memory.h. This switches those
headers to be c10/util/C++17.h.
Context: ATen and torch are the same library now, so one can call code
in torch from ATen. However, I haven't seen an example of that yet
(aside from the named tensor code that uses make_unique from torch). In
this PR I try to maintain the ATen / torch separation just in case it
matters.
Test Plan
- Check that code compiles [namedtensor ci]
Test Plan: Imported from OSS
Differential Revision: D17051453
Pulled By: zou3519
fbshipit-source-id: 44b6393a748bdb1e671ecb1e9a615c33202e8515
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25177
Test Plan
- new tests [namedtensor ci]
Test Plan: Imported from OSS
Differential Revision: D17051452
Pulled By: zou3519
fbshipit-source-id: 7259cdb7ba7f480035528cf3c60ef6d051e42db5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25123
The approach is different for CPU and CUDA. In particular:
- in CPU, I added a name inference rule to bmm_out
- in CUDA, bmm calls THCTensor_(baddbmm) so I added a name inference
rule to that.
When one calls baddbmm on CPU or CUDA, it'll error out with NYI due to
named_guard: True on it in native_functions.yaml. I'm not planning on
implementing baddbmm soon because it's a little tricky to add it to CPU
and bmm is more commonly used function.
Test Plan
- new tests [namedtensor ci]
Test Plan: Imported from OSS
Differential Revision: D16998073
Pulled By: zou3519
fbshipit-source-id: 8dc01898964318717911f28eebd6cdfffc7dfcf2
Summary:
Related: https://github.com/pytorch/pytorch/issues/24927#issuecomment-524608021
`fork` inherits lock state, so if we happen to fork while the `SharedCache` lock is held, we could deadlock in the child process when some code tries to acquire it.
Following the pytorch multiprocessing library design, this patch resets the lock to a new object after fork. A similar example from the Python core library for `multiprocessing.Queue` is:
```py
class Queue(object):
def __init__(self, ...):
...
self._after_fork()
if sys.platform != 'win32':
register_after_fork(self, Queue._after_fork)
def _after_fork(self):
debug('Queue._after_fork()')
self._notempty = threading.Condition(threading.Lock())
self._buffer = collections.deque()
self._thread = None
self._jointhread = None
self._joincancelled = False
self._closed = False
self._close = None
self._send_bytes = self._writer.send_bytes
self._recv_bytes = self._reader.recv_bytes
self._poll = self._reader.poll
```
d4d60134b2/Lib/multiprocessing/queues.py (L54-L78)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25158
Differential Revision: D17091227
Pulled By: soumith
fbshipit-source-id: ee7130f47d7bbd42fc34a2598f1f6974d8d7cdb7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25313
`sign` was recently ported from TH to ATen, undoing some named tensor
changes and breaking the CI named tensor test. This PR re-enables named tensor
for `sign`.
Test Plan
- [namedtensor ci]
Test Plan: Imported from OSS
Differential Revision: D17093439
Pulled By: zou3519
fbshipit-source-id: 11185ad88a0eaf56078b94e9547bbbd6d02d0aab
Summary:
Using `TORCH_WARN_ONCE` for `Tensor.data<T>()` is still causing deadlocks internally. According to Dima: "So the problem seems to be in TORCH_WARN/c10::Warning::warn which produces a warning - we setup a wrapper that sends the message back to python land. But doing so requires acquiring GIL and it somehow deadlocks. In general using TORCH_WARN in so low-level API is dangerous as there's no guarantee whether we're running under GIL or not."
In order to avoid causing accidental deadlocks in other code including external extensions, the use of `TORCH_WARN_ONCE` in `Tensor.data<T>()` is changed to `C10_DEPRECATED_MESSAGE` in this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25319
Reviewed By: dzhulgakov
Differential Revision: D17094933
Pulled By: yf225
fbshipit-source-id: e29dc35187f73ca7865cfa5a9ecde708cd237c58
Summary:
Moving so that `new_criterion_tests` can be used from `test_cpp_api_parity.py`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25333
Differential Revision: D17097188
Pulled By: yf225
fbshipit-source-id: 7f7905cc6799bca8dc6b3c9cc43995313c6bc058
Summary:
addresses https://github.com/pytorch/pytorch/issues/21640 for CPU tensors and the Gloo backend.
Questions:
- ~~currently takes `AllreduceOptions`, since all of the options are the same. Would it be better to make a new `AllreduceCoalescedOptions` class?~~
- ~~I decided to inherit from `ProcessGroupGloo::AsyncWork` instead of `AsyncAllreduceWork` to shorten the inheritance chain a bit and for consistency with existing classes. However, this means that the two `getFunction` methods are copy-pasted. Would inheriting from `AsyncAllreduceWork` be preferable?~~
- ~~should the work class be named `AsyncCoalescedAllreduceWork` or `AsyncAllreduceCoalescedWork`?~~
thank you!
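A hedged usage sketch, assuming a Python-level `all_reduce_coalesced` binding is exposed for the Gloo backend (availability of that binding is an assumption, not guaranteed by this PR):
```python
import torch
import torch.distributed as dist

def sum_tensors_coalesced(tensors):
    # Reduces the whole list in one coalesced operation instead of one
    # all_reduce call per tensor (CPU tensors, Gloo backend).
    dist.all_reduce_coalesced(tensors, op=dist.ReduceOp.SUM)
    return tensors
```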
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24949
Differential Revision: D17055580
Pulled By: mrshenli
fbshipit-source-id: e63b5fcaec6021053ea960776a09ee8cf11d1ec2
Summary:
Fixing https://github.com/pytorch/pytorch/issues/24750
```
DEBUG = 0
OMP_NUM_THREADS = 1
import torch
base = torch.randn(1000000)
exp = torch.randn(1000000)
out = torch.empty_like(base)
timeit base.pow(0) +30x
old 6.26 ms ± 35.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 213 µs ± 3.38 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
timeit base.pow(1/3) +6x
old 56 ms ± 911 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.41 ms ± 237 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
timeit base.pow(-1/3) +6x
old 57 ms ± 1.65 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.49 ms ± 293 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
timeit base.pow(1/2) +6x
old 4.04 ms ± 14.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 620 µs ± 3.35 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
timeit base.pow(-1/2) +5x
old 6.56 ms ± 43 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 1.24 ms ± 19.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
timeit base.pow(1) no diff
old 322 µs ± 4.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
new 331 µs ± 7.26 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
timeit base.pow(-1) +3.5x
old 2.48 ms ± 15.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 717 µs ± 130 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
timeit base.pow(2) no diff
old 328 µs ± 7.42 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
new 324 µs ± 4.93 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
timeit base.pow(-2) +3.5x
old 2.45 ms ± 11.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 662 µs ± 3.83 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
timeit base.pow(3) +7x
old 2.39 ms ± 60.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 334 µs ± 7.26 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
timeit base.pow(-3) +9x
old 93.7 ms ± 5.27 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 10.3 ms ± 666 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
timeit base.pow(123456.789) +5x
old 46.5 ms ± 418 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.68 ms ± 325 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
timeit base.pow(-123456.789) +5x
old 46.5 ms ± 784 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 10 ms ± 541 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
timeit base.pow(exp) +6x
old 60.6 ms ± 4 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.7 ms ± 379 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
timeit torch.pow(0, exp) no diff
old 18.3 ms ± 859 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 21.2 ms ± 333 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
timeit torch.pow(1, exp) +30x
old 6.01 ms ± 81.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 203 µs ± 1.08 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
timeit torch.pow(-1, exp) +3x
old 30.8 ms ± 5.51 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.67 ms ± 441 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
timeit torch.pow(42, exp) +8x
old 80.1 ms ± 1.57 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.51 ms ± 103 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
timeit torch.pow(-42, exp) +2x
old 21.8 ms ± 4.37 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.5 ms ± 89.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
timeit torch.pow(0, exp, out=out) no diff
old 20.2 ms ± 3.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 22.1 ms ± 648 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
timeit torch.pow(1, exp, out=out) +30x
old 6.7 ms ± 397 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 203 µs ± 4.64 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
timeit torch.pow(-1, exp, out=out) +3x
old 32.5 ms ± 3.61 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.4 ms ± 99.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
timeit torch.pow(42, exp, out=out) +10x
old 91 ms ± 7.45 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.64 ms ± 291 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
timeit torch.pow(-42, exp, out=out) +2.5x
old 25.9 ms ± 5.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 10.1 ms ± 698 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
BC: enforce stronger shape requirements on the output tensor (out= keyword argument) and do not allow output tensor to be resized if it is also used as one of the inputs.
BC: enforce stronger integer tensor base power integer exponent requirement on CPU and CUDA: `Integers to negative integer powers are not allowed.`
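A short sketch of the BC-relevant behavior (a minimal illustration; exact messages may differ):
```python
import torch

base = torch.arange(1, 5)        # integer tensor
print(base.pow(2))               # non-negative integer exponents are fine

try:
    base.pow(-1)                 # integer base with a negative integer
except RuntimeError as e:        # exponent is now rejected
    print(e)

# out= now has stricter shape requirements; here the shapes/dtypes match.
out = torch.empty(4)
torch.pow(base.float(), 2.0, out=out)
```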
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23492
Differential Revision: D16731583
Pulled By: pbelevich
fbshipit-source-id: 4e5bf689357fe82a19371e42d48abbb7b4c1c3ca
Summary:
Adds documentation for `nn.functional.bilinear`, as requested in https://github.com/pytorch/pytorch/issues/9886.
The format follows that of `nn.functional.linear`, and borrows from `nn.bilinear` in its description of `Tensor` shapes.
I am happy to add more extensive documentation (e.g. "Args," "Example(s)"). From what I gather, the format of comments is inconsistent across functions in `nn.functional.py` and between modules (e.g. `nn.functional` and `nn`). It's my first PR, so guidance for contributing documentation and other code would be greatly appreciated!
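For reference, a small usage example consistent with the documented shapes (dimensions are illustrative):
```python
import torch
import torch.nn.functional as F

N, in1, in2, out_features = 8, 5, 4, 3
x1 = torch.randn(N, in1)
x2 = torch.randn(N, in2)
weight = torch.randn(out_features, in1, in2)
bias = torch.randn(out_features)

# y[n, o] = bias[o] + sum_{i, j} x1[n, i] * weight[o, i, j] * x2[n, j]
y = F.bilinear(x1, x2, weight, bias)
print(y.shape)  # torch.Size([8, 3])
```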
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24951
Differential Revision: D17091261
Pulled By: soumith
fbshipit-source-id: efe2ad764700dfd6f30eedc03de4e1cd0d10ac72
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25268
The AT_FORALL ..._AND macros mistakenly already include Half, which differs from the Dispatch macros.
This change shouldn't have any effect.
Test Plan: Imported from OSS
Differential Revision: D17079747
Pulled By: gchanan
fbshipit-source-id: 635eb167722ce850d6c1949fac652de4dddf32ee
Summary:
They are not supposed to be copied.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24932
Differential Revision: D16940997
Pulled By: gchanan
fbshipit-source-id: 6f16211ec57f8db6baec86e17288c8050c89cab5
Summary:
1. upgrade MKL-DNN to v0.20.3
2. allow the user to change the capacity of the primitive cache in mkldnn-bridge via the environment variable LRU_CACHE_CAPACITY
3. support filling all tensor elements with one scalar
4. fix the link issue when building with a private MKLML rather than a pre-installed MKL
5. add RNN support in mkldnn-bridge
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22910
Differential Revision: D16365998
Pulled By: VitalyFedyunin
fbshipit-source-id: b8d2bb454cbfbcd4b8983b1a8fa3b83e55ad01c3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25212
In eager mode, all modules need to work with input tensors that can change qparams dynamically. Issue https://github.com/pytorch/pytorch/issues/23874 will address this via FBGEMM modifications. This is a workaround before that.
ghstack-source-id: 89118038
Test Plan:
buck test caffe2/test:quantized -- 'test_conv_api \(test_quantized_nn_mods\.ModuleAPITest\)' --print-passing-details
Summary (total time 65.86s):
PASS: 1
FAIL: 0
SKIP: 0
FATAL: 0
TIMEOUT: 0
OMIT: 0
Differential Revision: D17064471
fbshipit-source-id: 3c192442b19bf2d9d88d4e52de6c24dc134a846f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25283
Return a message instead of void from the RPC UDF.
This helps Thrift-style RPC, where there is no need for an explicit send for a response.
We still need to figure out how to solve the non-blocking callback case, but we don't want to block the Thrift-backed RPC agent implementation until then.
ghstack-source-id: 89130305
Differential Revision: D16825072
fbshipit-source-id: 75cb1c9aa5a10363b1c6b12cd21c50d7047d2268
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25295
As Title says.
Test Plan: CI
Reviewed By: hl475
Differential Revision: D17089457
fbshipit-source-id: b45ca24decd6033e7e207f17540d486df6ef2ddc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24863
Add the sparse feature name in logging for ease of debugging
Test Plan:
./buck-out/gen/caffe2/caffe2/fb/dper/layer_models/sparse_nn/pooling_test#binary.par -r test_simple_sum_pooling_named_exception
Another test for id_score_list. the original sparse_key is equivalent to get_key(self.input_record)()
P98343716
./buck-out/gen/caffe2/caffe2/python/layers_test-2.7#binary.par -r test_get_key
Reviewed By: chocjy
Differential Revision: D16901964
fbshipit-source-id: 2523de2e290aca20afd0b909111541d3d152a588
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25228
This adds a facility to isSubtypeOf that lets it explain why a type is
not a subtype of something else. It is used in situations where it
is not clear from the types' python_str alone why the relationship
does not hold. Because of the subtle interaction between default arguments,
overloads, and virtual methods, it uses isSubtypeOfExt for the extended
version to avoid requiring readers to understand the interaction.
Test Plan: Imported from OSS
Differential Revision: D17066673
Pulled By: zdevito
fbshipit-source-id: 4de7c40fbf7f9eeae045d33a89a038538cf87155
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25227
Adds cases to NamedType serialization so that interfaces are written.
Similar implementation to NamedTuples
Test Plan: Imported from OSS
Differential Revision: D17066674
Pulled By: zdevito
fbshipit-source-id: fda5419260fad29e8c4ddb92de1d3447d621d982
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25226
Given the current structure, it is easier to just call different functions
to get the desired behavior.
Test Plan: Imported from OSS
Differential Revision: D17066672
Pulled By: zdevito
fbshipit-source-id: 88e76c5ee870d9d1e9887aebcac5e7873fabe6b1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25258
this is the first commit in a series to add interfaces to JIT.
Interfaces allow the specification through a blank python class of an
abstract interface that can be used in type annotations for Script functions.
If a TorchScript class implements all the methods in the interface with
the appropriate types, then it is implicitly considered to implement
that interface.
Follow-ups required:
* implementation of serialization
* implementation in the parser frontend
* better error reporting for explaining why a class does not meet an
interface specification.
Test Plan: Imported from OSS
Differential Revision: D17079963
Pulled By: zdevito
fbshipit-source-id: a9986eeba2d4fdedd0064ce7d459c0251480a5a0
Summary:
When a closure was declared that always threw, we would erroneously propagate the ExitThrows status to the block in which it was declared, causing us to remove the subsequent code in the block. [This code](https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/script/exit_transforms.cpp#L462) was meant to handle this case; however, it didn't handle the case where we were transforming Loops and the prim::Function wasn't a target block.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25278
Differential Revision: D17084780
Pulled By: eellison
fbshipit-source-id: ee31a4cc243653f615e4607ece29cdac8ef5710e
Summary:
This PR adds `TORCH_WARN_ONCE` macro, and use it in `Tensor.data<T>()`.
cc. gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25207
Differential Revision: D17066263
Pulled By: yf225
fbshipit-source-id: 411c6ccc8326fb27ff885fee4638df8b5ba4d449
Summary:
This PR, linked to https://github.com/pytorch/pytorch/issues/22806, moves the sign function to ATen.
sign(x) supports bool and uses vectorized operations on CPU.
sign(NaN) is defined to return 0.
sign(bool) is a no-op; the resulting tensor holds the same values as the input one.
- [x] CPU Backend
- [x] CUDA Backend
- [x] Bring support for bool dtype
- [x] Bring support for Half dtype
- [x] Add test for NaN
- [x] Add test for bool dtype
- [x] Delete legacy implementation in THTensorMoreMath.cpp
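A minimal usage sketch (illustration only, not from the PR; bool/Half/NaN behavior is as described above for this PR and may differ in later versions):
```python
import torch

x = torch.tensor([-2.0, 0.0, 3.5, -0.1])
print(torch.sign(x))   # tensor([-1., 0., 1., -1.])
```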
Performances:
```python
timeit -s 'import torch; x = torch.randn((1000, 1000))' -n 1000 'torch.sign(x)'
timeit -s 'import torch; x = torch.randn((1000, 1000), device="cuda")' -n 1000 'torch.sign(x); torch.cuda.synchronize()'
```
| device | before | after |
| :-------------: | :-------------: | :-----: |
| CPU | 1.24 msec | 33.9 usec |
| GPU | 680 usec | 7.13 usec |
| CPU (1 thread) | 0.82 msec | 0.73 msec |
| GPU (1 thread) | 16.1 usec | 15.9 usec |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22861
Differential Revision: D16503452
Pulled By: VitalyFedyunin
fbshipit-source-id: a87ce7fff139642ef4ed791f15873074ad0d53af
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24103
This change adds a quantized add and maxpool2d operation for pytorch mobile.
These operators follow the structure of QNNPACK in terms of create/setup and run calls. The plan to refactor QNNPACK to make it more functional currently covers the FC and Conv ops, where the cost of create/setup is high.
For ops like add and maxpool, the cost of calling create and setup in each operator invocation is negligible.
Once we migrate the FC and Conv QNNPACK ops to be functional in nature, we will consider changing these ops as well for consistency.
ghstack-source-id: 88997042
Test Plan:
python test/test_quantized.py TestQNNPackOps.test_qnnpack_add
python test/test_quantized.py TestQNNPackOps.test_qnnpack_maxpool2d
Differential Revision: D16734190
fbshipit-source-id: 5152aed88e8bbe4f701dba4886eac989bdcefe8f
Summary:
We have the environment variable USE_CUDNN with a self-explanatory name. However, the C++ code is compiled based on the C++ macro AT_CUDNN_ENABLED, which is defined as:
```
IF (NOT AT_CUDA_ENABLED OR NOT CUDNN_FOUND)
MESSAGE(STATUS "CuDNN not found. Compiling without CuDNN support")
set(AT_CUDNN_ENABLED 0)
ELSE()
include_directories(SYSTEM ${CUDNN_INCLUDE_DIRS})
set(AT_CUDNN_ENABLED 1)
ENDIF()
```
So, even if USE_CUDNN is set to 0, the C++ code is compiled with cuDNN if CMake finds cuDNN on the system. I actually tested this and was very surprised when I found myself debugging cuDNN code that I had built with USE_CUDNN=0. I believe the CMake code above should look like this:
`IF (NOT AT_CUDA_ENABLED OR NOT CUDNN_FOUND OR NOT USE_CUDNN) ...`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25037
Differential Revision: D17048683
Pulled By: pbelevich
fbshipit-source-id: 48afa19eaae0bba2ffd49c1f68db0b4efd5cf85e
Summary:
In current pytorch/master we only have a libtorch Android build of static libraries for armv7.
This change adds the same builds with shared libraries to CircleCI for the abis: x86, x86_64, arm-v7a, arm-v8a.
In pytorch_build_data.py I added a new AndroidAbiConfigNode:
    class AndroidAbiConfigNode(TreeConfigNode):
        def init2(self, node_name):
            self.props["android_abi"] = node_name
        def child_constructor(self):
            return ImportantConfigNode
It can be a child of ExperimentalFeatureConfigNode, and it results in:
    ("android", [
        ("r19c", [
            ("3.6", [
                ("android_abi", [XImportant("x86")]),
                ("android_abi", [XImportant("x86_64")]),
                ("android_abi", [XImportant("arm-v7a")]),
                ("android_abi", [XImportant("arm-v8a")]),
            ])
        ]),
    ]),
Since all parameters are used for docker_image_name generation, and I wanted to use the same docker image for all Android jobs, I introduced Conf.parms_list_ignored_for_docker_image in pytorch_build_definitions.py.
It contains parameters that are not joined into docker_image but are still used for job name and build_environment generation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25192
Reviewed By: kostmo
Differential Revision: D17078465
Pulled By: IvanKobzarev
fbshipit-source-id: c87534a45fb92c395e0dd3471213d42d3613c604
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25223
Before this PR, it showed the following warning:
```
> caffe2/aten/src/ATen/core/Tensor.h:297: UserWarning: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead.
> TORCH_WARN("Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead.");
> caffe2/aten/src/ATen/core/Tensor.h:297: UserWarning: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead.
> TORCH_WARN("Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead.");
```
After this PR, the warning message should disappear.
ghstack-source-id: 89113498
Test Plan: CI
Differential Revision: D17066471
fbshipit-source-id: e4fec964b5333ff968c8cf218286d4a8ab8dbe54
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25133
This is driven by benchmarks I did for moving ATen ops to the c10 operator library.
Improvements:
- tell the compiler that the error cases are unlikely so it can optimize code better
- optimize cache layout of LeftRight.
ghstack-source-id: 88907294
Test Plan: unit tests
Differential Revision: D16998010
fbshipit-source-id: 0e3cbff0a4983133a4447ec093444f5d85dd61d6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25001
It seems QScheme and the Quantizer class types have a 1-1 mapping, so use it to compare whether
two quantizers are equal instead of using dynamic_cast.
This way the code can remain mobile friendly, as our internal mobile build doesn't
enable RTTI by default.
ghstack-source-id: 88925243
Test Plan:
- builds;
- will check CI tests;
Differential Revision: D16951501
fbshipit-source-id: 585b354f64e5188fd34f01d456c91cec232ba6b0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25195
The test will fail for large samples due to the deadline constraint in the hypothesis framework.
Test Plan: Imported from OSS
Differential Revision: D17059087
Pulled By: zafartahirov
fbshipit-source-id: 915f46ecae61de1b384136c14da25ee875d1c02d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25249
See #25097
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D17071632
Pulled By: ezyang
fbshipit-source-id: 1c5ad7204f1d30f5c67d682fbb083608e067cb2a
Summary:
Currently they sit together with other code in cuda.cmake. This commit is the first step toward cleaning up cuDNN detection in our build system.
Another attempt to https://github.com/pytorch/pytorch/issues/24293, which breaks manywheels build because it does not handle `USE_STATIC_CUDNN` properly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24938
Differential Revision: D17070920
Pulled By: ezyang
fbshipit-source-id: a4d017a3505c102d9c435a73ae62332e4336c52e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24474
torch.dot is a little weird. It ignores the names of its inputs to be
consistent with the rest of our matrix multiplication functions.
I've written the implementation using a helper function that is also
used by other matrix multiplication functions so that it is easy to
change the behavior.
Test Plan
- new tests [namedtensor ci]
Test Plan: Imported from OSS
Differential Revision: D16915802
Pulled By: zou3519
fbshipit-source-id: 628a6de1935357022cc92f4d23222736a70bb070
Summary:
Not meant to be a landing blocker or anything like that. This only lets me set up some more effective email filters, hopefully allowing me to discover the current changes earlier and be more responsive.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25231
Differential Revision: D17070735
Pulled By: soumith
fbshipit-source-id: 171c8dcd48edf64a9dc3367015e4166baa860c0a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25211
Change dtypes of all tensors in testqat to double precision. Without this change, the backward pass showed small mismatches, the root cause of which wasn't clear. With this change, the numerics match to a precision of 1e-10, and this test is useful and provides a tight check on numerics.
ghstack-source-id: 89041119
Test Plan:
buck test caffe2/test:quantized -- 'test_conv_bn_relu \(test_qat\.IntrinsicQATModuleTest\)' --print-passing-details
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/3377699726578151
✓ caffe2/test:quantized - test_conv_bn_relu (test_qat.IntrinsicQATModuleTest) 17.777 1/1 (passed)
Test output:
> test_conv_bn_relu (test_qat.IntrinsicQATModuleTest) ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 17.778s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/3377699726578151
Summary (total time 22.03s):
PASS: 1
FAIL: 0
SKIP: 0
FATAL: 0
TIMEOUT: 0
OMIT: 0
Differential Revision: D17064183
fbshipit-source-id: 7f6d5d2b71430b6aaf4f6d741b56a2bd1247ac29
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24999
As described in previous PR, we are doing module level observer rather
than global observer now, so majority of code are deprecated. But we
still keeps some logic that is independent of this decision in the new
code.
Test Plan:
.
Imported from OSS
Differential Revision: D17001138
fbshipit-source-id: b456f80d587a61e368c626e7e8ac2a4f1282268b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24998
The original graph mode was developed at a time when we didn't have a concrete qconfig API yet, and
it had a global observer module which was passed around during the whole quantization flow.
We have a much clearer picture of the quantization API now, and we are going to use a per-Tensor
observer design, just like in eager mode. This PR removes the deprecated tests; the next PR will
remove the deprecated code.
Test Plan:
```
python test/test_quantizer.py
```
Imported from OSS
Differential Revision: D17001140
fbshipit-source-id: 87f342cfa8ea6b45606372c51dbfc493065a737a
Summary:
Initial commit of pytorch_android_torchvision, which has utility methods for:
- android.media.Image, YUV_420_888 format (camera output) -> Tensor(Float) in torchvision format, normalized by ImageNet mean/std
- Bitmap -> Tensor(Float) in torchvision format
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25185
Reviewed By: dreiss
Differential Revision: D17053008
Pulled By: IvanKobzarev
fbshipit-source-id: 6bf7a39615bf876999982b06925e7444700e284b
Summary:
Tensor has getDataAsFloatArray(); we also support Int and Byte Tensors,
so this adds symmetric methods for Int and Byte that throw
IllegalStateException if called for an inappropriate type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25183
Reviewed By: dreiss
Differential Revision: D17052674
Pulled By: IvanKobzarev
fbshipit-source-id: 1d44944461ad008e202e382152cd0690c61124f4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24471
mv(Tensor[M, N], Tensor[O]) ignores the names of N and O and returns a
tensor with names [M].
Test Plan: - new tests [namedtensor ci]
Differential Revision: D16915805
Pulled By: zou3519
fbshipit-source-id: d7d47903f249f85ef3be8a188d51993834bf5f55
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24469
tensor.expand(*sizes) returns a tensor with names equal to tensor.names
plus unnamed padding in the beginning dimensions.
For example, Tensor[H, W].expand(10, 2, 128, 128) -> Tensor[None, None,
H, W].
Test Plan: - new tests [namedtensor ci]
Differential Revision: D16915804
Pulled By: zou3519
fbshipit-source-id: 77ac97f42e9959d7f6d358c5286e3dc27488e33d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25059
This fixes the cases where a type annotated with optional cannot
be conditionally assigned to none:
```
x : Optional[int] = 4
if ...:
x = None
```
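A minimal sketch of the now-supported pattern in a scripted function (example code, not from the PR):
```python
import torch
from typing import Optional

@torch.jit.script
def maybe_none(flag: bool) -> Optional[int]:
    x: Optional[int] = 4
    if flag:
        x = None   # conditional assignment of None to an Optional
    return x

print(maybe_none(True), maybe_none(False))   # None 4
```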
Test Plan: Imported from OSS
Differential Revision: D16975166
Pulled By: zdevito
fbshipit-source-id: 5a7a81224d08b9447e1f4d957fcd882091e02f32
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24306
Featuring:
- a new way of writing name inference tests. At some point I'll migrate
the older tests over.
- The out= variants aren't implemented. This is because they are a
little weird: the output gets resized, but I haven't thought through
what semantics that should have.
Test Plan: - new tests [namedtensor ci]
Differential Revision: D16915801
Pulled By: zou3519
fbshipit-source-id: 29ae2ee414c7d98e042965458c5dccef7ddbd4dd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24921
Let `unnamed = torch.randn(1, 1, 1)` and `named = torch.randn(1, 1,
names=('N', 'C'))`.
Previously, there was a bug where `unnamed + named` would error out.
This happened because `unify_from_right(unnamed.opt_names(),
named.opt_names())` would return `named.names()`, which was propagated
to the output tensor. However, the output tensor has dim 3, but
`named.names()` only has 2 elements, so the code would throw an error.
The solution implemented in this PR is to stop trying to do premature
optimization. If none of the inputs to an operation have names, then
don't run name inference. However, if any inputs do, then materialize
the names and run name inference.
It's possible to make this more efficient for the case where some inputs
are named and some aren't, but we should benchmark these cases
and determine if it is necessary for it to be more efficient.
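A minimal sketch of the now-working case (illustration only; named tensors are experimental and the shapes/names here are arbitrary):
```python
import torch

unnamed = torch.randn(1, 1, 1)
named = torch.randn(1, 1, names=('N', 'C'))

out = unnamed + named   # previously raised an error
print(out.names)        # (None, 'N', 'C')
```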
Test Plan: - new tests [namedtensor ci]
Differential Revision: D16930710
Pulled By: zou3519
fbshipit-source-id: 0de73c803c8b0f9a1c2d80684b9a47cccba91cbc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25189
Change dtypes of all tensors in testqat to double precision. Without this change, the backward pass showed small mismatches, the root cause of which wasn't clear. With this change, the numerics match to a precision of 1e-10, and this test is useful and provides a tight check on numerics.
ghstack-source-id: 88999698
Test Plan:
buck test caffe2/test:quantized -- 'test_conv_bn_relu \(test_qat\.IntrinsicQATModuleTest\)' --print-passing-details
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/3377699726578151
✓ caffe2/test:quantized - test_conv_bn_relu (test_qat.IntrinsicQATModuleTest) 17.777 1/1 (passed)
Test output:
> test_conv_bn_relu (test_qat.IntrinsicQATModuleTest) ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 17.778s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/3377699726578151
Summary (total time 22.03s):
PASS: 1
FAIL: 0
SKIP: 0
FATAL: 0
TIMEOUT: 0
OMIT: 0
Differential Revision: D17053634
fbshipit-source-id: e19d555adee29b49bff873fcc01f527e8272f1c6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24923
Replace exception with warning for uninitialized min/max values to support creation of quantized models without observers.
ghstack-source-id: 89003800
Test Plan: Replace error message with warning for observers
Differential Revision: D16923660
fbshipit-source-id: 9927ed4e4ee977c1388595ddef042204f71076a4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24789
In eager mode, all modules need to work with input tensors whose qparams can change dynamically. Issue https://github.com/pytorch/pytorch/issues/23874 will address this via FBGEMM modifications. This is a workaround until then.
ghstack-source-id: 89003798
Test Plan:
buck test caffe2/test:quantized -- 'test_conv_api \(test_quantized_nn_mods\.ModuleAPITest\)' --print-passing-details
Summary (total time 65.86s):
PASS: 1
FAIL: 0
SKIP: 0
FATAL: 0
TIMEOUT: 0
OMIT: 0
Differential Revision: D16852280
fbshipit-source-id: 988f8ff91616eddf511e71926aa7d2d0f1938188
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23913
This PR binds torch.autograd.backward and tensor.backward to TorchScript,
and makes aliasing conservative for these two ops. This is mainly
because the backward op might write to every input tensor in the graph.
Test Plan: Imported from OSS
Differential Revision: D16923272
fbshipit-source-id: 8a4016c62e00d00e0dee3d8c599d3aca220202f7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24426
Added the following pass:
- _jit_pass_insert_quant_dequant: removes observer modules and calls, and inserts
quantize_linear-int_repr-_dequantize_linear calls for activation, weight and bias;
the scale of bias is calculated from the scales of the input activation and weight
Test Plan:
python test/test_jit.py
Imported from OSS
Differential Revision: D17001141
fbshipit-source-id: e81faac697a9c0df862adc5aa8ca2aa9e4ae5fd9
Summary:
This PR adds test harness for checking Python / C++ API parity for `torch.nn.Module` subclasses. Under the hood, we use JIT tracing to transfer `nn.Module` state from Python to C++, so that we can test initialization / forward / backward on Python / C++ modules with the same parameters and buffers.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23852
Differential Revision: D16830204
Pulled By: yf225
fbshipit-source-id: 9b5298c0e8cd30e341a9f026e6f05604a82d6002
Summary:
[Not in need of review at this time]
Support focal loss in MTML (and effectively dper2 in general) as described in https://arxiv.org/pdf/1708.02002.pdf. Adopt an approach similar to Yuchen He's WIP diff D14008545.
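For reference, a plain-PyTorch sketch of binary focal loss from the paper (illustration only; the function name and details here are not from this diff and not the dper2/MTML implementation):
```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    # standard binary focal loss weighting: down-weight well-classified examples
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction='none')
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)
    return ((1 - p_t) ** gamma * ce).mean()

logits = torch.randn(8)
targets = torch.randint(0, 2, (8,)).float()
print(focal_loss(logits, targets))
```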
Test Plan:
Passed the following unit tests
buck test //caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test -- test_lr_loss_based_focal_loss
buck test //caffe2/caffe2/fb/dper/layer_models/tests:mtml_test_2 -- test_mtml_with_lr_loss_based_focal_loss
buck test //caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test -- test_lr_loss_based_focal_loss_with_stop_grad_in_focal_factor
Passed ./fblearner/flow/projects/dper/canary.sh; URL to track workflow runs: https://fburl.com/fblearner/446ix5q6
Model based on V10 of this diff
f133367092
Baseline model
f133297603
Protobuf of train_net_1 https://our.intern.facebook.com/intern/everpaste/?color=0&handle=GEq30QIFW_7HJJoCAAAAAABMgz4Jbr0LAAAz
Reviewed By: hychyc90, ellie-wen
Differential Revision: D16795972
fbshipit-source-id: 7bacae3e2255293d337951c896e9104208235f33
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25083
I missed this in the last PR
Test Plan: Imported from OSS
Differential Revision: D17005372
Pulled By: jamesr66a
fbshipit-source-id: 1200a6cd88fb9051aed8baf3162a9f8ffbf65189
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25075
This change adds special behavior to the subgraph matcher to allow it to
match calls to modules. Namely, when a node in the pattern graph has the
'match::module' type, it is considered a 'match' only when the
corresponding node in the target graph is a 'prim::GetAttr' obtaining a
submodule whose type matches the type specified in the 'name' attribute of
the 'match::module' node.
Currently, when comparing the expected module type, we check whether the string
specified in 'name' is a prefix of the qualified name of the module the GetAttr
returns. In the future, when the qualified-name format is better defined, we will
probably change it to an exact comparison.
Why do we want this? In some cases we would like to perform fusion at the
module level rather than at the graph level. A popular example of such
fusion would be Conv-BN. It is impractical to match batchnorm at the
graph level because that would mean we would need to specify its full
and exact implementation in the pattern graph. If we match at the
CallMethod level, however, the problem becomes trivial.
The feature added in this PR allows detecting patterns with 'CallMethod'
nodes, which in turn allows us to use the subgraph rewriter to replace
such patterns with some node (or nodes). I expect that the usual approach
would be to use the subgraph rewriter to replace all matches with some
artificial node and then, in an additional pass, replace such nodes with
calls to another module or something else. It is not possible at the
moment to use the subgraph rewriter to add a call to a method of a new
module, because it cannot add a new submodule, but we would probably
add a higher-level API to do that.
Test Plan: Imported from OSS
Differential Revision: D16978652
Pulled By: ZolotukhinM
fbshipit-source-id: 37307a5ec65cf4618ad8eb595ef5f8ae656e2713
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25098
Use the same optimization we used for the Sum operator in Add when broadcast is not used and the inputs are uint8_t.
The optimization uses AVX2 instructions and fp32 (instead of pure fixed-point arithmetic). It does introduce numerical differences, but only in minor cases like tie-breaking when rounding.
Test Plan: buck test caffe2/caffe2/quantization/server:elementwise_add_dnnlowp_op_test
Reviewed By: jianyuh
Differential Revision: D16985776
fbshipit-source-id: 8097503dd55f7d39857b3e4102db0f91327a6f55
Summary: It's needed by fp16 SLS.
Test Plan: The lowering works but NNPI doesn't seem to support fp16 SLS yet.
Reviewed By: zrphercule
Differential Revision: D16996047
fbshipit-source-id: e830e4926b416cb7770975838baf17a88dde6d91
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25119
Make `defaultSchemaFor` an anonymous function and move it + its caller
into function.cpp
Purely mechanical changes
Test Plan: Imported from OSS
Differential Revision: D16994147
Pulled By: suo
fbshipit-source-id: 96da8b3527eea37ad7beae433122384303a010c9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25118
This allows people to temporarily disable a job from running on PRs. We
should use this only if there is a long-running breakage that can't be
fixed in a simple way.
Test Plan: Imported from OSS
Differential Revision: D16994074
Pulled By: suo
fbshipit-source-id: 6aa9c618057c126d16065e53a60204665d8ff0eb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24944
As the title says, we would like to make the EmbeddingLookup APIs take offsets rather than lengths to match PyTorch's EmbeddingBag.
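For reference, the offsets layout matches nn.EmbeddingBag; a small sketch of converting lengths to offsets (illustration only, values are arbitrary):
```python
import torch

# lengths-based layout: three bags with 2, 3, and 1 indices respectively
lengths = torch.tensor([2, 3, 1])
# offsets-based layout used by nn.EmbeddingBag: exclusive prefix sums of lengths
offsets = torch.cat([torch.zeros(1, dtype=torch.long), lengths.cumsum(0)[:-1]])
print(offsets)                                # tensor([0, 2, 5])

indices = torch.tensor([1, 4, 2, 2, 7, 3])
bag = torch.nn.EmbeddingBag(num_embeddings=10, embedding_dim=3, mode='sum')
print(bag(indices, offsets).shape)            # torch.Size([3, 3])
```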
ghstack-source-id: 88883902
Test Plan:
python hp_emblookup_codegen.py --use-offsets
Check the benchmark in D16990830.
Reviewed By: jspark1105
Differential Revision: D16924271
fbshipit-source-id: 7fac640c8587db59fd2304bb8e8d63c413f27cb8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25044
Bring in Windows fixes, new microkernels, and zero-batch support.
Test Plan: CI
Reviewed By: supriyar
Differential Revision: D16946393
fbshipit-source-id: 3047eb73f1980e4178b795a20d53e744f176c2d8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24914
There are two helpers, Tensor::names(), and Tensor::opt_names().
- Tensor::names() always returns a DimnameList; if the tensor doesn't have
names, it returns a DimnameList of all `None` names.
- Tensor::opt_names() returns an optional<DimnameList>: it returns
names if the tensor has names allocated, otherwise, nullopt.
Tensor::opt_names() is more of an implementation detail. It is
recommended that devs use Tensor::has_names() and Tensor::names()
because those result in a cleaner API.
This PR also cleans up callsites of Tensor::opt_names() to use
Tensor::names() where applicable.
Finally, this PR also adds impl::get_names(TensorImpl*), which is the
analogous function for TensorImpl*. (Tensor::opt_names() <->
impl::get_opt_names(TensorImpl*)).
Test Plan: - run existing tests. [namedtensor ci]
Differential Revision: D16919767
Pulled By: zou3519
fbshipit-source-id: ef30c9427a3d8e978d2e6d01c7f74f5174ccd52c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24907
This better reflects the semantics because Tensor::opt_names() returns
an `optional<DimnameList>`, not just a DimnameList.
Also rename `impl::get_names` to `impl::get_opt_names` (that is the
`TensorImpl*` variant of `Tensor::opt_names()`.
Test Plan
- run existing tests [namedtensor ci]
gh-metadata: pytorch pytorch 24907 gh/zou3519/110/head
Test Plan: Imported from OSS
Differential Revision: D16919768
Pulled By: zou3519
fbshipit-source-id: 094d404576b3f4b39629d0204e51c6ef48ee006e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24885
Store a static pre-allocated vector of names. When one calls
`default_names`, it returns a const reference to the needed number of these
names.
Also make the maximum number of dimensions we support for named
tensors clearer. Right now it is 64, but that number is easy to change. 64
follows an internal PyTorch maximum number of dimensions;
TensorIterator reduce ops have a limit of 64 dims.
Test Plan: - new tests [namedtensor ci]
Differential Revision: D16915803
Pulled By: zou3519
fbshipit-source-id: 931741b199456f8976882b82f25ab5af6dcd108b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25055
An ATen kernel registered with the c10 dispatcher doesn't need a cache,
so let's not call a cache creator function when the kernel is looked up.
ghstack-source-id: 88834902
Test Plan: unit tests
Differential Revision: D16974248
fbshipit-source-id: 5f9e65d706ec5f836804cb6e5f693f5a01f66714
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25121
Turns out there is a more idiomatic way to use azure variables. This
also fixes clang-tidy failing on master
Test Plan: Imported from OSS
Differential Revision: D16994595
Pulled By: suo
fbshipit-source-id: 5c5b1b47ced57cff85c4302cde43ff8c8c3f54c0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25056
For some combinations of key and entry ordering (and only on an OSX
build) dict.pop() would return a value other than the popped one,
failing test_pop in test_jit.py. Caused by erase() mutating the
iterator returned from find(), fixed by dereferencing it first.
Test Plan: Imported from OSS
Differential Revision: D16975020
Pulled By: bhosmer
fbshipit-source-id: ce84e9aed6b90010121c0ef5d6c9ed8d2d1356b8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25078
Our script is set up to only run on lines generated by diffing your branch against the base branch.
But we were using `$TRAVIS_BRANCH` to refer to the target branch, which was causing the script to diff against master, generating many spurious lines of diff output to be clang-tidy'd
Test Plan: Imported from OSS
Differential Revision: D16993054
Pulled By: suo
fbshipit-source-id: 7bffa890f6a1a2d5566ef01b9798c4eb86d8169f
Summary:
https://github.com/pytorch/FBGEMM (USE_FBGEMM is ON by default for x86, x86_64)
Building libtorch for the android_abi x86_64 fails due to this.
Turn it off for Android builds.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25113
Reviewed By: dreiss
Differential Revision: D16992459
Pulled By: IvanKobzarev
fbshipit-source-id: 3cf35a67043288cb591cc3b23c261258c28cf304
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24942
NamesMode determines whether or not to ignore the names field of
TensorImpl. In particular, when it is disabled, all tensors are treated
as unnamed.
Test Plan: - New tests [namedtensor ci]
Differential Revision: D16930708
Pulled By: zou3519
fbshipit-source-id: 867b31c4daff4e1eabafea45ed489efda4471efb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25099
As title says
ghstack-source-id: 88875870
Test Plan: CI
Differential Revision: D16986248
fbshipit-source-id: 2a0de41e89e413a32957b12308e5e6f48715477f
Summary:
TLDR: initial commit of the Android Java/JNI wrapper of the TorchScript C++ API.
The main idea is to provide a Java interface for Android developers to use TorchScript modules.
The Java API tries to mirror the semantics of the C++ and Python TorchScript APIs.
org.pytorch.Module (wrapper of torch::jit::script::Module)
- static Module load(String path)
- IValue forward(IValue... inputs)
- IValue runMethod(String methodName, IValue... inputs)
org.pytorch.Tensor (semantics of at::Tensor)
- newFloatTensor(long[] dims, float[] data)
- newFloatTensor(long[] dims, FloatBuffer data)
- newIntTensor(long[] dims, int[] data)
- newIntTensor(long[] dims, IntBuffer data)
- newByteTensor(long[] dims, byte[] data)
- newByteTensor(long[] dims, ByteBuffer data)
org.pytorch.IValue (semantics of at::IValue)
- static factory methods to create the TorchScript-supported types
Examples of API usage can be found in PytorchInstrumentedTests.java:
    Module module = Module.load(path);
    IValue input = IValue.tensor(Tensor.newByteTensor(new long[]{1}, Tensor.allocateByteBuffer(1)));
    IValue output = module.forward(input);
    Tensor outputTensor = output.getTensor();
Thread safety:
The API is not thread safe; all synchronization must be done on the caller side.
Mutability:
The org.pytorch.Tensor buffer is a DirectBuffer with native byte order and can be created with the static factory methods taking a DirectBuffer.
At the moment org.pytorch.Tensor does not hold an at::Tensor on the JNI side; it has long[] dimensions, a type, and a DirectByteBuffer blobData.
Input tensors are mutable (they can be modified and used for the next inference);
values are read from the buffer at the moment of the Module#forward or Module#runMethod call.
The buffers of the input tensors are used directly by the input at::Tensor.
The output is copied from the output at::Tensor and is immutable.
Dependencies:
The JNI level is implemented using the fbjni library, which was developed at Facebook
and has already been used and open-sourced in several open-source projects.
It is added to the repo as a submodule from a personal account so that we can switch the submodule
when fbjni is open-sourced separately.
ghstack-source-id: b39c848359a70d717f2830a15265e4aa122279c0
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25084
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25105
Reviewed By: dreiss
Differential Revision: D16988107
Pulled By: IvanKobzarev
fbshipit-source-id: 41ca7c9869f8370b8504c2ef8a96047cc16516d4
Summary:
The semantics of _auto-convert GPU arrays that support the __cuda_array_interface__ protocol_ have changed a bit.
It used to throw an exception when using `torch.as_tensor(..., device=D)` where `D` is a CUDA device not used in `__cuda_array_interface__`. Now, this is supported and results in an implicit copy.
I do not know what has changed, but `from_blob()` now supports the input and the output device differing.
I have updated the tests to reflect this, which fixes https://github.com/pytorch/pytorch/issues/24968.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25017
Differential Revision: D16986240
Pulled By: soumith
fbshipit-source-id: e6f7e2472365f924ca155ce006c8a9213f0743a7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25065
Using global atomic variables is bad because sending the same AST through
the compiler twice will produce different graphs. This makes it a
member of the translation struct.
Test Plan: Imported from OSS
Differential Revision: D16975355
Pulled By: zdevito
fbshipit-source-id: 23e15ffd58937a207898a4c4bed82628237e3c2e
Summary:
I presume this is what was intended.
cc t-vi
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25011
Differential Revision: D16980939
Pulled By: soumith
fbshipit-source-id: c55b22e119f3894bd124eb1dce4f92a719ac047a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25051
In #24355 I factored out a function for creating a prefix in jit_log,
but I made a copypasta error there: the prefix stringstream was
initialized from the input string instead of an empty string.
Test Plan: Imported from OSS
Differential Revision: D16974156
Pulled By: ZolotukhinM
fbshipit-source-id: 014fe0e3366e85e984a6936ec9bb17f571107f6e
Summary:
Overall context: open-source BlackBoxPredictor as the entry
point for inference in Caffe2 (a thread-safe abstraction for Caffe2
inference). This should be used in ThroughputBenchmark for the purpose
of framework comparison.
This specific diff:
There should be no harm in moving the transformation code to
OSS. On the advantages side, we will be able to compare the production
Caffe2 setup with PyTorch in the fairest way via
ThroughputBenchmark. This approach avoids any complicated
transformation registries. Building those properly would be a significant
engineering effort as well as a production risk. In the past we had SEVs
related to transforms being turned off due to various refactors. Given
that we don't plan to make any other significant investments in
transformation logic beyond the existing ones (like TVM and Glow), and
those also relate to open-source technologies, I came to the
conclusion of moving the whole thing to OSS.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23350
ghstack-source-id: 87121538
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24928
Test Plan: waitforsandcastle
Differential Revision: D16445133
Pulled By: salexspb
fbshipit-source-id: a93106489611dfe427b0f144717bc720d04e47f3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24989
This fixes the cases where a type annotated with optional cannot
be conditionally assigned to none:
```
x : Optional[int] = 4
if ...:
x = None
```
Test Plan: Imported from OSS
Differential Revision: D16949314
Pulled By: zdevito
fbshipit-source-id: 7f63d88b30a3f5b024c2a539aa74967c9202af00
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25005
Seeing a bunch of failures in TSAN mostly with the following error:
```
ThreadSanitizer: starting new threads after multi-threaded fork is not
supported. Dying (set die_after_fork=0 to override)
```
TSAN is unsafe to use in a multi-threaded program after fork() and setting
die_after_fork can lead to deadlocks. As a result, I'm disabling tsan.
ghstack-source-id: 88765698
Differential Revision: D16954347
fbshipit-source-id: 18895cd82b5052938284b46479d8470af2d74a06
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25012
Resubmitting https://github.com/pytorch/pytorch/pull/22907 with build fix.
This change adds the following functionality:
1) WorkNCCL isCompleted, isSuccess methods check for NCCL errors and set the
appropriate exception.
2) Added a watchdog thread to ProcessGroupNCCL which checks for errors in the
cached communicators and removes them from the cache.
3) Use ncclCommAbort in NCCLComm destructor since ncclCommDestroy can block
forever waiting for work.
4) Added a simulate_nccl_errors.py script to simulate NCCL errors.
https://github.com/pytorch/pytorch/issues/17882
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22907
Test Plan: 1) Run the simulate_nccl_errors.py to verify NCCL errors are caught.
Differential Revision: D16958078
fbshipit-source-id: 662b0b8b8ee250e2b6d15bdfc9306d71c4f66219
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24994
Use C10_MOBILE to gate the CustomClass lookup logic for the mobile build, since it uses
typeid() and requires "-frtti", which is off by default for the internal mobile build.
Not sure whether we ever need CustomClass for internal use cases. I feel the change
is not too intrusive, but I'm willing to hear others' thoughts.
ghstack-source-id: 88754932
Reviewed By: dreiss
Differential Revision: D16951430
fbshipit-source-id: 445f47ee4e9c16260e2fd2c43f5684cea602e3d9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24964
To reduce complications in the quantized kernel implementation, we decided not to
have a symmetric quantizer, since it can be expressed by the affine quantizer.
We will still have a symmetric quantization qscheme in the frontend, and users
can still specify tensors to be symmetrically quantized, while the actual quantized
Tensor representation will only have affine quantization.
Differential Revision: D16965114
fbshipit-source-id: 0e9a5a00131878a302e211fda65a1aa427204eea
Summary:
This fixes https://github.com/pytorch/pytorch/issues/22970. Specifically, `torch.distributions.uniform.Uniform.log_prob()` now works even if `value` is passed as a python float.
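A minimal sketch of the fixed behavior (example values are arbitrary):
```python
import torch
from torch.distributions.uniform import Uniform

d = Uniform(0.0, 2.0)
print(d.log_prob(0.5))                 # python float now accepted
print(d.log_prob(torch.tensor(0.5)))   # equivalent tensor input
```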
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23017
Differential Revision: D16383258
Pulled By: vincentqb
fbshipit-source-id: 26943c33431d6da6f47e0897d6eda1c5f5541d28
Summary:
In the examples for creating an instance of the Transformer module, the src and tgt parameters (which belong to forward) were being passed, even though they are not present in __init__.
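A corrected sketch of the intended usage (illustration only; shapes follow the (S, N, E) convention of the docs):
```python
import torch
import torch.nn as nn

model = nn.Transformer(d_model=512, nhead=8)   # src/tgt are not constructor args
src = torch.rand(10, 32, 512)                  # (S, N, E)
tgt = torch.rand(20, 32, 512)                  # (T, N, E)
out = model(src, tgt)                          # src/tgt belong to forward()
print(out.shape)                               # torch.Size([20, 32, 512])
```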
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24837
Differential Revision: D16938065
Pulled By: zhangguanheng66
fbshipit-source-id: 7b2d2180d95ddb65930ad83c87c926e35f2bf521
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24351
**Context:**
I was doing some exploration of the APIs for JIT script module internals.
I found there can be a bug (cannot cast Module to Slot) when I try to check the size of sub_modules in a module. (Please also provide suggestions if you think my diff is not optimal or wrong.)
See the following:
    for (auto m1 : module.get_modules()) { // module is the module loaded from P79892263.
      std::cout << "test module " << " " << m1.get_modules().size() << "\n";
    }
With this change, it is going to return 0 (expected).
Without this change, the following error is thrown: P79892732
Also, I am putting an RFC here since I am looking for ideas on which tests I should add and where I should add them.
Reviewed By: smessmer
Differential Revision: D16803759
fbshipit-source-id: 1e2ae6b69d9790c700119d2d0b9f9f85f41616d4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25006
Builds without sccache or ccache would run into issues since
`CACHE_WRAPPER_DIR` would not be set. As a result `CUDA_NVCC_EXECUTABLE` would
be set to /nvcc and the build would fail.
ghstack-source-id: 88766907
Differential Revision: D16954651
fbshipit-source-id: fea41da52dc9f8f03e6356d348f5900978db3651
Summary:
Resolves: https://github.com/pytorch/pytorch/issues/20785
Addresses https://github.com/pytorch/pytorch/issues/24470 for `affine_grid`
Subsumes and closes: https://github.com/pytorch/pytorch/pull/24878 and likewise closes: https://github.com/pytorch/pytorch/issues/24821
Adds the `align_corners` option to `grid_sample` and `affine_grid`, paralleling the option that was added to `interpolate` in version 0.4.0.
In short, setting `align_corners` to `False` allows these functions to be resolution agnostic.
This ensures, for example, that a grid generated from a neural net trained to warp 1024x1024 images will also work to warp the same image upsampled/downsampled to other resolutions like 512x512 or 2048x2048 without producing scaling/stretching artifacts.
Refer to the documentation and https://github.com/pytorch/pytorch/issues/20785 for more details.
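A minimal usage sketch of the new argument (illustration only; sizes are arbitrary):
```python
import torch
import torch.nn.functional as F

theta = torch.eye(2, 3).unsqueeze(0)                  # identity 2D affine transform
img = torch.arange(16.0).view(1, 1, 4, 4)

# align_corners=False (the new default) makes the grid resolution agnostic
grid = F.affine_grid(theta, size=(1, 1, 8, 8), align_corners=False)
out = F.grid_sample(img, grid, align_corners=False)   # resampled to 8x8
print(out.shape)                                      # torch.Size([1, 1, 8, 8])
```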
#### BC-Breaking Changes
- **Important**: BC-Breaking change because of new default for `align_corners`
The old functionality can still be achieved by setting `align_corners=True`, but the default is now set to `align_corners=False`, since this is the more correct setting, and since this matches the default setting of `interpolate`.
- **Should not cause BC issues**: BC-Breaking change for pathological use case
2D affine transforms on 1D coordinates and 3D affine transforms on 2D coordinates (that is, when one of the spatial dimensions has an empty span) are ill-defined, and not an intended use case of `affine_grid`. Whereas before, all grid point components along such dimension were set arbitrarily to `-1` (that is, before multiplying be the affine matrix), they are now all set instead to `0`, which is a much more consistent and defensible arbitrary choice. A warning is triggered for such cases.
#### Documentation
- Update `affine_grid` documentation to express that it does indeed support 3D affine transforms. This support was already there but not documented.
- Add documentation warnings for BC-breaking changes in `grid_sample` and `affine_grid` (see above).
#### Refactors
- `affine_grid` no longer dispatches to cuDNN under any circumstances.
The decision point for when the cuDNN `affine_grid_generator` is compatible with the native PyTorch version and when it fails is a headache to maintain (see [these conditions](5377478e94/torch/nn/_functions/vision.py (L7-L8))). The native PyTorch kernel is now used in all cases.
- The kernels for `grid_sample` are slightly refactored to make maintenance easier.
#### Tests
Two new tests are added in `test_nn.py`:
- `test_affine_grid_error_checking` for errors and warnings in `affine_grid`
- `test_affine_grid_3D` for testing `affine_grid`'s 3D functionality. The functionality existed prior to this, but wasn't tested.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24929
Differential Revision: D16949064
Pulled By: ailzhang
fbshipit-source-id: b133ce0d47a2a5b3e2140b9d05fb05fca9140926
Summary:
This PR adds deprecation message for `tensor.data<T>()` (91d94e7d41), and changes all call sites of `tensor.data<T>()` to `tensor.data_ptr<T>()` in PyTorch core.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24886
Differential Revision: D16924576
Pulled By: yf225
fbshipit-source-id: 0943d6be73245c7c549c78597b74c3b07fa24440
Summary:
Another pass over the docs, this covers most of the remaining stuff
* content updates for new API
* adds links to functions instead of just names
* removes some useless indentations
* some more code examples + `testcode`s
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24445
Pulled By: driazati
Differential Revision: D16847964
fbshipit-source-id: cd0b403fe4a89802ce79289f7cf54ee0cea45073
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24425
- _jit_pass_prepare_quant: clones the observer module passed as argument and inserts it into the
module we want to quantize; inserts observer calls for the Tensor values we want to observe
Differential Revision: D16933120
fbshipit-source-id: 7248de6132429ba943a09831a76486f7a3cd52a3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24980
We'll need this internally, so just updating the open-source version. The other optimizers have this argument anyway.
Test Plan: Imported from OSS
Differential Revision: D16945279
Pulled By: li-roy
fbshipit-source-id: 0b8cc86f15387cd65660747899d3d7dd870cff27
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24955
Some third-party code relies on this default constructor. It's not
invalid to construct an OutputArchive with an independent CU, so
restoring it.
Test Plan: Imported from OSS
Differential Revision: D16935254
Pulled By: suo
fbshipit-source-id: 40b6494e36d10c5009b3031648bee96b2e38b49a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24882
Previously, looking up a kernel accidentally copied the DispatchTableEntry, which has as its member a std::function cache creator function.
Being an std::function, it was expensive to copy and cost us more than 50ns on each op call.
This diff fixes this by not copying DispatchTableEntry anymore.
ghstack-source-id: 88611173
Differential Revision: D16910530
fbshipit-source-id: 44eeaa7f6ffead940b4a124f0c31d8ef71404db3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24792
This will prevent the circular dependencies in the future
Differential Revision: D16868861
Test Plan: Imported from OSS
Pulled By: zafartahirov
fbshipit-source-id: 92cf77094b2c56560d380c1fd1df8e1e24a86359
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24940
We're missing an include for named tensors in templates/TypeDefault.h.
Test Plan: - run ci [namedtensor ci]
Differential Revision: D16930709
Pulled By: zou3519
fbshipit-source-id: c15d631761a78d5e50fe265a3129239e72042a83
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24871
Bind the torch.autograd.grad function into TorchScript so that well-formed
inputs can call it directly from a TorchScript function.
This also changes the serialization a bit: it fixes a small bug where a node
output type could never be a tensor type in prim::ListConstruct (only its element type can be), and adds the case where we need to annotate the ListType if the element type is an optional type, to preserve type information on re-import.
Differential Revision: D16923273
fbshipit-source-id: 151cc13411c8c287def35b4e65122d9fc083ccfd
Summary:
Use explicit versioned nightly whl such that to provide coverage of ONNX updates not in release yet.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24414
Differential Revision: D16940810
Pulled By: bddppq
fbshipit-source-id: 7bf76554898958e0f48883a1da7a3bdc781be7f8
Summary:
This hasn't been edited in a while and doesn't work anymore. Its use
case is also served pretty well by `script_module.code`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24874
Pulled By: driazati
Differential Revision: D16941025
fbshipit-source-id: 11acd05cea5e44eeb1d48188a2de645669b21610
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24850
### Summary
There are 1373 header files in total that have been installed on mobile, many of which are not being used. Take ATen for example, there are 165 header files in total. Folders like `cuda/`, `cudann`, `miopen`, etc are not needed. This PR will remove 33 unnecessary header files as well as some cuda files.
### Test Plan
- `build_ios.sh` finished successfully
- `libtorch.a` can be compiled and run on mobile
Test Plan: Imported from OSS
Differential Revision: D16897314
Pulled By: xta0
fbshipit-source-id: 54e046936439a549fe633ec791a10a2a3d36fa8b
Summary:
Stacked PRs
* #24445 - [jit] Misc doc updates #2
* **#24435 - [jit] Add docs to CI**
This integrates the [doctest](http://www.sphinx-doc.org/en/master/usage/extensions/doctest.html) module into `jit.rst` so that we can run our code examples as unit tests. They're added to `test_jit.py` under the `TestDocs` class (which takes about 30s to run). This should help prevent things like #24429 from happening in the future. They can be run manually by doing `cd docs && make doctest`.
* The test setup requires a hack since `doctest` defines everything in the `builtins` module which upsets `inspect`
* There are several places where the code wasn't testable (i.e. it threw an exception on purpose). This may be resolvable, but I'd prefer to leave that for a follow up. For now there are `TODO` comments littered around.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24435
Pulled By: driazati
Differential Revision: D16840882
fbshipit-source-id: c4b26e7c374cd224a5a4a2d523163d7b997280ed
Summary:
This change adds the following functionality:
1) WorkNCCL isCompleted, isSuccess methods check for NCCL errors and set the
appropriate exception.
2) Added a watchdog thread to ProcessGroupNCCL which checks for errors in the
cached communicators and removes them from the cache.
3) Use ncclCommAbort in NCCLComm destructor since ncclCommDestroy can block
forever waiting for work.
4) Added a simulate_nccl_errors.py script to simulate NCCL errors.
https://github.com/pytorch/pytorch/issues/17882
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22907
Test Plan: 1) Run the simulate_nccl_errors.py to verify NCCL errors are caught.
Differential Revision: D16220638
fbshipit-source-id: fbc8881ea0c38a4d09a77045691e36557b7b0b25
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24259
Follow-up to https://github.com/pytorch/pytorch/pull/23886, adding the same overload API specified in PEP 484 to module methods to reduce the friction of adding method overloads that was brought up in #23266.
The usage is:
```
@torch.jit.overload
def add(self, y: int) -> int: ...
@torch.jit.overload
def add(self, y: float) -> float: ...

def add(self, y):
    ...
```
Test Plan: Imported from OSS
Differential Revision: D16921304
Pulled By: eellison
fbshipit-source-id: 784e2f26f7ca9a330a434a603c86b53725c3dc71
Summary:
As in https://github.com/pytorch/pytorch/issues/23439, some descriptions of arguments in `_torch_docs.py` have been replaced by `common_args`, it would be helpful to check if any descriptions can be replaced for new docs in the future.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24161
Differential Revision: D16889293
Pulled By: ezyang
fbshipit-source-id: bf6f581494482d6eb32e634f73e84a4586766230
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24342
Right now the two APIs provided in the autograd package only have
Python bindings, and we could not call them from either the C++ API or
TorchScript. This PR makes these two APIs available purely in C++
(preserving semantics) so they can be used in the C++ API and TorchScript.
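For reference, an eager-mode sketch of the two APIs; the point of this PR is that equivalent calls also work from C++ and TorchScript (example values are arbitrary):
```python
import torch

x = torch.randn(3, requires_grad=True)
y = (x * x).sum()

# torch.autograd.grad: functional API, returns a tuple of gradients
(g,) = torch.autograd.grad(outputs=y, inputs=x)
print(torch.allclose(g, 2 * x))                           # True

# torch.autograd.backward: accumulates gradients into .grad
z = (3 * x).sum()
torch.autograd.backward(z)
print(torch.allclose(x.grad, torch.full_like(x, 3.0)))    # True
```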
Differential Revision: D16923271
fbshipit-source-id: 049d6fbd94cd71ecc08b2716f74d52ac061f861e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24367
NamesMode determines whether or not to ignore the names field of
TensorImpl. In particular, when it is disabled, all tensors are treated
as unnamed.
Test Plan: - New tests [namedtensor ci]
Differential Revision: D16915806
Pulled By: zou3519
fbshipit-source-id: 21f7ff1eadebd678d6cd9a16ff25dd6134272b76
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24889
Trying to fix #2575. [Here](https://gist.github.com/suo/7b0bc4b49d3c9e095b9f7eef8fa7c6e8) is all the TLS in libtorch.so (thanks ezyang for figuring out how to find this).
I noticed that `CallbackManager::sample_zero_one()::gen` has size 5000,
which seems bigger than the other ones. So make it heap-allocated
instead.
Caveat: I have no idea if this will actually fix anything, or whether
making this variable heap-allocated is a bad idea.
Test Plan: Imported from OSS
Differential Revision: D16912540
Pulled By: suo
fbshipit-source-id: 71eb0391bf4c6e85b090f8650a2fbfc2107f2707
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24284
This PR finishes the unification of all Tensor types into a single object.
ProfiledTensorType is renamed to TensorType and the old TensorType is
deleted.
Notes:
* Fixes bug in merge for VaryingShape by changing its representation to an
optional list of optional ints.
* Removes ProfiledTensorType::create(type) invocations that can now
simply be expect calls on tensor type.
Test Plan: Imported from OSS
Differential Revision: D16794034
Pulled By: zdevito
fbshipit-source-id: 10362398d0bb166d0d385d74801e95d9b87d9dfc
Summary:
~~In case of tensor indexing with a scalar index, index_select returns a tensor with the same rank as the input. To match this behavior in ONNX, we make index a 1D tensor so that with a gather
it also produces a tensor with the same rank as the input.~~
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23507
Differential Revision: D16586805
Pulled By: bddppq
fbshipit-source-id: 8f5d964d368873ec372773a29803b25f29a81def
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24138
catch exception thrown on server, send the exception message back to client and rethrow it.
Reviewed By: mrshenli
Differential Revision: D16748748
fbshipit-source-id: ce18b3ea1b1d28645ec292f58aa0c818d93e559e
Summary:
Currently they sit together with other code in cuda.cmake. This commit
is the first step toward cleaning up cuDNN detection in our build system.
Another attempt to https://github.com/pytorch/pytorch/issues/24293, which breaks manywheels build because it does not handle `USE_STATIC_CUDNN`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24784
Differential Revision: D16914345
Pulled By: ezyang
fbshipit-source-id: fd261478c01d879dc770c1f1a56b17cc1a587be2
Summary:
```
[1/1424] Building NVCC (Device) object caffe2/CMakeFiles/torch.dir/operators/torch_generated_weighted_sample_op.cu.obj
CMake Warning (dev) at torch_generated_weighted_sample_op.cu.obj.Release.cmake:82 (set):
Syntax error in cmake code at
C:/Users/Ganzorig/pytorch/build/caffe2/CMakeFiles/torch.dir/operators/torch_generated_weighted_sample_op.cu.obj.Release.cmake:82
when parsing string
C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1/include;C:/Users/Ganzorig/pytorch/aten/src;C:/Users/Ganzorig/pytorch/build;C:/Users/Ganzorig/pytorch;C:/Users/Ganzorig/pytorch/cmake/../third_party/googletest/googlemock/include;C:/Users/Ganzorig/pytorch/cmake/../third_party/googletest/googletest/include;;C:/Users/Ganzorig/pytorch/third_party/protobuf/src;C:/Users/Ganzorig/pytorch/cmake/../third_party/benchmark/include;C:/Users/Ganzorig/pytorch/cmake/../third_party/eigen;C:/Users/Ganzorig/Anaconda3/envs/code/include;C:/Users/Ganzorig/Anaconda3/envs/code/lib/site-packages/numpy/core/include;C:/Users/Ganzorig/pytorch/cmake/../third_party/pybind11/include;C:/Users/Ganzorig/pytorch/cmake/../third_party/cub;C:/Users/Ganzorig/pytorch/build/caffe2/contrib/aten;C:/Users/Ganzorig/pytorch/third_party/onnx;C:/Users/Ganzorig/pytorch/build/third_party/onnx;C:/Users/Ganzorig/pytorch/third_party/foxi;C:/Users/Ganzorig/pytorch/build/third_party/foxi;C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1/include;C:/Users/Ganzorig/pytorch/caffe2/../torch/csrc/api;C:/Users/Ganzorig/pytorch/caffe2/../torch/csrc/api/include;C:/Program Files/NVIDIA Corporation/NvToolsExt/include;C:/Users/Ganzorig/pytorch/caffe2/aten/src/TH;C:/Users/Ganzorig/pytorch/build/caffe2/aten/src/TH;C:/Users/Ganzorig/pytorch/caffe2/../torch/../aten/src;C:/Users/Ganzorig/pytorch/build/caffe2/aten/src;C:/Users/Ganzorig/pytorch/build/aten/src;C:/Users/Ganzorig/pytorch/caffe2/../torch/../aten/src;C:/Users/Ganzorig/pytorch/build/caffe2/../aten/src;C:/Users/Ganzorig/pytorch/build/caffe2/../aten/src/ATen;C:/Users/Ganzorig/pytorch/build/aten/src;C:/Users/Ganzorig/pytorch/caffe2/../torch/csrc;C:/Users/Ganzorig/pytorch/caffe2/../torch/../third_party/miniz-2.0.8;C:/Users/Ganzorig/pytorch/caffe2/../torch/csrc/api;C:/Users/Ganzorig/pytorch/caffe2/../torch/csrc/api/include;C:/Users/Ganzorig/pytorch/build/caffe2/aten/src/TH;C:/Users/Ganzorig/pytorch/aten/src/TH;C:/Users/Ganzorig/pytorch/aten/src;C:/Users/Ganzorig/pytorch/build/caffe2/aten/src;C:/Users/Ganzorig/pytorch/build/aten/src;C:/Users/Ganzorig/pytorch/aten/src;C:/Users/Ganzorig/pytorch/aten/../third_party/catch/single_include;C:/Users/Ganzorig/pytorch/aten/src/ATen/..;C:/Users/Ganzorig/pytorch/build/caffe2/aten/src/ATen;C:/Users/Ganzorig/pytorch/third_party/miniz-2.0.8;C:/Users/Ganzorig/pytorch/caffe2/core/nomnigraph/include;C:/Users/Ganzorig/pytorch/caffe2/;C:/Program Files/NVIDIA GPU Computing 
Toolkit/CUDA/v10.1/include;C:/Users/Ganzorig/pytorch/build/caffe2/aten/src/TH;C:/Users/Ganzorig/pytorch/aten/src/TH;C:/Users/Ganzorig/pytorch/build/caffe2/aten/src/THC;C:/Users/Ganzorig/pytorch/aten/src/THC;C:/Users/Ganzorig/pytorch/aten/src/THCUNN;C:/Users/Ganzorig/pytorch/aten/src/ATen/cuda;C:/Users/Ganzorig/pytorch/build/caffe2/aten/src/TH;C:/Users/Ganzorig/pytorch/aten/src/TH;C:/Users/Ganzorig/pytorch/aten/src;C:/Users/Ganzorig/pytorch/build/caffe2/aten/src;C:/Users/Ganzorig/pytorch/build/aten/src;C:/Users/Ganzorig/pytorch/aten/src;C:/Users/Ganzorig/pytorch/aten/../third_party/catch/single_include;C:/Users/Ganzorig/pytorch/aten/src/ATen/..;C:/Users/Ganzorig/pytorch/build/caffe2/aten/src/ATen;C:/Users/Ganzorig/pytorch/third_party/protobuf/src;C:/Users/Ganzorig/pytorch/c10/../;C:/Users/Ganzorig/pytorch/build;C:/Users/Ganzorig/pytorch/third_party/cpuinfo/include;C:/Users/Ganzorig/pytorch/third_party/FP16/include;C:/Users/Ganzorig/pytorch/third_party/foxi;C:/Users/Ganzorig/pytorch/third_party/foxi;C:/Users/Ganzorig/pytorch/third_party/onnx;C:/Users/Ganzorig/pytorch/build/third_party/onnx;C:/Users/Ganzorig/pytorch/build/third_party/onnx;C:/Users/Ganzorig/pytorch/c10/cuda/../..;C:/Users/Ganzorig/pytorch/build;C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1/include;C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1/include;C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1/include;C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1\include;C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1/include
Invalid escape sequence \i
Policy CMP0010 is not set: Bad variable reference syntax is an error. Run
"cmake --help-policy CMP0010" for policy details. Use the cmake_policy
command to set the policy and suppress this warning.
This warning is for project developers. Use -Wno-dev to suppress it.
```
Compared to https://github.com/pytorch/pytorch/issues/24044 , this commit moves the fix up, and uses [bracket arguments](https://cmake.org/cmake/help/v3.12/manual/cmake-language.7.html#bracket-argument).
PR also sent to upstream: https://gitlab.kitware.com/cmake/cmake/merge_requests/3679
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24420
Differential Revision: D16914193
Pulled By: ezyang
fbshipit-source-id: 9f897cf4f607502a16dbd1045f2aedcb49c38da7
Summary:
This PR removes SymbolicVariable from all tests as well as the specialize_autogradzero and canonicalize_ops passes. These passes used SymbolicVariable in a relatively simple way compared to its few remaining uses.
Removing SymbolicVariable means graphs must be constructed by other methods. IRParser was preferred for tests, but tests requiring pointers to graph internals or differentiation use direct construction instead. See https://github.com/pytorch/pytorch/issues/23989, which was discovered during this process, for why IRParser cannot be used when differentiation is required. Direct construction was also used in the updated passes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24007
Test Plan: Only refactors existing tests and preserves current checks; no additional testing needed.
Differential Revision: D16906045
Pulled By: mruberry
fbshipit-source-id: b67df4611562cd7618f969890e2b6840750c7266
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23799
Before, we inlined as part of the initial IR generation process, which
has a few disadvantages:
1. It loses information about what nodes came from which function/method
calls. Other parties who want to implement transformations on the
function/module level don't have a reliable way of doing so.
2. It duplicates a ton of code if we are inlining the same
function/method a tons of times.
After this PR: inline is deferred to the optimization stage, so
optimizations that rely on inlining will still work. But things get
serialized with the function/method calls in.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23799
Differential Revision: D16652819
Test Plan: Imported from OSS
Reviewed By: jamesr66a
Pulled By: suo
fbshipit-source-id: a11af82aec796487586f81f5a9102fefb6c246db
Summary:
This PR templatizes `Tensor.data_ptr()`, to prepare for the deprecation of `Tensor.data<T>()` and introduction of `Tensor.data()` that has the same semantics as `Variable.data()`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24847
Differential Revision: D16906061
Pulled By: yf225
fbshipit-source-id: 8f9db9fd105b146598a9d759aa4b4332011da8ea
Summary:
Added support for cumsum in symbolic opset 11 + op and ORT tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24476
Differential Revision: D16896780
Pulled By: bddppq
fbshipit-source-id: b52355796ee9f37004c9258f710688ad4b1ae8a2
Summary:
This PR moves the following operators to symbolic script:
- add
- sub
- mul
- div
- threshold
- clamp
- addmm
- comparison ops (lt, le, ge, ...)
- fmod
- remainder
- max_pool2d_with_indices
Additionally, the view and reshape operations were removed from autodiff.cpp (they were already written in symbolic script).
The functionality of these operators is mostly preserved, except clamp and threshold have been modified to be gradient preserving at the boundary. Moving clamp also changed the graph tested in test_jit.py, which I think is expected.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23794
Test Plan: Existing tests provided sufficient coverage.
Differential Revision: D16902986
Pulled By: mruberry
fbshipit-source-id: 478f2a59d9a5b0487fc523fd594cb775cb617525
Summary:
Fixes https://github.com/pytorch/pytorch/issues/8212
This fix is based on the idea that in-place ops (e.g. add_(...)) and out ops (e.g. tensor.add(..., out=...)) must check that the output tensor does not partially overlap with any of its input tensors. Otherwise the result of such an op is unexpected to the user. Since TensorIterator is a common backend for such ops and it is already used to check output self-overlapping, this fix is implemented in the same place.
A MemOverlapStatus enum class is introduced to model the overlap state of two tensors:
- TOO_HARD if at least one of them is not contiguous
- FULL if both are contiguous and share exactly the same memory array [data(), data() + numel() * itemsize()]
- PARTIAL if both are contiguous but the underlying memory is shared only partially; in other words, the memory arrays overlap but are not identical
- NO if both are contiguous but have independent, non-overlapping memory arrays
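To illustrate the PARTIAL case, here is a minimal sketch (assuming current PyTorch semantics; the exact error wording may differ):
```python
import torch

x = torch.zeros(8)
# x[1:] and x[:-1] are both contiguous but share memory only partially,
# i.e. the PARTIAL case above. With this check the op raises instead of
# silently producing corrupted results.
try:
    torch.add(x[:-1], 1, out=x[1:])
except RuntimeError as e:
    print(e)  # complains that input and output memory overlap
```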
Performance test of clone/addcmul_/addcdiv_ with check_mem_overlaps:
```
a = torch.empty(10000000, device='cpu')
b = torch.randn(10000000, device='cpu')
timeit a.copy_(b)
master: 10.3 ms ± 429 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
branch: 10.2 ms ± 946 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
a = torch.empty(10000000, device='cuda')
b = torch.randn(10000000, device='cuda')
timeit a.copy_(b)
master: 373 µs ± 97.9 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
branch: 373 µs ± 120 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
a = torch.randn(1000000, device='cpu')
b = torch.randn(1000000, device='cpu')
c = torch.randn(1000000, device='cpu')
timeit a.addcmul_(b, c)
master: 2.02 ms ± 212 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
branch: 2.11 ms ± 200 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
a = torch.randn(1000000, device='cuda')
b = torch.randn(1000000, device='cuda')
c = torch.randn(1000000, device='cuda')
timeit a.addcmul_(b, c)
master: 72.6 µs ± 627 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
branch: 72.4 µs ± 18.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
a = torch.randn(1000000, device='cpu')
b = torch.randn(1000000, device='cpu')
c = torch.randn(1000000, device='cpu')
timeit a.addcdiv_(b, c)
master: 2.19 ms ± 583 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
branch: 1.97 ms ± 125 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
a = torch.randn(1000000, device='cuda')
b = torch.randn(1000000, device='cuda')
c = torch.randn(1000000, device='cuda')
timeit a.addcdiv_(b, c)
master: 71.3 µs ± 1.98 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
branch: 71.7 µs ± 3.96 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
a = torch.empty(100, device='cpu')
b = torch.randn(100, device='cpu')
timeit a.copy_(b)
master: 12.1 µs ± 1.11 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
branch: 11.1 µs ± 61.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
a = torch.empty(100, device='cuda')
b = torch.randn(100, device='cuda')
timeit a.copy_(b)
master: 20.9 µs ± 1.62 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
branch: 22.8 µs ± 2.63 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
a = torch.randn(100, device='cpu')
b = torch.randn(100, device='cpu')
c = torch.randn(100, device='cpu')
timeit a.addcmul_(b, c)
master: 24.1 µs ± 2.7 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
branch: 24 µs ± 91.6 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
a = torch.randn(100, device='cuda')
b = torch.randn(100, device='cuda')
c = torch.randn(100, device='cuda')
timeit a.addcmul_(b, c)
master: 34.5 µs ± 4.82 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
branch: 29.8 µs ± 496 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
a = torch.randn(100, device='cpu')
b = torch.randn(100, device='cpu')
c = torch.randn(100, device='cpu')
timeit a.addcdiv_(b, c)
master: 21.3 µs ± 210 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
branch: 23.8 µs ± 403 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
a = torch.randn(100, device='cuda')
b = torch.randn(100, device='cuda')
c = torch.randn(100, device='cuda')
timeit a.addcdiv_(b, c)
master: 30.3 µs ± 257 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
branch: 31.8 µs ± 214 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24058
Differential Revision: D16767892
Pulled By: pbelevich
fbshipit-source-id: 0cdaaa471d003a2886b1736f8985842226b8493a
Summary:
Changelog:
- Enable torch.eye for bool and float16 dtypes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24148
Test Plan:
- Tests added in test_torch.py for all available devices and dtypes (except torch.bfloat16)
Fixes https://github.com/pytorch/pytorch/issues/24088
Differential Revision: D16891048
Pulled By: ezyang
fbshipit-source-id: 3e86fe271bd434300c396e63f82c1a1f3adac2b4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24357
SparseNormalize does not need to know the gradient value for the lookup table, only the indices of the embeddings that need to be updated. By removing this input, we allow SparseNormalize to be used alongside SparseAdagradFusion.
Differential Revision: D16809919
fbshipit-source-id: cc19692ba4dea8854663ae1ed8cf9365e90c99bc
Summary:
Do not use the explicit CAS loop. This will perform better if there is
any contention. Since this feature is ROCm-only, the HIP layer provides no
helper function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24854
Differential Revision: D16902292
Pulled By: ezyang
fbshipit-source-id: df192063c749f2b39f8fc304888fb0ae1070f20e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24499
In the `conv.py` module we currently subscript a variable `kernel_size` that may not be an iterable. This leads to errors unless the user passes in an iterable for kernel_size. D16830855 changed `self.kernel_size` to be a pair type, but did not actually use the variable.
We now use `self.kernel_size`, which is a pair even if the user passed in an int for `kernel_size`, so that the subscripting error goes away.
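For context, a minimal sketch of the kind of normalization involved, using the `_pair` helper from `torch.nn.modules.utils` (shown here only for illustration; the actual change is internal to `conv.py`):
```python
from torch.nn.modules.utils import _pair

kernel_size = 3                   # the user may pass a plain int...
kernel_size = _pair(kernel_size)  # ...so it is normalized to a tuple (3, 3)
kh, kw = kernel_size              # subscripting/unpacking is now always safe
print(kh, kw)
```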
Differential Revision: D16859809
fbshipit-source-id: cd2a5497e89d88e518ca7b8a97bf9e69803ee2ba
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24502
Files in the distributed.rpc package mix snake_case and camelCase names. This
commit cleans that up; all files use snake_case names now.
ghstack-source-id: 88548990
Reviewed By: xush6528
Differential Revision: D16860155
fbshipit-source-id: 3a22a89bf6c4e11aac5849564fc53296a04d6a8b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24827
We already cache the node's schema, but alias analysis wants operators. Looking
them up ends up being almost 70% of the on-CPU time when optimizing a large
graph.
Here's some results on a [sample model](https://gist.github.com/suo/63ab9638516002176f94553a37060f61)
(the units are seconds).
Before:
```
compiled in: 20.256319999694824
first run in: 313.77824568748474
```
After:
```
compiled in: 18.8815860748291
first run in: 42.58739233016968
```
More than a 7x speedup! Still slower than I'd like though so I'll keep
digging.
Test Plan: Imported from OSS
Differential Revision: D16887540
Pulled By: suo
fbshipit-source-id: 2449be2898889d00ac094c3896e37b0e6a8c5f08
Summary:
Resolves: https://github.com/pytorch/pytorch/issues/20785
Adds the `align_corners` option to `grid_sample` and `affine_grid`, paralleling the option that was added to `interpolate` in version 0.4.0.
In short, setting `align_corners` to `False` allows these functions to be resolution agnostic.
This ensures, for example, that a grid generated from a neural net trained to warp 1024x1024 images will also work to warp the same image upsampled/downsampled to other resolutions like 512x512 or 2048x2048 without producing scaling/stretching artifacts.
Refer to the documentation and https://github.com/pytorch/pytorch/issues/20785 for more details.
**Important**: BC-Breaking Change because of new default
The old functionality can still be achieved by setting `align_corners=True`, but the default is now set to `align_corners=False`, since this is the more correct setting, and since this matches the default setting of `interpolate`.
The vectorized 2D cpu version of `grid_sampler` is refactored a bit. I don’t suspect that this refactor would affect the runtime much, since it is mostly done in inlined functions, but I may be wrong, and this has to be verified by profiling.
~The tests are not yet updated to reflect the new default. New tests should probably also be added to test both settings of `align_corners`.~ _Tests are now updated._
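A minimal usage sketch of the new flag (assuming the current functional API):
```python
import torch
import torch.nn.functional as F

img = torch.rand(1, 3, 64, 64)                 # N, C, H, W
theta = torch.tensor([[[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]]])      # identity affine transform
# With align_corners=False (the new default), -1 and 1 refer to the outer
# edges of the corner pixels rather than their centers, which is what makes
# the sampling resolution agnostic.
grid = F.affine_grid(theta, img.size(), align_corners=False)
out = F.grid_sample(img, grid, align_corners=False)
print(out.shape)                               # torch.Size([1, 3, 64, 64])
```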
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23923
Differential Revision: D16887357
Pulled By: ailzhang
fbshipit-source-id: ea09aad7853ef16536e719a898db8ba31595daa5
Summary:
This is a follow-up to gh-23408. Architectures below 3.5 are no longer supported (both the numeric values and the named 'Fermi' and 'Kepler+Tegra' entries).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24442
Differential Revision: D16889283
Pulled By: ezyang
fbshipit-source-id: 3c0c35d51b7ac7642d1be7ab4b0f260ac93b60c9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24801
This is to fix the ODR-violations in fbcode static builds, which have been broken for several months.
This PR is unfortunately quite large, but the changes are only mechanical:
1. Tests defined in header files -> tests defined in cpp files
2. Remove the `torch::jit::testing` namespace -> `torch::jit`.
3. Single `test.h` file that aggregates all tests.
4. Separate out files for gtest and python versions of the tests instead of using a build flag
5. Add a readme for how to add a new test, and explaining a bit about why the cpp tests are the way they are.
Test Plan: Imported from OSS
Differential Revision: D16878605
Pulled By: suo
fbshipit-source-id: 27b5c077dadd990a5f74e25d01731f9c1f491603
Summary:
The derivative of the symmetric eigendecomposition was previously a triangular matrix.
Changelog:
- Modify the derivative of symeig from a triangular matrix to a symmetric matrix with reason specified as a comment.
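A minimal sketch of the intended effect, assuming the torch.symeig API of that era (it has since been deprecated in favor of torch.linalg.eigh):
```python
import torch

a = torch.randn(4, 4, dtype=torch.double)
x = ((a + a.t()) / 2).requires_grad_()       # symeig expects a symmetric input
w, v = torch.symeig(x, eigenvectors=True)
(w.sum() + v.sum()).backward()
# With the symmetric derivative, the gradient itself is expected to be
# symmetric rather than triangular.
print(torch.allclose(x.grad, x.grad.t()))
```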
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23018
Test Plan: - Existing gradcheck and gradgradchecks are ported to test_autograd to verify that the change is correct. The input to symeig is symmetrized before being passed in.
Differential Revision: D16859070
Pulled By: ezyang
fbshipit-source-id: 2d075abdf690909f80781764cfaf938b581d0ef6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24448
The setting `--durations=10` was hard-coded, which is annoying as I
don't necessarily care. A good alternative to get the same behavior is:
```
python run_tests.py --pytest -- --durations=10
```
Test Plan: Imported from OSS
Differential Revision: D16876380
Pulled By: suo
fbshipit-source-id: 1e14d366db45b6b9bf4a4ab1633b0f6ece29f6bc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23968
Existing ProcessGroupAgent uses a single thread to send all messages, and
a single thread to listen and process all received messages. This causes
both performance issues and also prevents nested RPCs. For example, when
running nested RPC A->B->A->B, the second recv on B cannot start until
the first recv on B finishes. If the second recv is triggered by a nested
RPC in the first recv, it will deadlock. Ideally, we should expose something
like a responder or FutureResult to the Python land to support nested
asynchronous UDFs.
This diff adds a shared ThreadPool for send and recv. The send path uses it to
send out messages, and the recv path uses it to process received messages.
There is still a dedicated thread that listens for incoming messages and adds
them to the task queue.
There are two goals: 1) speed up ProcessGroupAgent 2) use ThreadPool as a
temporary solution for (a small number of) nested RPCs
ghstack-source-id: 88476246
Differential Revision: D16695091
fbshipit-source-id: fd18a5c65e7fcd1331b73d1287673e6e10d2dd86
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24374
This is a duplicate to bring back #23704 with diff revision D16634539
Test Plan: Imported from OSS
Differential Revision: D16818664
Pulled By: zafartahirov
fbshipit-source-id: c8f7965356555a6a995eaeea6820ea62cbbea6fd
Summary:
This lets you mark a class so that it won't be recursively compiled.
This also runs up against a weird thing on the UX side, that to ignore a
module you have to `ignore` its `forward()` method but to ignore a
class you use `ignore` on the class declaration. The `ignore` on the
class declaration matches the use of `script` for script classes but is
confusing to those that don't know the difference between script classes
/ modules.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23614
Pulled By: driazati
Differential Revision: D16770068
fbshipit-source-id: bee9a9e88b6c798ce779f622c4f929adae4eaf45
Summary:
Previously we weren't clearing the stack, so any failures that didn't
stop the program stayed around in the stack and would show up if
something else accessed the stack.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23458
Pulled By: driazati
Differential Revision: D16866719
fbshipit-source-id: 29739b11f79de91c6468129da1bdcbf3c53b42d9
Summary:
`binary_populate_env.sh` is used by `binary_linux_test`, and for libtorch with new ABI we need to run the tests on a docker image different from `soumith/manylinux-cudaXX`. In such cases, we should respect the actual DOCKER_IMAGE value defined in the CircleCI job description.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24787
Differential Revision: D16867976
Pulled By: yf225
fbshipit-source-id: dc0a68bffc5789249ae14491ef485c7cc2fc1c34
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23906
An off-by-one error is expected for the average pool due to double rounding.
Increase the unit test precision tolerance to 1.0 to avoid spurious failures.
Test Plan: Imported from OSS
Differential Revision: D16678044
Pulled By: zafartahirov
fbshipit-source-id: 4e73934e4379b1d108af649ec77053998e44c560
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24393
Adds the ability to register a hook on a variable, similar to the python autograd API. register_hook takes a function as its argument and creates a CppFunctionPreHook similar to PyFunctionPreHook.
It returns the index of the hook, which can be passed to remove_hook to disable the hook.
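For reference, a minimal sketch of the existing Python hook API that the new C++ hook mirrors (the C++ variant returns an index for remove_hook instead of a handle):
```python
import torch

v = torch.ones(2, 2, requires_grad=True)
handle = v.register_hook(lambda grad: grad * 2)  # scale incoming gradients
v.sum().backward()
print(v.grad)      # all 2s, because the hook doubled the gradient
handle.remove()    # disable the hook again
```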
Test Plan: Added tests.
Differential Revision: D16861722
fbshipit-source-id: d08047f932e38c7bde04283a18b2d0311c8ad604
Summary:
Empty and empty_like return uninitialized tensors with specific sizes.
The values in the tensor cannot be predicted, which is why tests in test_pytorch_onnx_onnxruntime.py and test_pytorch_onnx_caffe2.py are not added.
The tests in test_operators.py verify the onnx graph and output shape.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24166
Differential Revision: D16831571
Pulled By: bddppq
fbshipit-source-id: b2500f36ced4735da9a8418d87a39e145b74f63a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24433
The bounds checker was only used once per instruction. If a read in the
middle of an instruction went off the end of the stream, it would just
read invalid memory. This replaces the bounds checker with a single
guarded read function.
Test Plan: Imported from OSS
Differential Revision: D16836178
Pulled By: zdevito
fbshipit-source-id: a7f70d0f293bf26c3220a12bafb8a06678931016
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24040
This diff fixes failed test in test_tensorboard.py:
- fixed test_image_with_boxes: the test compares a serialized protobuf Summary object containing an image against an expected serialized protobuf stored in a file. It turns out that comparing images string by string might not work (e.g. if they were serialized with different versions of the image library): images can be equal, yet due to differences in metadata or compression methods the actual strings might differ. Changed to compare images using == from PIL.Image.
Reviewed By: orionr
Differential Revision: D16715831
fbshipit-source-id: 7dd4a7cfc8e63767ed727656f1891edd273d95da
Summary:
This patch writes documentation for `Tensor.record_stream()`, which is not a documented API currently. I've discussed publishing it with colesbury in https://github.com/pytorch/pytorch/issues/23729.
The documentation is based on [the introduction at `CUDACachingAllocator.cpp`](25d1496d58/c10/cuda/CUDACachingAllocator.cpp (L47-L50)). ~~I didn't explain full details of the life cycle of memory blocks or stream awareness of the allocator for the consistent level of details with other documentations.~~ I explained about the stream awareness in a note block.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24078
Differential Revision: D16743526
Pulled By: zou3519
fbshipit-source-id: 05819c3cc96733e2ba93c0a7c0ca06933acb22f3
Summary:
Some files have improper executable permissions (which git tracks). This
commit adds a test in CI to ensure that executable permissions are off
for files that shouldn't have such a permission. This also ensures fixes
such as https://github.com/pytorch/pytorch/issues/21305 are complied with in the future.
---
Disclaimer: I'm the author of flake8-executable, and I've been using it
on my end for over a month and thus I think it should be stable enough.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24214
Differential Revision: D16783437
Pulled By: ezyang
fbshipit-source-id: 018e55798f1411983c65444e6304a25c5763cd19
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24439
Many papers mention that BPR is useful for improving recommendation quality. Add a BPR loss so that we can train TTSN with it; we would like to see if it can improve retrieval models.
reference: https://arxiv.org/pdf/1205.2618.pdf
Reviewed By: dragonxlwang
Differential Revision: D16812513
fbshipit-source-id: 74488c714a37ccd10e0666d225751a845019eb94
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24454
We want to move the wipe_cache call. From what we observed, its original placement does not help.
Reviewed By: mingzhe09088
Differential Revision: D16853205
fbshipit-source-id: 1f6224a52433cbe15c0d27000b4ac140fb9cd4c3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24418
Fixes #24394
The observer is not added correctly, because one of the conditions is not met.
Test Plan: Imported from OSS
Differential Revision: D16833951
Pulled By: zafartahirov
fbshipit-source-id: bb4699e6a1cf6368c7278272a68e5e7c6d3f59a8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24431
Pickle's fully-qualified name lookup would fail when trying to serialize QConfig_dynamic since the __name__ on the instance would refer to the wrong class name
Test Plan: Imported from OSS
Differential Revision: D16835705
Pulled By: jamesr66a
fbshipit-source-id: e146835cbe10b08923d77298bc93b0f5b0ba37c5
Summary:
The old behavior was to always use `sm_30`. The new behavior is:
- For building via a setup.py, check if `'arch'` is in `extra_compile_args`. If so, don't change anything.
- If `TORCH_CUDA_ARCH_LIST` is set, respect that (can be 1 or more arches)
- Otherwise, query device capability and use that.
To test this, for example on a machine with `torch` installed for py37:
```
$ git clone https://github.com/pytorch/extension-cpp.git
$ cd extension-cpp/cuda
$ python setup.py install
$ cuobjdump --list-elf build/lib.linux-x86_64-3.7/lltm_cuda.cpython-37m-x86_64-linux-gnu.so
ELF file 1: lltm.1.sm_61.cubin
```
Existing tests in `test_cpp_extension.py` for `load_inline` and for compiling via `setup.py` in test/cpp_extensions/ cover this.
Closes gh-18657
EDIT: some more tests:
```
from torch.utils.cpp_extension import load
lltm = load(name='lltm', sources=['lltm_cuda.cpp', 'lltm_cuda_kernel.cu'])
```
```
# with TORCH_CUDA_ARCH_LIST undefined or an empty string
$ cuobjdump --list-elf /tmp/torch_extensions/lltm/lltm.so
ELF file 1: lltm.1.sm_61.cubin
# with TORCH_CUDA_ARCH_LIST = "3.5 5.2 6.0 6.1 7.0+PTX"
$ cuobjdump --list-elf build/lib.linux-x86_64-3.7/lltm_cuda.cpython-37m-x86_64-linux-gnu.so
ELF file 1: lltm_cuda.cpython-37m-x86_64-linux-gnu.1.sm_35.cubin
ELF file 2: lltm_cuda.cpython-37m-x86_64-linux-gnu.2.sm_52.cubin
ELF file 3: lltm_cuda.cpython-37m-x86_64-linux-gnu.3.sm_60.cubin
ELF file 4: lltm_cuda.cpython-37m-x86_64-linux-gnu.4.sm_61.cubin
ELF file 5: lltm_cuda.cpython-37m-x86_64-linux-gnu.5.sm_70.cubin
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23408
Differential Revision: D16784110
Pulled By: soumith
fbshipit-source-id: 69ba09e235e4f906b959fd20322c69303240ee7e
Summary:
The issue is that `python setup.py install` will fail right at the end
of the build, with:
```
File "setup.py", line 380, in run
report('-- Detected cuDNN at ' + CUDNN_LIBRARY + ', ' + CUDNN_INCLUDE_DIR)
TypeError: must be str, not NoneType
```
This is due to `USE_CUDNN` being True, but CUDNN library and include dir
not being auto-detected. On this distro, the CUDA install goes into
`/opt/cuda/` while CUDNN goes into `/usr/lib`.
```
$ locate libcudnn.so
...
/usr/lib/libcudnn.so
/usr/lib/libcudnn.so.7
/usr/lib/libcudnn.so.7.6.1
$ locate libcublas.so # targets/... symlinked from /opt/cuda/lib64
...
/opt/cuda/targets/x86_64-linux/lib/libcublas.so
```
One could work around this by setting `CUDNN_LIB_DIR`, but that's
annoying and you only find out after running into this.
The path is added after `CUDA_HOME`, so it should not be a problem on
systems which have multiple CUDA installs and select one via `CUDA_HOME`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24300
Differential Revision: D16839323
Pulled By: soumith
fbshipit-source-id: 5285fff604584ccfbe6368c5ee5a066f8fc10802
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24257
Type subclassing was used to support our old hierarchy of
Tensor types. Now that we have one tensor type it is not needed.
This removes:
* isSubclass, since it is now always false.
* type slicing, which was only needed for subclasses.
* AutogradZeroTensor, which is folded into ProfiledTensorType
Test Plan: Imported from OSS
Reviewed By: jamesr66a
Differential Revision: D16794035
Pulled By: zdevito
fbshipit-source-id: 9a3e6101df0d51029a5e667a9c9137d2ae119aa7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24348
The Partition + GatherByKeys pair is pretty handy for implementing a strategy where
part of the keys will be on the local machine, while part of the keys will end up
on the remote machine (for cases when there is exactly 1 id).
Reviewed By: aazzolini
Differential Revision: D16802988
fbshipit-source-id: 4c7ac97fc0db3ce88575fccab0c7bf69dcbef965
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24330
In principle, we should be able to use the MSVC generator
to do a Windows build, but with the latest build of our
Windows AMI, this is no longer possible. An in-depth
investigation about why this is no longer working should
occur in https://github.com/pytorch/pytorch/issues/24386
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24330
Test Plan: Imported from OSS
Differential Revision: D16828794
Pulled By: ezyang
fbshipit-source-id: fa826a8a6692d3b8d5252fce52fe823eb58169bf
Summary:
The corresponding numpy_dtype_to_aten is public already so this
should be fine. Tests still pass.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23943
Differential Revision: D16690742
Pulled By: soumith
fbshipit-source-id: 81431a3316509cff8a9122e10e8f6a362bbcc9c0
Summary:
This is a bunch of changes to the docs for stylistic changes,
correctness, and updates to the new script API / recent TorchScript
changes (i.e. namedtuple)
For reviewers, ping me to see a link of the rendered output.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24371
Pulled By: driazati
Differential Revision: D16832417
fbshipit-source-id: a28e748cf1b590964ca0ae2dfb5d8259c766a203
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24384
So that we can use them in other functions.
Reviewed By: yinghai
Differential Revision: D16824289
fbshipit-source-id: 3cb33cfa9a5c479a63db6438aef518209bdfb1f4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24350
`TK_NAMED_TUPLE_DEF` shouldn't exist, because NamedTuples are not
distinct syntactic things. The fact that NamedTuples and Classes are
treated differently is a property of our implementation, not the
language grammar.
This PR kills it and re-uses `CLASS_DEF` instead.
Test Plan: Imported from OSS
Differential Revision: D16825273
Pulled By: suo
fbshipit-source-id: f6d97d7e4fbdf789fd777f514eac97f32e2bbae2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24349
Methods that derive a new class type from the old one need to copy the
`method_` field as well as the attributes.
Test Plan: Imported from OSS
Differential Revision: D16825274
Pulled By: suo
fbshipit-source-id: 938334e0733d2a89f00ec46984cbd5beecb4c786
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23753
Add intrinsic (fused) module mappings in quantize.py to enable mapping fused modules
in both QAT and post-training quantization (PTQ).
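For context, a hedged sketch of how fused (intrinsic) modules typically enter the workflow; the module names and fusion list here are illustrative:
```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU()).eval()
# Fusing produces an intrinsic fused module at index '0'; the quantize.py
# mappings described above tell the quantization flow how to convert such
# modules in both QAT and post-training quantization.
fused = torch.quantization.fuse_modules(model, [['0', '1', '2']])
print(type(fused[0]).__name__)
```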
Differential Revision: D16820749
fbshipit-source-id: 07de76a4f09b44bde8b193c103eac02c22b875b6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24183
-----------
Fix: Enabled masked select/scatter/fill for BFloat16 on CPU
Test: via unit tests
Test Plan: Imported from OSS
Differential Revision: D16763461
Pulled By: izdeby
fbshipit-source-id: fe733635a2064e5a088a108ff77c2a1a1487a27c
Summary:
Removes older `torch.stack`-based logic in favor of `torch.diagonal()` and `torch.diag_embed()`.
I see 100x speedup in my application, where my batched matrix has shape `(800, 32 ,32)`.
```py
import torch
from torch.distributions import constraints, transform_to
x = torch.randn(800, 32, 32, requires_grad=True)
# Before this PR:
%%timeit
transform_to(constraints.lower_cholesky)(x).sum().backward()
# 579 ms ± 34.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# After this PR:
%%timeit
transform_to(constraints.lower_cholesky)(x).sum().backward()
# 4.5 ms ± 241 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24131
Differential Revision: D16764035
Pulled By: ezyang
fbshipit-source-id: 170cdb0d924cdc94cd5ad3b75d1427404718d437
Summary:
This diff adds support for python user-defined functions over rpc for https://github.com/pytorch/pytorch/issues/23110. The workflow is:
1. pickle the python udf
2. pass the pickle to C++
3. C++ passes it over rpc from client to server
4. the server calls runPythonUDF() to unpickle and run the python udf and pickle the udf result using the python embedder
5. the serialized result is passed back from server to client
6. the client calls loadPythonUDFResult() to unpickle the result
7. the result is returned to python
Right now, rpc_sync_builtin() and rpc_async_builtin() are added as temporary interfaces for builtin operator remote calls; they accept a qualified name string, and this interface can execute builtin operators in C++ land.
rpc_sync() and rpc_async() accept python callables only right now; these can be user-defined python functions or builtin operator python functions, and the python functions are executed in python land.
Once we can resolve builtin operator python callables to qualified name strings, we can merge rpc_sync_builtin() into rpc_sync().
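A hedged sketch of the described flow from the caller's side, using the torch.distributed.rpc module (the actual rpc setup is elided):
```python
import torch
import torch.distributed.rpc as rpc

def my_udf(a, b):
    # an arbitrary user-defined python function: pickled on the caller,
    # executed in python land on the callee, and its result pickled back
    return torch.add(a, b)

# After rpc.init_rpc(...) has been called on both workers, the client can do:
# fut = rpc.rpc_async("worker1", my_udf, args=(torch.ones(2), torch.ones(2)))
# result = fut.wait()
```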
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23569
Test Plan: unit tests
Differential Revision: D16390764
Pulled By: zhaojuanmao
fbshipit-source-id: 2cf2c22a979646830b5581bd75eabf8b3cca564c
Summary:
Assert that no single memory location is written to multiple times, which
previously caused corrupted output.
Fixed the batched matrix triu/tril logic, which relied on the previous copy behavior to
support tensors with stride 0 in the leading dimension.
This fixes the issue proposed at: https://github.com/pytorch/pytorch/issues/23063
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23574
Differential Revision: D16600717
Pulled By: ezyang
fbshipit-source-id: e41e14f03eccf97398b64ba43647110beb1529e6
Summary:
Variables such as `device` and `sparse` in for loops should be used in tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24075
Differential Revision: D16763073
Pulled By: ezyang
fbshipit-source-id: 8735cbc8d9ed695db8489cfc949c895180a7b826
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24361
Currently we only support Conv in the kernel but have an entry point for both types using the same class.
It is time to make this change.
Reviewed By: csummersea
Differential Revision: D16604713
fbshipit-source-id: b98d39a2c7960707cd50ba27e43dce73f741eeeb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24232
As suggested in https://github.com/pytorch/pytorch/pull/23128#discussion_r306650311, we will use `torch.nn.Linear` as the key of default_qconfig_dict. That is, we will apply dynamic quantization to `torch.nn.Linear` by default if the user just specifies `torch.quantize_dynamic(model)`.
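A minimal sketch of what this enables (the entry point is spelled torch.quantization.quantize_dynamic in current releases; the summary's torch.quantize_dynamic is assumed to refer to the same call):
```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(16, 8), torch.nn.ReLU(),
                            torch.nn.Linear(8, 4))
# With torch.nn.Linear as the default qconfig_dict key, a single call is
# enough to dynamically quantize every Linear in the model.
qmodel = torch.quantization.quantize_dynamic(model)
print(qmodel)
```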
ghstack-source-id: 88287089
Differential Revision: D16781191
fbshipit-source-id: 991a5e151a9ea32b879d6897cd9862855d747135
Summary:
We found the following dimension mismatch issues when running the BERT model with the dynamic quantization:
```
Traceback (most recent call last):
File "bert.py", line 75, in <module>
outputs = model(tokens_tensor, token_type_ids=segments_tensors)
File "/home/jianyuhuang/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 539, in __call__
result = self.forward(*input, **kwargs)
File "/home/jianyuhuang/anaconda3/lib/python3.7/site-packages/pytorch_transformers/modeling_bert.py", line 709, in forward
head_mask=head_mask)
File "/home/jianyuhuang/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 539, in __call__
result = self.forward(*input, **kwargs)
File "/home/jianyuhuang/anaconda3/lib/python3.7/site-packages/pytorch_transformers/modeling_bert.py", line 437, in forward
layer_outputs = layer_module(hidden_states, attention_mask, head_mask[i])
File "/home/jianyuhuang/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 539, in __call__
result = self.forward(*input, **kwargs)
File "/home/jianyuhuang/anaconda3/lib/python3.7/site-packages/pytorch_transformers/modeling_bert.py", line 415, in forward
attention_outputs = self.attention(hidden_states, attention_mask, head_mask)
File "/home/jianyuhuang/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 539, in __call__
result = self.forward(*input, **kwargs)
File "/home/jianyuhuang/anaconda3/lib/python3.7/site-packages/pytorch_transformers/modeling_bert.py", line 372, in forward
self_outputs = self.self(input_tensor, attention_mask, head_mask)
File "/home/jianyuhuang/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 539, in __call__
result = self.forward(*input, **kwargs)
File "/home/jianyuhuang/anaconda3/lib/python3.7/site-packages/pytorch_transformers/modeling_bert.py", line 303, in forward
query_layer = self.transpose_for_scores(mixed_query_layer)
File "/home/jianyuhuang/anaconda3/lib/python3.7/site-packages/pytorch_transformers/modeling_bert.py", line 296, in transpose_for_scores
return x.permute(0, 2, 1, 3)
RuntimeError: number of dims don't match in permute
```
Before the quantization, the dimension of `x` in `transpose_for_scores` is `[1, 14, 12, 64]`;
after the quantization, the dimension of `x` in `transpose_for_scores` is `[14, 12, 64]`.
There is a dimension mismatch on the output of the `torch.ops.quantized.fbgemm_linear_dynamic` operator. The first dimension is missing, which causes the issue with the above permute.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23330
ghstack-source-id: 88287092
Differential Revision: D16463334
fbshipit-source-id: 4bdb836d1df31ba7c0bd44e3339aabdc8b943ae1
Summary:
TensorIterator was incorrectly moving the stride 0 dimension to the
inner-most dim in the assignment:
a[idx] = b
Note that the corresponding read was still fast:
c = a[idx]
This was noticed by adamlerer
```
import torch
import time
import sys
N = 300000
torch.set_num_threads(1)
a = torch.zeros(N, 128)
b = torch.zeros(N, 128)
idx = torch.arange(N)
%timeit c = a[idx] # before and after: ~91.3 ms
%timeit a[idx] = b # before: 4.38 sec after: 44.1 ms
```
Note that the indexed read is slower than the indexed assignment on
my computer because the read has to allocate a new output (which is
zero'ed by the kernel). The indexed assignment doesn't allocate any new
Tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24083
Differential Revision: D16805440
Pulled By: colesbury
fbshipit-source-id: 70a2e74ae79691afbfa9f75b3d7d1e6806f603f5
Summary:
Stacked PRs
* #24258 - [jit] Add `trace_module` to docs
* **#24208 - [jit] Cleanup documentation around `script` and `trace`**
Examples / info was duplicated between `ScriptModule`, `script`, and
`trace`, so this PR consolidates it and moves some things around to make
the docs more clear.
For reviewers, if you want to see the rendered output, ping me for a
link
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24208
Pulled By: driazati
Differential Revision: D16746236
fbshipit-source-id: fac3c6e762a31c897b132b8421baa8d4d61f694c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24298
This helps in situations like when you have `__{g,s}etstate__` on an `nn.Module` and you'd like to trace the module, but still preserve the serialization methods to make the module semantically correct
Test Plan: Imported from OSS
Differential Revision: D16799800
Pulled By: jamesr66a
fbshipit-source-id: 91c2957c94c9ec97a486ea376b2a3e3a821270af
Summary:
We want to use the Module type as the key for the qconfig_dict for the module replacement during the quantization.
Before this Diff, to dynamically quantize the BERT model, we have to specify each layer:
```
qconfig_dict = {
'encoder.layer.0.attention.self.query': default_qconfig,
'encoder.layer.0.attention.self.key': default_qconfig,
'encoder.layer.0.attention.self.value': default_qconfig,
'encoder.layer.0.attention.output.dense': default_qconfig,
'encoder.layer.0.intermediate.dense': default_qconfig,
'encoder.layer.0.output.dense': default_qconfig,
'encoder.layer.1.attention.self.query': default_qconfig,
'encoder.layer.1.attention.self.key': default_qconfig,
'encoder.layer.1.attention.self.value': default_qconfig,
'encoder.layer.1.attention.output.dense': default_qconfig,
'encoder.layer.1.intermediate.dense': default_qconfig,
'encoder.layer.1.output.dense': default_qconfig,
...
}
```
After this Diff, we only need the following
```
qconfig_dict = {
torch.nn.Linear : default_qconfig
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23212
ghstack-source-id: 88287091
Reviewed By: zafartahirov
Differential Revision: D16436542
fbshipit-source-id: 11fbe68ee460560c1a7cdded63581eb7a00e5a89
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23963
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D16695160
Pulled By: ezyang
fbshipit-source-id: dc8fd1f0c7096fcd4eb48ce42069307915052a77
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23961
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D16695162
Pulled By: ezyang
fbshipit-source-id: 28eca6920bd1b4e72286b4ab859cf513dcd0db44
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23960
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D16695161
Pulled By: ezyang
fbshipit-source-id: 36d1777467bbe3f8842736c570b029b72954e027
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23954
There is only one substantive change: when stride.size() == 1,
we expand it to size 2. However, I also took the opportunity
to give a better error message.
Testing here is bare minimum, because I'm in a hurry. Just make
sure C++ API with all size 1 inputs works.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D16695163
Pulled By: ezyang
fbshipit-source-id: 31674bf97db67e60e4232514c88a72be712bd9ae
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24087
Added name inference rules for softmax and log_softmax.
Added the overloads for Dimname dim to softmax and log_softmax.
Test Plan: - [namedtensor ci]
Differential Revision: D16763391
Pulled By: zou3519
fbshipit-source-id: 676a14666d42441eb7d3c9babef7461c7b78d290
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24028
Previously, torch.abs(tensor, out=out) would ignore the names of the
`out` tensor and overwrite them with the names of `tensor`.
This patch changes the behavior to the following:
1) If `out` does not have names, then overwrite them with `tensor.names`.
2) If `out` does have names, then check that `out.names` equals
`tensor.names`.
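A minimal sketch of the described semantics (named tensors are a prototype feature, so exact behavior may vary by version):
```python
import torch

t = torch.randn(2, 3, names=('N', 'C'))

out = torch.empty(2, 3)          # unnamed out: its names get overwritten
torch.abs(t, out=out)
print(out.names)                 # ('N', 'C')

named = torch.empty(2, 3, names=('N', 'C'))
torch.abs(t, out=named)          # names already match, so this is allowed
```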
This patch also includes the following clean ups:
- renamed `default_names` to `FIXME_default_names` because it is
inefficient and needs to be fixed.
- Renamed impl::internal_get_names / impl::internal_has_names to
impl::get_names / impl::set_names. Devs should feel free to use them, so
I removed the internal_ prefix.
- Moved internal_set_names to NamedTensor.{h, cpp}. These functions
still have the internal_ prefix because their use requires caution.
Test Plan: - [namedtensor ci]
Differential Revision: D16763387
Pulled By: zou3519
fbshipit-source-id: 57dcc7c759246def0db2746d1dca8eddd5e90049
Summary:
Rename the decorator to `for_all_device_types`, as the `test_`-prefixed name is recognized as a test in some environments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24337
Differential Revision: D16806807
Pulled By: VitalyFedyunin
fbshipit-source-id: 3132366046e183329ba5838a4bc29441fdb5bd4e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24282
This moves a test from Python to cpp, and in doing so lets us clean up a
bunch of otherwise unused code.
Test Plan: Imported from OSS
Differential Revision: D16800562
Pulled By: suo
fbshipit-source-id: ebc29bb81f4fb2538081fa309ead1739980f1093
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24281
These are not just classes anymore, rename
Test Plan: Imported from OSS
Differential Revision: D16800564
Pulled By: suo
fbshipit-source-id: 8b8d508944c26a8916fc7642df43f22583dfcf82
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24280
This simplifies the groundwork for serializing functions.
Test Plan: Imported from OSS
Differential Revision: D16800560
Pulled By: suo
fbshipit-source-id: 129b32dddb39494daeade33c87d76248486a86b2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24279
As title. I want to let children be able to define how to get their own
name
Test Plan: Imported from OSS
Differential Revision: D16800563
Pulled By: suo
fbshipit-source-id: 6a12ffef96b0dfa5543c5463386170de7726ad58
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24278
We had a lot of redundant methods. Killing them.
Test Plan: Imported from OSS
Differential Revision: D16800561
Pulled By: suo
fbshipit-source-id: 60acc1d5b0f34130a1f66a1e5bc7df364a5feb57
Summary:
Previously we didn't handle list comprehensions where the expression produced a different type than the input list.
`[float(x) for x in [1, 2, 3]]`
Fix for https://github.com/pytorch/pytorch/issues/24239
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24271
Differential Revision: D16806564
Pulled By: eellison
fbshipit-source-id: 1af6a174b9d17a6ea7154511133c12c691eb9188
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24311
Now you can run tests with `pytest -n auto test/test_jit.py` to get
tests to run in parallel. On my devfair in opt mode, this takes < 30
seconds, which is a huge improvement.
The actual changes are places where we hard-coded certain things that
got changed due to how pytest-xdist distributes tests:
1. Warnings are filtered after they are tripped once, so
`test_trace_warn` shouldn't rely on warning counts.
2. Various file/save tests hardcoded paths inappropriately.
Test Plan: Imported from OSS
Differential Revision: D16801256
Pulled By: suo
fbshipit-source-id: 62a3543dd7448a7d23bdef532953d06e222552ee
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23983
While testing I realized that model layers can extract different types of features from the same column. For example, MultifeedFeaturesTransform uses float and ID list features from the "features" column.
get_accessed_features returns a map from column to AccessedFeatures, and AccessedFeatures only has the feature IDs for one feature type. This is incompatible with having multiple types of features per column: one type ends up overwriting another in the map.
To fix this, I've modified get_accessed_features to return a map from column to a list of AccessedFeatures objects.
Reviewed By: itomatik
Differential Revision: D16693845
fbshipit-source-id: 2099aac8dc3920dd61de6b6ad5cf343c864803bc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24262
Previously, for the onnxifi_blacklist_ops option, we figured out the net_pos based on the order of ops in the net. But this logic is wrong if the net already has net_pos assigned, and we may end up blacklisting unintended ops. Fix this issue by always assigning net_pos before computing any blacklist.
Reviewed By: yinghai
Differential Revision: D16789166
fbshipit-source-id: 2d08a7737d417822f2209adb4dcb24dbb258ff90
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23973
Without loss of generality, I describe the API for `tensor.view_names`.
`tensor.names_` has an analogous API.
`tensor.view_names(*names)` returns a view on tensor with named dims `names`.
`names` must be of length `tensor.dim()`; otherwise, if '*' is in `names`,
then it (known as the "glob") is expanded greedily to be equal to the
corresponding names from `tensor.names`.
For example,
```
>>> x = torch.empty(2, 3, 5, 7, names=('N', 'C', 'H', 'W'))
>>> x.view_names('*', 'height', 'width').names
('N', 'C', 'height', 'width')
>>> x.view_names('batch', '*', 'width').names
('batch', 'C', 'H', 'width')
```
tensor.view_names(**rename_map) returns a view on tensor that has
renamed dims as specified in the mapping `rename_map`.
For example,
```
>>> x = torch.empty(2, 3, 5, 7, names=('N', 'C', 'H', 'W'))
>>> x.view_names(W='width', H='height').names
('N', 'C', 'height', 'width')
```
These are different(!!!) from the C++ API, which only allows the
following:
- tensor.view_names(optional<DimnameList>)
C++ API parity for named tensors is not important right now; I am
punting that to the future.
Test Plan: - [namedtensor ci]
Differential Revision: D16710916
Pulled By: zou3519
fbshipit-source-id: 7cb8056c0fb4c97b04c3a2d1dd0f737e0a67ce34
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23962
This change should make the semantics clearer.
`tensor.names_(names)` sets tensor.names to be `names`.
`tensor.view_names(names)` returns a view of the tensor with names
`names`.
Test Plan
- [namedtensor ci]
Test Plan: Imported from OSS
Differential Revision: D16710915
Pulled By: zou3519
fbshipit-source-id: c82fa9812624d03c86f7be84b0a460e3c047aaa0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23804
`output = tensor.align_to(names)` returns a view of `tensor` such that
`output.names = names`. Dimensions with the same names in `tensor` and
`output` have the same sizes; dimensions with new names have size 1.
The following must be true for this operation to succeed:
1) tensor.names must be a subsequence (not necessarily contiguous) of `names`
2) Aligning tensor.names to names must not change the absolute position from the
right of any unnamed dimension.
In practice, these constraints mean that aligning cannot transpose
names.
Some examples:
- Tensor[C].align_to(C) -> Tensor[C]
- Tensor[N].align_to([N, C]) -> Tensor[N, C]
- Tensor[H, W].align_to([N, H, W, C]) -> Tensor[N, H, W, C]
- Tensor[None].align_to([N, None]) -> Tensor[N, None]
- Tensor[N].align_to([N, None, None]) -> Tensor[N, None, None]
Examples of error cases:
- Tensor[W, H].align_to([N, H, W, C]) -> Error (not a subsequence)
- Tensor[None, H].align_to([None, H, W]) -> Error (would change the
absolute position from the right of a None dimension)
`torch.align_tensors(*tensors)` aligns the named dimensions of each
tensor according to the alignment rules so that they can be used in an
operation. More concretely, it aligns each tensor to the
longest names among the names of the tensors in `tensors`.
This allows users to emulate "broadcasting by names", which is one of
the things named tensors tries to enable. Here is an example:
```
imgs: Tensor[N, C, H, W]
scale: Tensor[N]
// Doesn't work because we do broadcasting by alignment by default
imgs * scale
// Does work
imgs, scale = torch.align_tensors(imgs, scale)
imgs * scale
```
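A runnable version of the pseudo-code above (a minimal sketch; named tensors are a prototype feature):
```python
import torch

imgs = torch.randn(2, 3, 4, 4, names=('N', 'C', 'H', 'W'))
scale = torch.randn(2, names=('N',))
imgs_a, scale_a = torch.align_tensors(imgs, scale)
print(scale_a.names)        # ('N', 'C', 'H', 'W'); the new dims have size 1
out = imgs_a * scale_a      # broadcasting now lines up by name
```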
Future:
- Consider allowing broadcasting by names by default.
Test Plan:
- The diff looks pretty large but more than half of it is testing.
- new tests [namedtensor ci]
Differential Revision: D16657927
Pulled By: zou3519
fbshipit-source-id: e2f958bf5146c8ee3b694aba57d21b08e928a4e6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24108
`torch.empty_like(tensor)` and `tensor.clone()` both propagate names to
the output tensor.
As a part of this change, I fixed the empty(..., names=) overload to
include the `memory_format` argument in the normal `empty` declaration
in native_functions.yaml.
Test Plan: - [namedtensor ci]
Differential Revision: D16763392
Pulled By: zou3519
fbshipit-source-id: c7b2bc058d26a515a5fd8deef22c2acb290c8816
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24107
In the short term, we implement this by having overloads for each of
these functions. In the long term, the plan is to move DimnameList to
TensorOptions so that we do not have to duplicate work.
Also fixes the implementation of empty. If there are no names, we should
just return an unnamed tensor instead of telling the user we don't
support their backend/layout.
Test Plan: - [namedtensor ci]
Differential Revision: D16763393
Pulled By: zou3519
fbshipit-source-id: 7324a6b157187d4f74abc5459052f3323a417412
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24030
The cmake arg `USE_QNNPACK` was disabled for the iOS build due to its lack of support for building multiple archs (armv7; armv7s; arm64) simultaneously. To enable it, we need to specify the value of IOS_ARCH explicitly in the build command:
```
./scripts/build_ios.sh \
-DIOS_ARCH=arm64 \
-DBUILD_CAFFE2_MOBILE=OFF \
```
However, the iOS.cmake will overwrite this value according to the value of `IOS_PLATFORM`. This PR fixes that problem.
Test Plan:
- `USE_QNNPACK` should be turned on by cmake.
- `libqnnpack.a` can be generated successfully.
- `libtorch.a` can be compiled and run successfully on iOS devices.
<img src="https://github.com/xta0/AICamera-ObjC/blob/master/aicamera.gif?raw=true" width="400">
Differential Revision: D16771014
Pulled By: xta0
fbshipit-source-id: 4cdfd502cb2bcd29611e4c22e2efdcdfe9c920d3
Summary:
- ~~Add a unit test for the Dynamic Quantized Linear operator (```torch.fbgemm_linear_quantize_weight```, ```torch.fbgemm_pack_quantized_matrix```, and ```torch.fbgemm_linear_int8_weight```) in ```test_quantized.py```.~~ Move this to D16404027 for a separate review.
- Add the Dynamic Quantized Linear module in ```torch/nn/quantized/modules/linear.py```. ~~This is in a rudimentary stage. Will add more functions later~~.
- Add the torch.quantize logic (prepare, eval, convert) for dynamic quantization.
- Add a unit test for the Dynamic Quantized Linear module in ```test_nn_quantized.py```.
- Add a unit test for the Model-level Quantization API
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23128
ghstack-source-id: 88257232
Differential Revision: D16258664
fbshipit-source-id: 4be3ac39ee27c088b341c741d3f09f51d5a23ef0
Summary:
Which was added in https://github.com/pytorch/pytorch/issues/16412.
Also make some CUDNN_* CMake variables build options, so as to avoid reading environment variables directly via `$ENV` in CMake scripts.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24044
Differential Revision: D16783426
Pulled By: ezyang
fbshipit-source-id: cb196b0013418d172d0d36558995a437bd4a3986
Summary:
As suggested in https://github.com/pytorch/pytorch/pull/22891, we will add an overload for torch.fbgemm_linear_int8_weight (dynamic quantized version of linear function) that takes PackedLinearWeight as input and is pretty much the same in signature as regular aten::linear.
The previous Diff D16381552 is reverted because `quantize_linear` expects the scale to be `float`, and the zero_point to be `int`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23464
ghstack-source-id: 88257231
Differential Revision: D16527741
fbshipit-source-id: 66585f668c6e623c50514eb11633bb711d8767f2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24201
It turns out that the `run_test` script uses a blacklist of "exclude" tests and tests if the test name [starts with](https://github.com/pytorch/pytorch/blob/master/test/run_test.py#L342) the given blacklist item. `nn` was passed as a blacklist item in CI, and that meant that not only was test_nn skipped, but also test_nn_quantized. This renames the test to avoid this situation, and imo puts it in a better position lexicographically next to the other quantization tests.
Test Plan: Imported from OSS
Differential Revision: D16772820
Pulled By: jamesr66a
fbshipit-source-id: 4cde0729b48ae3e36fcedab9c98197831af82dde
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24202
tensor.set_names(names) is the out-of-place variant of
tensor.set_names_(names). This naming is probably confusing so I am
taking any and all suggestions.
Test Plan: - run tests [namedtensor ci]
Differential Revision: D16773014
Pulled By: zou3519
fbshipit-source-id: 61024303c1a34db631cc4cb2c53757345e40d72c
Summary:
Existing code adds two enumerators to the set instead of forming their union.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23974
Differential Revision: D16732762
Pulled By: ezyang
fbshipit-source-id: 787737b7cf4b97ca4e2597e2da4a6ade863ce85c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24171
There can be up to 24, instead of 16, intersections (including duplicates) returned from rotated_rect_intersection_pts, which caused num <= 16 assertion failures in https://fburl.com/scuba/mzmf49xc (thanks to Ananth's report) when the boxes are extremely close (e.g., the newly added unit test case).
Differential Revision: D16760676
fbshipit-source-id: 289c25ef82c094d98bfe570c5d35c055e49703cb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24182
-----
Fix: Enabled comparison operations for BFloat16 on CPU
Test: via unit tests
Test Plan: Imported from OSS
Differential Revision: D16763460
Pulled By: izdeby
fbshipit-source-id: 885ff9006d3bd60bb945147c3b86f97cd0d26f7b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24196
The observer returns its input unchanged for post-training quantization. This unifies observer semantics for QAT and PTQ.
ghstack-source-id: 88140887
Differential Revision: D16768277
fbshipit-source-id: fae7c94e3dc0eeda363e9982b3865a15113e11bd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24077
This replaces all uses of DimensionedTensorType with ProfiledTensorType.
For places where we propagate shape information, we still follow the
dimension-only propagation rules, meaning that even if full size information
is known on inputs the outputs will only have dimension information.
This fixes several bugs in existing implementations that this change uncovered:
* requires_grad was not propagated correctly across loops
* requires_grad on ProfiledTensorType returned false when requires_grad information
is unknown but the conservative result is true
* some equality code on ProfiledTensorType contained bugs.
Test Plan: Imported from OSS
Reviewed By: suo
Differential Revision: D16729581
Pulled By: zdevito
fbshipit-source-id: bd9f823c1c6b1d06a236a1b5b2b2fcdf0245edce
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23635
It appears it is the same complexity to add new modules using a base class and using a generation script.
Test Plan: Imported from OSS
Differential Revision: D16593364
Pulled By: zafartahirov
fbshipit-source-id: 852dcf41f3dfa2a89152042b8e61d0b6defa8feb
Summary:
This PR introduces the `pytorchtest.test_all_device_types()` decorator, which helps write CPU and CUDA tests faster by iterating a single test over all available devices.
A simple `test_var_mean_some_dims` becomes
```
test_var_mean_some_dims (__main__.TestTorch) ... ok
test_var_mean_some_dims_cpu (__main__.TestTorch) ... ok
test_var_mean_some_dims_cuda (__main__.TestTorch) ... ok
```
```python
class pytorchtest():
    """Allows to generate and run per-device unittests.

    This decorator class allows to generate and run per-device unittests.

    Example:

    class _TestTorchMixin(pytorchtest):

        @pytorchtest.test_all_device_types()
        def test_zeros_like(self, device):
            expected = torch.zeros((100, 100,), device=device)

    Will execute:

    test_zeros_like (__main__.TestTorch) ... skipped 'Look at test_zeros_like_cpu, test_zeros_like_cuda results.'
    test_zeros_like_cpu (__main__.TestTorch) ... ok
    test_zeros_like_cuda (__main__.TestTorch) ... ok

    To work properly, the test class should inherit from `pytorchtest`.
    The test_all_device_types decorator does not guarantee proper functionality in
    combination with other decorators.

    Please do not extend this decorator to support other cases (such as dtype,
    layouts, etc.) without consulting with a bigger group. Devices is the special
    case as build flags control additions/removals (see
    https://github.com/pytorch/pytorch/pull/23824 for the reference).
    """
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23824
Differential Revision: D16716959
Pulled By: VitalyFedyunin
fbshipit-source-id: ba39af0f9bce2c4a64da421bbc24d6a1c1d9139d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23846
This moves a test from Python to cpp, and in doing so lets us clean up a
bunch of otherwise unused code.
Test Plan: Imported from OSS
Differential Revision: D16684390
Pulled By: suo
fbshipit-source-id: fca81ca14d1ac9e4d6b47ae5eecaa42b38d69147
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23845
These are not just classes anymore, rename
Test Plan: Imported from OSS
Differential Revision: D16684391
Pulled By: suo
fbshipit-source-id: af0024c0b7fbcca68785ec3fc6dc288ec46a1b84
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23697
This simplifies the groundwork for serializing functions.
Test Plan: Imported from OSS
Differential Revision: D16611884
Pulled By: suo
fbshipit-source-id: 620d3446cb353befde090a81a250cdd2d5e35aa8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23696
As title. I want to let children be able to define how to get their own
name
Test Plan: Imported from OSS
Differential Revision: D16611885
Pulled By: suo
fbshipit-source-id: 620b22c314eddf95159546810e1a00b1646663b8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23691
We had a lot of redundant methods. Killing them.
Test Plan: Imported from OSS
Differential Revision: D16611883
Pulled By: suo
fbshipit-source-id: a32c0a8b8b7e909b386a70abb0827c26cbd37e20
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23689
We store methods, no reason to try to lock the CU to find a method on a
class type
Test Plan: Imported from OSS
Differential Revision: D16610045
Pulled By: suo
fbshipit-source-id: d84ad81faa42c4e2da20b666fa3645e22f11dac3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24206
`unicode_literals` messes up Python 2 when the literals are put in `__all__`, because the Python interpreter expects `str` and not `unicode` for elements in an import statement. This fixes that.
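A minimal sketch of the failure mode, using a hypothetical module `mymod` (the behavior described is Python 2 specific):
```python
# mymod.py
from __future__ import unicode_literals

__all__ = ['helper']  # under unicode_literals this becomes [u'helper'] on Python 2

def helper():
    return 42

# client.py
# from mymod import *   # fails on Python 2: the star-import rejects unicode entries in __all__
#                        # (Python 3 is unaffected because str is already unicode)
```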
Test Plan: Imported from OSS
Differential Revision: D16774391
Pulled By: jamesr66a
fbshipit-source-id: fee2562f58b2e2c6480726d8809696961a37c8dd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23658
**How things work for caffe2:**
Caffe2 Ops -> NNPACK/QNNPACK -> pthreadpool_compute_1/2/3/4d_tiled -> pthreadpool_compute_1d (caffe2 shim) -> caffe2::ThreadPool
**Before this PR:**
Pytorch Ops -> NNPACK/QNNPACK -> pthreadpool_compute_1/2/3/4d_tiled -> pthreadpool_compute_1d (third_party implementation without mobile optimization)
caffe2::ThreadPool is optimized for mobile. This change leverages that logic for PyTorch mobile as a temporary solution to improve PyTorch mobile performance. It is guarded by the C10_MOBILE macro.
On the server side we return nullptr.
**Plan for next steps:**
Implement a mobile version of "at::parallel_for" which uses caffe2::ThreadPool internally so all ATen/TH multithreading usage is mobile optimized.
Refactor QNNPACK and/or pthreadpool to explicitly use the "at::parallel_for" primitive to replace pthreadpool_compute_1d for PyTorch.
After QNNPACK is refactored, we will delete the mobile_threadpool() API.
ghstack-source-id: 88073396
Reviewed By: dreiss
Differential Revision: D16594020
fbshipit-source-id: 9f94600756d5f86d24a12a2fd7df3eebd0994f1d
Summary:
**Patch Description**:
Update the docs to reflect that one no longer needs to install the TensorBoard nightly, as TensorBoard 1.14.0 was [released last week](https://github.com/tensorflow/tensorboard/releases/tag/1.14.0).
**Testing**:
Haven't actually tested pytorch with tensorboard 1.14 yet. I'll update this PR once I have.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22026
Differential Revision: D16772136
Pulled By: orionr
fbshipit-source-id: 2e1e17300f304f50026837abbbc6ffb25704aac0
Summary:
These were incorrect and didn't run before
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24191
Pulled By: driazati
Differential Revision: D16770604
fbshipit-source-id: 0d8547185871f7f4b1e44c660e45699ed8240900
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24184
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D16764168
Pulled By: ezyang
fbshipit-source-id: cc252a860fd7e4b7fb2b95c5d9fcdbf6935ffeb6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24029
The cmake toolchain file for building iOS is currently in `/third_party/ios-cmake`. Since the upstream is not active anymore, it's better to maintain this file ourselves moving forward. This PR is also a prerequisite for enabling QNNPACK for iOS.
Test Plan:
- The `libtorch.a` can be generated successfully
- The `libtorch.a` can be compiled and run on iOS devices
<img src="https://github.com/xta0/AICamera-ObjC/blob/master/aicamera.gif?raw=true" width="400">
Differential Revision: D16770980
Pulled By: xta0
fbshipit-source-id: 1ed7b12b3699bac52b74183fa7583180bb17567e
Summary:
Starting with ONNX IR version 4, the initializers in the ONNX graph do not have to be inputs of the graph. This constraint, which existed in IR version 3 and earlier, was relaxed in IR version 4. This PR provides an API-level argument to allow ONNX export with the relaxed constraint of IR version 4, i.e., it provides the option to not include initializers as inputs. This allows backends/runtimes to do certain optimizations, such as constant folding, better.
*Edit*: After discussion with houseroad we have the following behavior. For any OperatorExportType, except OperatorExportTypes.ONNX, the current status of export is maintained in this PR by default. However, the user can override it by setting the `keep_initializers_as_inputs` argument to the export API. But when exporting to ONNX, i.e. OperatorExportType is OperatorExportTypes.ONNX, the current status is changed in that by default the initializers are NOT part of the input. Again, the default can be overridden by setting the `keep_initializers_as_inputs` argument.
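A small usage sketch of the new argument (the model and output file name are made up for illustration):
```python
import torch

model = torch.nn.Linear(4, 2)
dummy_input = torch.randn(1, 4)

# When exporting to ONNX proper, initializers are no longer listed as graph inputs
# by default; pass keep_initializers_as_inputs=True to restore the IR v3-style layout.
torch.onnx.export(model, dummy_input, "linear.onnx",
                  keep_initializers_as_inputs=True)
```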
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23284
Differential Revision: D16459961
Pulled By: bddppq
fbshipit-source-id: b8f0270dfaba47cdb8e04bd4cc2d6294f1cb39cf
Summary:
Improve error messages by showing the relevant function call that failed.
Before:
```
>>> torch.ones(1, dtype=torch.float) < torch.ones(1, dtype=torch.double)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: Expected object of scalar type Float but got scalar type Double for argument #2 'other'
```
After:
```
>>> torch.ones(1, dtype=torch.float) < torch.ones(1, dtype=torch.double)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: Expected object of scalar type Float but got scalar type Double for argument #2 'other' in call to _th_lt
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24187
Differential Revision: D16769167
Pulled By: nairbv
fbshipit-source-id: 4992eb4e86bdac2ab8805cc5356f7f92c63e1255
Summary:
This PR deletes `WeakScriptModuleProxy` and uses `ScriptModule` directly and moves the recursive script stuff into `torch/jit/_recursive.py`. The first commit is just moving code, the latter 2 contain the actual changes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23398
Pulled By: driazati
Reviewed By: eellison
Differential Revision: D16712340
fbshipit-source-id: f907efcec59bb2694c079ab655304324c125e9bb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24106
Test Plan
- Code reading. assertTensorDataAndNamesEqual isn't used in this commit
but it'll be used in future commits.
- [namedtensor ci]
Test Plan: Imported from OSS
Differential Revision: D16763390
Pulled By: zou3519
fbshipit-source-id: 170e27ebc4d79aca939c5d101489b20faedc6133
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24105
tensor.set_names(names) is the out-of-place variant of
tensor.set_names_(names). This naming is probably confusing so I am
taking any and all suggestions.
Test Plan: - run tests [namedtensor ci]
Differential Revision: D16763388
Pulled By: zou3519
fbshipit-source-id: 4b2fb3acc0514515e7ca805dbc5c3d4a9bd96317
Summary:
Some interfaces of schedulers defined in lr_scheduler.py are missing in lr_scheduler.pyi.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23934
Differential Revision: D16726622
Pulled By: ezyang
fbshipit-source-id: 45fd2d28fbb658c71f6fcd33b8997d6ee8e2b17d
Summary:
Doing these one at a time
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24100
Differential Revision: D16753599
Pulled By: suo
fbshipit-source-id: cfd317a2463cf6792758abe04c0f01a146a7ec47
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24048
Add `__{g,s}etstate__` methods on `nnq.Linear` for JIT (and torch.{save,load} serialization).
Unfortunately, this unearthed a bug in serialization documented in https://github.com/pytorch/pytorch/issues/24045. The check that triggered the bug has been disabled pending a fix.
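The following is not the actual `nnq.Linear` code, just a generic sketch of the `__getstate__`/`__setstate__` pattern with hypothetical attribute names:
```python
import torch

class TinyQuantizedLinear(torch.nn.Module):
    """Illustrative module whose serialized state is a plain tuple of Python attributes."""

    def __init__(self, weight, bias, scale, zero_point):
        super(TinyQuantizedLinear, self).__init__()
        self.weight = weight
        self.bias = bias
        self.scale = scale
        self.zero_point = zero_point

    def __getstate__(self):
        # Pack the Python attributes into a serializable tuple.
        return (self.weight, self.bias, self.scale, self.zero_point)

    def __setstate__(self, state):
        # Re-run __init__ so nn.Module internals exist, then restore the attributes.
        self.__init__(*state)

m = TinyQuantizedLinear(torch.randn(2, 4), torch.zeros(2), 0.1, 0)
torch.save(m, 'tq_linear.pt')  # __getstate__ decides what gets pickled here
```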
Test Plan: Imported from OSS
Reviewed By: driazati
Differential Revision: D16728347
Pulled By: jamesr66a
fbshipit-source-id: c3b850be3b831f4c77cec3c2df626151b2af8b34
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24047
Add `_{save_to,load_from}_state_dict` methods to `nnq.Linear` that explicitly deal with conversions from the Python attributes to the serialized state-dict form.
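Again, this is not the actual `nnq.Linear` code; it is a sketch, with a hypothetical `_weight` attribute, of how these two `nn.Module` hooks convert between Python attributes and state-dict entries:
```python
import torch

class PackedLinear(torch.nn.Module):
    """Keeps an attribute in an internal form and converts it explicitly for the state dict."""

    def __init__(self, out_features, in_features):
        super(PackedLinear, self).__init__()
        self._weight = torch.randn(out_features, in_features)  # stand-in for a packed weight

    def _save_to_state_dict(self, destination, prefix, keep_vars):
        super(PackedLinear, self)._save_to_state_dict(destination, prefix, keep_vars)
        # Explicitly convert the Python attribute into a state-dict entry.
        destination[prefix + 'weight'] = self._weight

    def _load_from_state_dict(self, state_dict, prefix, local_metadata, strict,
                              missing_keys, unexpected_keys, error_msgs):
        # Pop our custom key before the default machinery flags it as unexpected.
        self._weight = state_dict.pop(prefix + 'weight')
        super(PackedLinear, self)._load_from_state_dict(
            state_dict, prefix, local_metadata, strict,
            missing_keys, unexpected_keys, error_msgs)

m = PackedLinear(2, 4)
sd = m.state_dict()
PackedLinear(2, 4).load_state_dict(sd)
```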
Test Plan: Imported from OSS
Reviewed By: driazati
Differential Revision: D16728346
Pulled By: jamesr66a
fbshipit-source-id: 182c9f5069d509147dc9020b341b6cb87505fe7f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24046
`nnq.Linear` was a confusing mess of buffers/attributes and Tensor/non-Tensor members. This PR reworks it to consistently have only Python attributes, with the conversions handled explicitly by state_dict or `__{get,set}state__` methods (added in PRs further up the stack).
Test Plan: Imported from OSS
Reviewed By: driazati
Differential Revision: D16728345
Pulled By: jamesr66a
fbshipit-source-id: 47468b776b428fca2409bb55c8b161afb68a3379
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24136
This diff aims to reduce the execution time of benchmark_all_test, which runs all the supported operator benchmarks. In the default run, only one shape of each operator will be benchmarked. The rest of the benchmarks can be triggered with the tag_filter flag.
Reviewed By: hl475
Differential Revision: D16736448
fbshipit-source-id: 33bd86f6fc2610f87f24240ad559fb11d3e35e89
Summary:
This was previously buggy and not being displayed on master. This fixes
the issues with the script to generate the builtin function schemas and
moves it to its own page (it's 6000+ lines of schemas)
Sphinx looks like it will just keep going if it hits errors when importing modules; we should find out how to turn that off and add a check for it to the CI.
This also includes some other small fixes:
* removing internal only args from `script()` and `trace()` docs, this also requires manually keeping these argument lists up to date but I think the cleanliness is worth it
* removes outdated note about early returns
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24056
Pulled By: driazati
Differential Revision: D16742406
fbshipit-source-id: 9102ba14215995ffef5aaafcb66a6441113fad59
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23665
For many ATen ops, c10 can't generate boxed kernel versions yet.
We need to allow kernels that have only unboxed versions for them to be registerable with c10.
ghstack-source-id: 88050429
Differential Revision: D16603132
fbshipit-source-id: 84cae4a514da104f5035d23a4059ca6197469f9c
Summary:
The c10 dispatcher now also stores a `void*` pointer to the unboxed kernel function and this kernel function can be called if the call site knows the exact kernel signature.
It is not clear if this API will survive in the long term, but in the short term this allows an easier migration from ATen to c10 and is supposed to replace ATenDispatch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23447
ghstack-source-id: 88050435
Differential Revision: D16521939
fbshipit-source-id: 7e570df5a721defc677c3cc91758651dbe06ce1c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23834
Re-export of reverted PR https://github.com/pytorch/pytorch/pull/23810 with the bug fixed.
A previous diff removed the special casing for aten:: and prim:: ops in alias analysis and implements alias analysis purely
based on the AliasAnalysisKind. To be sure it doesn't break our existing code base, it added asserts that make sure that
our existing aten:: and prim:: ops set the correct AliasAnalysisKind.
However, we don't need that restriction for future ops. Since we are now certain all existing cases are set up correctly,
we can remove these assertions.
ghstack-source-id: 88050427
Differential Revision: D16657239
fbshipit-source-id: 8a7606da8e9bd961bf47e3e1587b622a9c247ec6
Summary:
This PR:
- Moves clamp from autodiff cpp to symbolic script
- Adds an additional tuple lowering pass to the graph executor
- Updates clamp backwards to be maximally gradient preserving
Moving clamp to symbolic script presented two challenges:
- When the backward graph is defined the branch taken in the conditional is known, but communicating this information to the Jit is a little tricky. It turns out the Jit has a quirk where variables that can be None at the time of graph instantiation are treated as constants, so testing min and max against None lets the Jit instantiate only one path branch. It might be more natural to select different backward functions for these cases, but that is not yet supported.
- Moving clamp to symbolic script introduced an extra tuple construction and immediate unpacking which prevented fusion. This was dealt with by adding an additional tuple removal pass. This issue could appear whenever a symbolic script's return value was defined in an if statement, which made the Jit see the unpacked tuple as being constructed from an if, not a TupleConstruct. The graph is later optimized but tuple lowering was not performed again after these optimizations.
Moving clamp to symbolic script also adds some explicit conversions to float in the graphs in which it appears, but these seem harmless.
If clamp were simply moved to symbolic script then its backward graphs would look like this:
```
graph(%0 : Float(*, *),
%1 : AutogradZeroTensor,
%2 : Float(*, *),
%3 : int[]?,
%4 : Scalar?,
%5 : int):
%6 : None = prim::Constant() # <string>:5:31
%7 : float = aten::Float(%5) # <string>:12:37
%8 : Float(*, *) = prim::FusionGroup_0(%0, %2, %7)
%9 : (Float(*, *), None, None) = prim::TupleConstruct(%8, %6, %6)
%10 : Float(*, *), %11 : None, %12 : None = prim::TupleUnpack(%9)
return (%10)
with prim::FusionGroup_0 = graph(%0 : Float(*, *),
%1 : Float(*, *),
%2 : float):
%3 : Bool(*, *) = aten::le(%1, %2) # <string>:12:29
%mask.5 : Float(*, *) = aten::type_as(%3, %1) # <string>:12:29
%5 : Float(*, *) = aten::mul(%0, %mask.5) # <string>:13:28
return (%5)
```
And adding the additional pass to remove tuples eliminates the prim::TupleConstruct and prim::TupleUnpack. Keeping these included previously would cause test_fuser_iou to fail because multiple fusion groups would be created. Since https://github.com/pytorch/pytorch/issues/23372 this test is disabled, however. When enabled the relevant portion of its graph is now:
```
%59 : float = aten::Float(%26) # <string>:314:38
%60 : float = aten::Float(%27) # <string>:314:61
%61 : int[] = aten::size(%14) # <string>:41:99
%62 : int[] = aten::size(%11) # <string>:42:100
%63 : int[] = aten::size(%15) # <string>:41:99
%64 : int[] = aten::size(%12) # <string>:42:100
%65 : Tensor, %66 : Tensor, %67 : Tensor, %68 : Tensor, %69 : Tensor, %70 : Tensor, %71 : Tensor, %72 : Tensor, %73 : Double(*, *) = prim::FusionGroup_0(%w.1, %13, %16, %23, %h.1, %54, %inter.1, %0, %12, %15, %18, %17, %29, %11, %14, %60, %59)
%74 : Tensor = aten::_grad_sum_to_size(%73, %53)
%75 : Tensor = aten::_grad_sum_to_size(%73, %52)
%grad_self.10 : Tensor = aten::_grad_sum_to_size(%65, %61) # <string>:41:30
%grad_other.10 : Tensor = aten::_grad_sum_to_size(%66, %62) # <string>:42:31
%78 : Tensor = prim::FusionGroup_1(%grad_self.10, %74, %36)
%79 : Tensor = prim::FusionGroup_2(%grad_other.10, %75, %44)
%grad_self.14 : Tensor = aten::_grad_sum_to_size(%67, %21) # <string>:33:30
%grad_other.14 : Tensor = aten::_grad_sum_to_size(%68, %22) # <string>:34:31
%grad_self.12 : Tensor = aten::_grad_sum_to_size(%69, %63) # <string>:41:30
%grad_other.12 : Tensor = aten::_grad_sum_to_size(%70, %64) # <string>:42:31
%grad_self.16 : Tensor = aten::_grad_sum_to_size(%71, %19) # <string>:33:30
%grad_other.16 : Tensor = aten::_grad_sum_to_size(%72, %20) # <string>:34:31
%86 : Tensor, %87 : Tensor = prim::FusionGroup_3(%grad_self.12, %grad_self.16, %74, %39)
%88 : Tensor, %89 : Tensor = prim::FusionGroup_4(%grad_other.12, %grad_other.16, %75, %47)
return (%79, %88, %89, %78, %86, %87, %grad_self.14, %grad_other.14)
```
Which I think is expected/desired.
Finally, this implementation of clamp backwards is "maximally gradient preserving," which simply means that elements on the boundary now receive gradients. For example, if an element of a tensor is 5 and the clamp is to [2, 5], then that element will now receive a gradient. The prior implementation would zero these gradients. See https://github.com/pytorch/pytorch/issues/7002 for a discussion on preserving gradients.
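To make "maximally gradient preserving" concrete, here is a standalone autograd sketch (not the symbolic script itself) where the backward mask uses inclusive comparisons so boundary elements keep their gradient:
```python
import torch

class ClampPreserving(torch.autograd.Function):
    """Sketch: clamp whose backward passes gradient through boundary elements."""

    @staticmethod
    def forward(ctx, x, min_val, max_val):
        ctx.save_for_backward(x)
        ctx.min_val, ctx.max_val = min_val, max_val
        return x.clamp(min_val, max_val)

    @staticmethod
    def backward(ctx, grad_out):
        x, = ctx.saved_tensors
        # Inclusive comparisons: an element equal to min or max still gets a gradient.
        mask = (x >= ctx.min_val) & (x <= ctx.max_val)
        return grad_out * mask.to(grad_out.dtype), None, None

x = torch.tensor([1.0, 5.0, 7.0], requires_grad=True)
ClampPreserving.apply(x, 2.0, 5.0).sum().backward()
print(x.grad)  # tensor([0., 1., 0.]) -- the element at the boundary (5.0) keeps its gradient
```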
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23927
Test Plan: Existing tests provided sufficient coverage.
Differential Revision: D16739740
Pulled By: mruberry
fbshipit-source-id: c94291d20e1f3f25197afc7b74dc61aeb204b074
Summary:
This tries to reduce the overhead of index_select on the CPU path in DLRM (https://github.com/facebookresearch/dlrm). Making src contiguous lets it go into the parallelized path in the Tensor indexSelect function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23055
Differential Revision: D16603913
Pulled By: ezyang
fbshipit-source-id: baaa02f184a8e70f1193e5d96ada195a46d140b9
Summary:
This patch fixes the following error:
```
In file included from /path/to/lib/python3.6/site-packages/numpy/core/include/numpy/arrayobject.h:4:0,
from ../torch/csrc/utils/numpy_stub.h:19,
from ../torch/csrc/utils/tensor_numpy.cpp:2:
../torch/csrc/utils/tensor_numpy.cpp: In function 'bool torch::utils::is_numpy_scalar(PyObject*)':
../torch/csrc/utils/tensor_numpy.cpp:223:11: error: 'PyInt_Check' was not declared in this scope
return (PyArray_IsIntegerScalar(obj) ||
^
../torch/csrc/utils/tensor_numpy.cpp:225:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24042
Differential Revision: D16732545
Pulled By: ezyang
fbshipit-source-id: 8d73d228b88b4a95daedcd7a4ef81c268830792e
Summary:
1. Prefixed underscores to any `DataLoaderIter` attribute that is not part of the data loader ctor argument list.
2. Prefixed `DataLoader.dataset_kind` with underscore because it only makes sense with the private enum `_DatasetKind`, and is an implementation detail.
3. Disallow setting `DataLoader.dataset` and `DataLoader.batch_sampler` after initializing a `DataLoader` because they affect other attributes in `__init__`.
These changes should not have a major BC-breaking effect since the big changes are on the iterator class and most users don't even store it. I searched GitHub for `pin_memory_thread` and (while I didn't look through all result pages) the results I see are forks of PyTorch and blog posts on how the data loader works.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23744
Differential Revision: D16732507
Pulled By: ezyang
fbshipit-source-id: 9f04d000b4200b8047f31eaa3473780b66cebd26
Summary:
Changelog:
- When number of batches = 1, dispatch to trsm instead of trsm_batched in MAGMA
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23953
Test Plan: - All triangular_solve tests should pass to ensure that the change is valid
Differential Revision: D16732590
Pulled By: ezyang
fbshipit-source-id: 7bbdcf6daff8a1af905df890a458ddfedc01ceaf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23803
Custom `forward()` can return a `Variable` in the case of a single output instead of returning a `variable_list` of size 1.
Test Plan: Modified tests involving single output forward functions.
Reviewed By: ezyang
Differential Revision: D16673857
Pulled By: ezyang
fbshipit-source-id: c96d9473b48ad99e6736a68d334b333a917498b7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24019
Permutes are done inside the module. We don't need them outside.
Setting of scale/zero_point has changed.
Reviewed By: jianyuh
Differential Revision: D16712437
fbshipit-source-id: e3cedf9d63347fbf8070d1a65a196e6d4b2833fc
Summary:
The scale and zero_point names should match what's used in other methods of the class.
Closes https://github.com/pytorch/pytorch/issues/23881
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23991
Test Plan: buck build mode/opt caffe2/benchmarks/operator_benchmark/pt:qconv_test --show-output
Reviewed By: jianyuh
Differential Revision: D16703956
Pulled By: dskhudia
fbshipit-source-id: 5e894bd84caaa20dc7639d4885d59a72f27d8ec2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23746
`torch.empty_like(tensor)` and `tensor.clone()` both propagate names to
the output tensor.
As a part of this change, I fixed the empty(..., names=) overload to
include the `memory_format` argument in the normal `empty` declaration
in native_functions.yaml.
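A short usage sketch under the experimental named tensor API described above (assumes a build with BUILD_NAMEDTENSOR enabled at the time):
```python
import torch

x = torch.empty(2, 3, names=('N', 'C'))
print(torch.empty_like(x).names)  # ('N', 'C') -- names propagate to the new tensor
print(x.clone().names)            # ('N', 'C')
```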
Test Plan: - [namedtensor ci]
Differential Revision: D16647821
Pulled By: zou3519
fbshipit-source-id: 43b261f3456b6bf5fca7b6313e659b259a2ba66d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23743
In the short term, we implement this by having overloads for each of
these functions. In the long term, the plan is to move DimnameList to
TensorOptions so that we do not have to duplicate work.
Test Plan: - [namedtensor ci]
Differential Revision: D16647820
Pulled By: zou3519
fbshipit-source-id: c6c53c5f26a86b730cbc4d4eb69907ac0e08fc65
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23801
Test Plan
- Code reading. assertTensorDataAndNamesEqual isn't used in this commit
but it'll be used in future commits.
- [namedtensor ci]
gh-metadata: pytorch pytorch 23801 gh/zou3519/90/head
Test Plan: Imported from OSS
Differential Revision: D16667816
Pulled By: zou3519
fbshipit-source-id: 66519cd5d17bda4c4304a1bc6e2a03ae59d49e39
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23624
tensor.set_names(names) is the out-of-place variant of
tensor.set_names_(names). This naming is probably confusing so I am
taking any and all suggestions.
Test Plan:
- run tests [namedtensor ci]
gh-metadata: pytorch pytorch 23624 gh/zou3519/86/head
Differential Revision: D16621830
Pulled By: zou3519
fbshipit-source-id: f8a3837d3a370b41210e938369348dcbb4aee53a
Summary:
These implicit fallthroughs lead to the following warning on g++ 7, because g++ could not recognize the implicit `abort` call in `LOG(FATAL)`. We suppress it by adding explicit `return`s.
```
/home/hong/wsrc/pytorch/caffe2/utils/math_cpu.cc: In function void caffe2::math::GemmEx(CBLAS_TRANSPOSE, CBLAS_TRANSPOSE, int, int, int, T, const T*, int, const T*, int, T, T*, int, Context*) [with T = float; Context = caffe2::CPUContext; Engine = caffe2::DefaultEngine]:
/home/hong/wsrc/pytorch/c10/util/logging_is_not_google_glog.h:98:10: warning: this statement may fall through [-Wimplicit-fallthrough=]
   ::c10::MessageLogger((char*)__FILE__, __LINE__, n).stream()
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/hong/wsrc/pytorch/caffe2/utils/math_cpu.cc:179:11: note: in expansion of macro LOG
   LOG(FATAL) << "Unexpected CBLAS_TRANSPOSE for trans_B";
   ^
/home/hong/wsrc/pytorch/caffe2/utils/math_cpu.cc:182:5: note: here
   case CblasTrans: {
   ^~~~
In file included from /home/hong/wsrc/pytorch/c10/util/Logging.h:28:0,
                 from /home/hong/wsrc/pytorch/caffe2/core/logging.h:2,
                 from /home/hong/wsrc/pytorch/caffe2/core/types.h:9,
                 from /home/hong/wsrc/pytorch/caffe2/utils/math.h:17,
                 from /home/hong/wsrc/pytorch/caffe2/utils/math_cpu.cc:14:
/home/hong/wsrc/pytorch/c10/util/logging_is_not_google_glog.h:98:10: warning: this statement may fall through [-Wimplicit-fallthrough=]
   ::c10::MessageLogger((char*)__FILE__, __LINE__, n).stream()
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/hong/wsrc/pytorch/caffe2/utils/math_cpu.cc:202:11: note: in expansion of macro LOG
   LOG(FATAL) << "Unexpected CBLAS_TRANSPOSE for trans_B";
   ^
/home/hong/wsrc/pytorch/caffe2/utils/math_cpu.cc:205:5: note: here
   default:
   ^~~~~~~
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24053
Differential Revision: D16732530
Pulled By: ezyang
fbshipit-source-id: 90373879f25b52efca5bf151c7ed58d6ad19d925
Summary:
Not sure whether 34c0043aaee971a0539c8c3c49c4839f67ae001d still makes sense.
`USE_SYSTEM_EIGEN_INSTALL` is OFF by default (as set in CMakeLists.txt). If a user wants to change this build option, I don't see any reason to force them to do it in `CMakeCache.txt`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23990
Differential Revision: D16732569
Pulled By: ezyang
fbshipit-source-id: 4604b4a1d5857552ad02e76aee91641aea48801a
Summary:
CPU and CUDA testing code are largely the same.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23526
Reviewed By: ezyang
Differential Revision: D16586271
Pulled By: VitalyFedyunin
fbshipit-source-id: 91c70c05789120fde4718ce955de243087a8c993
Summary:
Without metadata (datatype) for the new output, the exporter won't be able to perform implicit scalar datatype casting. This PR covers a large portion of this common issue seen in many exported models, e.g., https://github.com/pytorch/pytorch/issues/23724
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23809
Reviewed By: ezyang
Differential Revision: D16707640
Pulled By: bddppq
fbshipit-source-id: 3de985c6b580b9c9ebaec08085c7443bd8d9c7f8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23645
Previously, every module would get its own CompilationUnit when saving
from the C++ frontend. That's bad because nothing checks that they have
uniquely qualified names or mangles them to make them unique.
This was okay when we were doing model.json, but once we serialize modules like classes this will cause an error on import (when we try to re-define the same class a bunch of times).
Test Plan: Imported from OSS
Differential Revision: D16597709
Pulled By: suo
fbshipit-source-id: 0412efd5acfcac26d03f6ed5b5a7dfc023163bc3
Summary:
As title: the op can be used to update Length blob values in CUDA.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23995
Reviewed By: xianjiec
Differential Revision: D16684065
fbshipit-source-id: da562334c8b61a5e54c3aa78156ce5caff619e60
Summary:
Currently when reading CMakeCache.txt, only `VAR:TYPE=VAL` can be matched. This works well for CMake-generated lines, but a user may add a line without specifying the type (`VAR=VAL`), which is totally legitimate in the eyes of CMake. These improvements in the regex ensure that `VAR=VAL` is also matched. The situation of `"VAR":TYPE=VAL` is also corrected.
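For illustration only (this is not the actual parser in the repo), a regex along these lines accepts all three forms:
```python
import re

# Matches VAR:TYPE=VAL, VAR=VAL, and "VAR":TYPE=VAL lines from CMakeCache.txt.
CACHE_LINE = re.compile(r'^"?(?P<var>\w+)"?(?::(?P<type>\w+))?=(?P<val>.*)$')

for line in ['USE_CUDA:BOOL=ON', 'USE_NINJA=1', '"MY_FLAG":STRING=hello']:
    m = CACHE_LINE.match(line)
    print(m.group('var'), m.group('type'), m.group('val'))
# USE_CUDA BOOL ON
# USE_NINJA None 1
# MY_FLAG STRING hello
```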
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23745
Differential Revision: D16726514
Pulled By: ezyang
fbshipit-source-id: 6c50150d58926563837cf77d156c24d644666ef0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24052
This will make things show up in the azure pipelines output
Test Plan: Imported from OSS
Differential Revision: D16723846
Pulled By: suo
fbshipit-source-id: d78cbf476be74ccfb28d6e1b21d66b6641d36e26
Summary:
Move `_overload` to `_jit_internal.py` so that it can be imported in nn/functional.py for `conv`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24049
Differential Revision: D16723339
Pulled By: eellison
fbshipit-source-id: 527e6069dbfa81f8133c405be5350a8c76873a12
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24014
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D16714081
Pulled By: ezyang
fbshipit-source-id: d346fbe8a54d5c182f81d2b908b1cdf191e3d822
Summary:
Enable add, sub, mul, and div on CPU for the bfloat16 type.
Tested via unit tests.
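A quick check of the newly enabled CPU path (values are illustrative):
```python
import torch

a = torch.tensor([1.5, 2.0, -3.0], dtype=torch.bfloat16)
b = torch.tensor([0.5, 4.0,  2.0], dtype=torch.bfloat16)

print(a + b)  # add
print(a - b)  # sub
print(a * b)  # mul
print(a / b)  # div
```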
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22851
Differential Revision: D16256757
Pulled By: izdeby
fbshipit-source-id: 8b62f7581fc0ca0d2cff48ab40d877a9fcf70a5b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23891
This adds an initial set of testing coverage for quantization that checks if the modules can be scripted. Testing for tracing and serialization is forthcoming
Test Plan: Imported from OSS
Differential Revision: D16698045
Pulled By: jamesr66a
fbshipit-source-id: 96d80d938b816220af72359165a7b96d998a30c9
Summary:
I accidentally removed this in a merge, breaking a test. Fix for master
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24003
Differential Revision: D16707108
Pulled By: eellison
fbshipit-source-id: 8b59f46e7932b88a7ae246a261c4daf17f23995f
Summary:
https://github.com/pytorch/pytorch/pull/23228 caused a build failure on macOS, because rpc.h is included as long as USE_DISTRIBUTED=1, but rpc/init.cpp (and others) is only included when NOT APPLE. So, it cannot find python_functions defined in init.cpp on macOS. This PR attempts to fix it by wrapping rpc.h with USE_C10D, which is only set when NOT APPLE.
I tried this fix locally and it works.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23998
Differential Revision: D16706087
Pulled By: mrshenli
fbshipit-source-id: d04fe6717a181a3198289cdef51439708c2e291d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23886
This is a series of PRs that will allow us to support adding [padding to conv](https://github.com/pytorch/pytorch/pull/22484) and also reduce the friction of adding method overloads that was brought up in https://github.com/pytorch/pytorch/pull/23266.
Support for overloaded functions following the specification in [PEP 484](https://www.python.org/dev/peps/pep-0484/#function-method-overloading).
The usage is:
```
@torch.jit.overload
def add(x: int, y: int) -> int: ...

@torch.jit.overload
def add(x: float, y: float) -> float: ...

def add(x, y):
    return x + y
```
Follow up PRs:
- Add same API for methods
- A couple of cleanups for functions:
- don't require default params specified on the overload as well
- potentially error if an invocation could be matched to multiple overloads; for now it just chooses the first one, which is what mypy currently does
Test Plan: Imported from OSS
Differential Revision: D16694863
Pulled By: eellison
fbshipit-source-id: f94f2933bc1c97fa58f31846acfe962b0630068c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23885
This is a series of PRs that will allow us to support adding [padding to conv](https://github.com/pytorch/pytorch/pull/22484) and also reduce the friction of adding method overloads that was brought up in https://github.com/pytorch/pytorch/pull/23266.
This PR only compiles one if branch if the condition is an isinstance check. This is consistent with what mypy does; it does not report errors if a branch can be determined statically to be unreachable.
```
def foo(x):
    # type: (int) -> int
    if isinstance(x, str):
        return x["1"]
    return x + 1

reveal_type(foo)  # no error, shows int -> int
```
Test Plan: Imported from OSS
Differential Revision: D16697092
Pulled By: eellison
fbshipit-source-id: d3eb4925cd16d551515ac6ff620a69897dbec130
Summary:
`python_requires` helps the installer choose the correct version of this package for the user's running Python.
This is especially necessary when dropping Python 2 (https://github.com/pytorch/pytorch/issues/23795) but is useful now too.
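A minimal setup.py sketch showing the field (the package metadata here is made up):
```python
from setuptools import setup

setup(
    name='example-package',        # hypothetical package name
    version='0.1.0',
    packages=['example_package'],
    # pip on an unsupported interpreter will pick an older compatible release
    # (or refuse to install) instead of installing a version that cannot run.
    python_requires='>=3.5',
)
```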
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23863
Differential Revision: D16692908
Pulled By: soumith
fbshipit-source-id: 3c9ba2eb1d1cf12763d6284daa4f18f605abb373
Summary:
Before calling `__setstate__` when loading a module, we need to disable
the optimizer since the module's type does not match the values on the
stack (all the tensors will be `UndefinedTensor`)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23698
Pulled By: driazati
Differential Revision: D16690935
fbshipit-source-id: 71e2238fd25cd16271af478ef21a3cf4e514a462
Summary:
Simplifying https://github.com/pytorch/pytorch/issues/23793: The dependency relationship between
{INSTALL,BUILD}_TEST is already properly handled in CMakeLists.txt. All
we need to do is to pass down INSTALL_TEST.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23806
Differential Revision: D16691833
Pulled By: soumith
fbshipit-source-id: 7607492b2d82db3f79b174373a92e2810a854a61
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23898
These files were not following the clang-format style and as a result, some files (such as TensorFactories.cpp) were extremely hard to read and modify.
Test Plan: Imported from OSS
Differential Revision: D16684724
Pulled By: jamesr66a
fbshipit-source-id: 0600c6dddc778481af5bef798e77072fb7e988aa
Summary:
Otherwise you may see errors like
```
Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x000001F99F5CB9D8>
Traceback (most recent call last):
File "C:\Users\Divyansh J\Anaconda3\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 883, in __del__
self._shutdown_workers()
File "C:\Users\Divyansh J\Anaconda3\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 860, in _shutdown_workers
if self.workers_status[worker_id]:
IndexError: list index out of range
```
e.g. https://discuss.pytorch.org/t/how-to-construct-dataset-with-iterator-for-multi-process-dataloader/49612/5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23761
Differential Revision: D16644687
Pulled By: soumith
fbshipit-source-id: a60e847431264525079456ff422317af1ac2be4b
Summary:
When we're emitting an if node, if one branch exits, allow variables in the other branch to escape scope. This uses the same machinery that already exists for early returns, so there are minimal changes to the compiler. Most of the changes are in the exit_transform pass so we don't create terrible graphs when exceptions exist. In a follow-up PR I will add a writeup of the transform pass to the docs, since this should be the last change made to it for a while.
This will allow assertions to refine Optional types, as well as allow JIT to understand things like:
```
def foo(x):
    if x == 1:
        raise Exception()
    else:
        a = 1
    return a
```
If you look in nn/functional.py, about 3/4 of the TODOs are this issue. One note is that if a function always throws, I accepted whatever the annotation for the return type is if it exists and otherwise set it to None. This is consistent with what mypy does.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23565
Differential Revision: D16679572
Pulled By: eellison
fbshipit-source-id: e58c9e9ddaeb13144c803d90e2beae253c851f7f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22956
As Title says: remove the extra function arguments for better engineering.
Differential Revision: D16297724
fbshipit-source-id: a31be17708d13508c4ce9a3ce7eb5238e8d17984
Summary:
Many descriptions of arguments could be replaced by items in the template such as `factory_common_args`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23439
Differential Revision: D16688527
Pulled By: ezyang
fbshipit-source-id: 406ce45d72e297f46b5fa9ea5472b3284c8d4324
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23895
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D16688489
Pulled By: ezyang
fbshipit-source-id: a56d0180a0bc57775badd9e31ea3d441d5fd4f88
Summary:
Use the supported way to differentiate and automatically switch between hip-clang and hcc hipification in build_amd.py.
Cleaned up from PR https://github.com/pytorch/pytorch/issues/23699
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23835
Differential Revision: D16659661
Pulled By: vincentqb
fbshipit-source-id: 05a4250ceb28beda7a7bf73a46c5dc46f6e852bc
Summary:
This is a similar issue to TestCuda.test_events_wait.
PyTorch test sets a policy() method to assertLeaksNoCudaTensors.
Whenever a test is run, assertLeaksNoCudaTensors is called,
which in turn calls CudaMemoryLeakCheck, which in turn calls
initialize_cuda_context_rng, where it executes torch.randn
on each device, where a kernel is launched on each device.
Since the kernel may not finish on device 0, the first assertion
self.assertTrue(s0.query()) fails.
The fix is to insert `torch.cuda.synchronize(d0)` and `torch.cuda.synchronize(d1)` at the beginning of the test so that previously launched kernels finish before the real test begins.
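Concretely, the test prologue becomes something like this sketch (a two-GPU setup and the current streams are assumed):
```python
import torch

d0 = torch.device('cuda:0')
d1 = torch.device('cuda:1')

# Make sure kernels launched by the leak checker's RNG initialization have
# finished on both devices before any stream/event queries below.
torch.cuda.synchronize(d0)
torch.cuda.synchronize(d1)

s0 = torch.cuda.current_stream(d0)
assert s0.query()  # now passes reliably; no unfinished work is pending on d0
```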
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23912
Differential Revision: D16688599
Pulled By: ezyang
fbshipit-source-id: 3de2b555e99f5bbd05727835b9d7c93a026a0519
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/23480.
I only verified that the schedule reaches the restart at the expected step as specified in the issue, it would be good to have someone else verify correctness here.
Script:
```
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(torch.optim.SGD([torch.randn(1, requires_grad=True)], lr=0.5), T_0=1, T_mult=2)
for i in range(9):
    print(i)
    print(scheduler.get_lr())
    scheduler.step()
```
Output:
```
0
[0.5]
1
[0.5]
2
[0.25]
3
[0.5]
4
[0.42677669529663687]
5
[0.25]
6
[0.07322330470336313]
7
[0.5]
8
[0.4809698831278217]
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23833
Differential Revision: D16657251
Pulled By: gchanan
fbshipit-source-id: 713973cb7cbfc85dc333641cbe9feaf917718eb9
Summary:
Features:
* sync and async RPC for builtin operators
* RpcAgent API
* ProcessGroupAgent implementation
Goal:
* have a minimum working and testable RPC implementation
* make sure the RpcAgent API is sufficient for future ThriftAgent and TensorPipeAgent implementation
* For the TensorPipe implementation, it might allocate multiple underlying communication channels with different types, and might also use streaming serialization/deserialization for large tensors. To support this requirement, the current implementation only converts a BuiltinOp into a Message which contains a byte vector and a tensor table. It is up to the RpcAgent implementation to determine how it would like to serialize a Message object.
* For ThriftAgent, as Thrift has it own request/response matching solution, the Message.id is no longer necessary. Hence the id can be dropped during serialization. All it needs to do is to pass the response Message object to the Future returned by send(...).
* support blocking and non-blocking RequestCallback
* blocking means the callback won't return before sending out the response
* non-blocking can be achieved by enqueue the `(from, request, RpcAgent&)` tuple and use a different thread to process them. That is why there is an `RpcAgent&` arg in the param list.
We are not exporting this diff until we finalize distributed autograd design and publish the API review publicly.
https://fb.quip.com/FabTAZKVgQpf
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23228
ghstack-source-id: 87816717
Reviewed By: zhaojuanmao
Differential Revision: D15194693
fbshipit-source-id: 7adb600796613cde6073db6c227451b89940ecaf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23893
Set `caffe2_tvm_min_ops` to 8 for production and tests.
Reviewed By: yinghai
Differential Revision: D16659420
fbshipit-source-id: ef33b37e2a5128e502a6b8df306914a409f13c2d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23879
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D16670379
Pulled By: ezyang
fbshipit-source-id: c498f8362760bdf8526c59043db3276f99e3ccc1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23858
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23718
Changes:
- Enable tests for quantization test files in `run_tests.py`
- Remove `__future__` imports from `torch/nn/qat/modules/__init__.py`, since `unicode_literals` messes up imports on python2 because the elements in `__all__` will be Unicode and not string
- Skip PostTrainingQuantTests if the build doesn't have FBGEMM (only a small subset of targets in tests) or if testing under UBSAN (the suppression file doesn't seem to work)
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D16639467
Pulled By: jamesr66a
fbshipit-source-id: 532766797c216976dd7e07d751f768ff8e0fc207
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23837
This is a temporary workaround to an issue in MKL-DNN's Convolution backwards implementation: https://github.com/pytorch/pytorch/issues/23825
It is only used to enable testing quantization
Test Plan: Imported from OSS
Differential Revision: D16659081
Pulled By: jamesr66a
fbshipit-source-id: de18ebe98dec2a042f28b23373e20da2b44a42a2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23848
Problem:
In an experiment running feed model 127607201 (/mnt/public/tracelog/feed_repro2/127607201_0.predictor), we encountered a blob dimensionality mismatch error when running the onnxified net. This is because the model initializes input blobs in the current workspace with blob size 0, and onnxifi() falsely identified those input blobs as weight blobs and assigned the wrong dimensions.
Solution:
Add option to pass correct weight blob names to onnxifi() instead of using all blobs in current workspace.
Reviewed By: yinghai
Differential Revision: D16661396
fbshipit-source-id: cabe44db6b64e6538bef4b65e380312214b3ba9f
Summary:
Formatting in advance of a PR that touches this file, because there is a lot of formatting noise :'(
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23840
Differential Revision: D16659311
Pulled By: eellison
fbshipit-source-id: 7dedaccf9b9c455f97efdcce1c58515eb155d261
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23784
The backward path does nothing during the gradient pass when the input is empty; as a result, the workspace can preserve gradient values from a previous iteration and get inconsistent inputs for some of the backward-pass operators. This diff should fix this discrepancy by always reinitializing the output during the backward path.
Reviewed By: dzhulgakov
Differential Revision: D16646096
fbshipit-source-id: 8ca68dfad17a63fc87c033cce7b36b40bd77245c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23788
We be using Azure Pipelines now, matey!
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D16648527
Pulled By: ezyang
fbshipit-source-id: d05326c4971fd392868f2a70aa0a9be9c7280f86
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23708
Resolves https://github.com/pytorch/pytorch/issues/23631
We always treat dtypes as number types, and we have the conversion logic of dtype->int64_t present in toSugaredValue. So if a dtype appears in a statement being compiled, it's properly converted to its long ScalarType equivalent. However, this logic was missing in `toIValue`, which made taking dtypes as attributes broken.
Test Plan: Imported from OSS
Differential Revision: D16617222
Pulled By: jamesr66a
fbshipit-source-id: 4b10e5795f9c142c2fd9fa1b5d60f6374f5218e0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23752
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D16657471
Pulled By: ezyang
fbshipit-source-id: 4d8fcde1d10d4b078c76c643adb6d4a4fc1259c6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23779
Mangling is two underscores, not one :(. We want this method to be
private so that inheritors who define a `__construct` do not interfere
with Module initialization
Test Plan: Imported from OSS
Differential Revision: D16645156
Pulled By: suo
fbshipit-source-id: b9060cb35bfaa0391ff200b63fb78b1ac15fee39
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23791
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D16657447
Pulled By: ezyang
fbshipit-source-id: a4a5f5abef72146a52a76cfab629f8c105949bb3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23792
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D16648539
Pulled By: ezyang
fbshipit-source-id: f713fca6d428c03ed31aad18464c92265fb81420
Summary:
Use the supported way to differentiate and automatically switch between hip-clang and hcc hipification in build_amd.py.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23699
Differential Revision: D16627326
Pulled By: vincentqb
fbshipit-source-id: 977003174395fb69cf0c96c89232bd6214780cd8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23810
A previous diff removed the special casing for aten:: and prim:: ops in alias analysis and implements alias analysis purely
based on the AliasAnalysisKind. To be sure it doesn't break our existing code base, it added asserts that make sure that
our existing aten:: and prim:: ops set the correct AliasAnalysisKind.
However, we don't need that restriction for future ops. Since we are now certain all existing cases are set up correctly,
we can remove these assertions.
ghstack-source-id: 87733626
Differential Revision: D15996322
fbshipit-source-id: df27ed95397bbe58a76b6b2c2e9808fcfde35294
Summary:
Define a 4D tensor as stored in channels last memory format when the dimension order is NCHW and C-strides < W-strides < H-strides < N-strides (if the size of any dimension is equal to 1, that dimension's stride is not taken into account).
A channels last contiguous tensor is a channels last tensor that occupies a contiguous memory block, so x.is_contiguous(memory_format=torch.channels_last) checks whether a tensor is channels last contiguous.
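A short illustration of the definition above (the strides shown are for this particular shape):
```python
import torch

x = torch.randn(2, 3, 4, 5)                                 # NCHW
y = x.contiguous(memory_format=torch.channels_last)

print(x.is_contiguous(memory_format=torch.channels_last))   # False
print(y.is_contiguous(memory_format=torch.channels_last))   # True
print(y.stride())  # (60, 1, 15, 3): C-stride < W-stride < H-stride < N-stride
```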
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23391
Differential Revision: D16601414
Pulled By: VitalyFedyunin
fbshipit-source-id: 8d098e7eec2f00fb1d12261bc240b3645d4f5b73
Summary:
This allows `INSTALL_*` to pass through to cmake.
An additional fix is that if `INSTALL_TEST` is specified, it won't use `BUILD_TEST` as the default value for `INSTALL_TEST`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23793
Differential Revision: D16648668
Pulled By: soumith
fbshipit-source-id: 52c2a0d8033bc556355b87a6731a577940de9859
Summary:
CPU binary builds are now built with the cu100 docker image instead of cu80.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23772
Differential Revision: D16644224
Pulled By: soumith
fbshipit-source-id: 5af09aba149c13fadbd4146172e7da038f2f4261
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23680
Now when initializing a ScriptModule during the torch.jit.load()
process, there is already a cpp module backing the thing. That means
that setting training will overwrite whatever the initialized
ScriptModule had.
This PR splits apart the common "set up internal state" part of the
Module __init__ and calls that from ScriptModule.__init__ and
Module.__init__, leaving the "nn.Module-specific" part (setting
`self.training`) for the nn.Module __init__
Test Plan: Imported from OSS
Differential Revision: D16606959
Pulled By: suo
fbshipit-source-id: f7ea6b36551ff4e4472b7685f65731d5cfab87fd
Summary:
We can now have any valid zero points for weight and activation for conv2d kernel.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23541
Test Plan:
buck test mode/dev caffe2/test:quantized -- 'test_qconv\ \(test_quantized.TestQuantizedConv\)' --print-passing-details
```
Running 1 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/3377699723897843
✓ caffe2/test:quantized - test_qconv (test_quantized.TestQuantizedConv) 68.528 1/1 (passed)
Test output:
> test_qconv (test_quantized.TestQuantizedConv) ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 68.529s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/3377699723897843
Summary (total time 74.97s):
PASS: 1
FAIL: 0
SKIP: 0
FATAL: 0
TIMEOUT: 0
OMIT: 0
```
Differential Revision: D16556515
Pulled By: dskhudia
fbshipit-source-id: 6e2ee9ddc58f9dc8a3f8b25918bb7955f0655073
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23590
This diff adds CPU% and Virtual Memory computation by default to AIBench when doing mobile remote run
Reviewed By: llyfacebook
Differential Revision: D16469619
fbshipit-source-id: 670f3549c830a36bc456a57f2ea668f9f82dd15a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23566
Currently, if we use dynamic quantization we don't have access to the internally quantized inputs and output for debugging.
To make the debugging easier, this diff adds a debug feature to expose the quantized X, W and Y for debugging if debug outputs are attached to the operator and caffe2_dnnlowp_force_slow_path flag is set.
The quantized inputs and output are exposed as the extra outputs.
The example Int8FC op with debug outputs appended looks like:
```
op {
input: "X"
input: "W"
input: "b"
output: "Y"
output: "X_q"
output: "W_q"
output: "Y_q"
name: ""
type: "Int8FC"
arg {
name: "axis"
i: 1
}
...
}
```
Next need to expose the quantization parameters.
Reviewed By: jspark1105
Differential Revision: D16566753
fbshipit-source-id: acd855a172ee7993ddba8808f2af81b628ff9c02
Summary:
On my testcase, this reduces the uncompressed size of TorchScript
debug info from 281KB to 76KB. With zip compression enabled, this
saves about 2.5KB of final size.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23741
Differential Revision: D16624128
fbshipit-source-id: ce45659d6b20d40608ace05639b69b93696b00d9
Summary:
I have some test code in there as well, along with a script "test_libtorch" to run it. You'll need to modify `test_libtorch` to point to where you have `pytorch` built. I currently require that `pybind11` is included as a subdirectory of the test, but added it to the `.gitignore` to make this reviewable.
Currently, something like this works:
```cpp
struct Foo {
  int x, y;
  Foo() : x(2), y(5) {}
  Foo(int x_, int y_) : x(x_), y(y_) {}
  void display() {
    cout << "x: " << x << ' ' << "y: " << y << endl;
  }
  int64_t add(int64_t z) {
    return (x + y) * z;
  }
};

static auto test = torch::jit::class_<Foo>("Foo")
    .def(torch::jit::init<int64_t, int64_t>())
    .def("display", &Foo::display)
    .def("add", &Foo::add)
    .def("combine", &Foo::combine);
```
with
```py
@torch.jit.script
def f(x):
    val = torch._C.Foo(5, 3)
    val.display()
    print(val.add(3))
```
results in
```
x: 5 y: 3
24
```
Current issues:
- [x] The python class created by torchscript doesn't interact properly with the surrounding code.
```
@torch.jit.script
def f(x):
    val = torch._C.Foo(5, 3)
    return val
```
- [x] Doesn't properly take in non-pointer classes. Can't define this function signature in cpp (We don't want to support this I believe).
```cpp
void combine(Foo x) {
```
- [x] Has some issues with memory for blobs when constructing multiple objects (fix constant propagation pass to not treat capsules as the same object).
```py
@torch.jit.script
def f(x):
    val = torch._C.Foo(5, 3)
    val2 = torch._C.Foo(100, 0)
    val.display()
    print(val.add(3))
```
- [ ] Can't define multiple constructors (need to define overload string. Currently not possible since we don't support overloaded methods).
- [x] `init` is a little bit different syntax than `pybind`. `.init<...>()` instead of `.def(py::init<>())`
- [x] I couldn't figure out how to add some files into the build so they'd be copied to the `include/` directories, so I symlinked them manually.
- [ ] Currently, the conversion from Python into Torchscript doesn't work.
- [ ] Torchbind also currently requires Python/Pybind dependency. Fixing this would probably involve some kind of macro to bind into Python when possible.
- [ ] We pass back into Python by value, currently. There's no way of passing by reference.
- [x] Currently can only register one method with the same type signature. This is because we create a `static auto opRegistry`, and the function is templated on the type signature.
Somewhat blocked on https://github.com/pytorch/pytorch/pull/21177. We currently use some structures that will be refactored by his PR (namely `return_type_to_ivalue` and `ivalue_to_arg_type`).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21098
Differential Revision: D16634872
Pulled By: Chillee
fbshipit-source-id: 1408bb89ea649c27d560df59e2cf9920467fe1de
Summary:
Added a number of opset10 tests from Caffe2 to ORT
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22993
Differential Revision: D16467954
Pulled By: bddppq
fbshipit-source-id: 0b92694c7c0213bdf8e77e6f8e07e6bc8a85170a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23747
This reverts commit 6a3ebdbbc529da79125423839bf18f527a706ab8
"Remove all conda 3.5 nightly configs" but not the smoketest
removal.
Test Plan: Imported from OSS
Differential Revision: D16632992
Pulled By: ezyang
fbshipit-source-id: 5c6dcf1510b84359a1760cfa216edea610563ad5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23577
This diff is fixing a model size issue introduced in #23291. After that PR, the model size after int8 quantization is the same as that of the original unquantized model. The reason is that we save the original weight for int8 quantization even when it's not needed anymore. This diff fixes that by only saving the original weight for the fp16 quantization path.
Reviewed By: llyfacebook
Differential Revision: D16557619
fbshipit-source-id: f924ae8d155a0d525b86a7440b3c7147d5bead0a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23679
Full Canary: https://fburl.com/fblearner/sa1pkpya
Add LambdaRank DCG Loss Option
* when use_idcg_normalization == true, regular LambdaRank with NDCG loss
* when use_idcg_normalization == false, gradient and loss functions are not normalized by idcg.
Differential Revision: D16605459
fbshipit-source-id: a16f071e69516974e48d27bef4ca179019ca4ae7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23623
This is a quick, non-user-facing check for whether PyTorch was built with BUILD_NAMEDTENSOR=1.
Test Plan:
- run tests [namedtensor ci]
gh-metadata: pytorch pytorch 23623 gh/zou3519/85/head
Differential Revision: D16621829
Pulled By: zou3519
fbshipit-source-id: d7e1161dc176bab2c1f953265722daeba1e63102
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23628
More tests for autograd::Function, based on Python tests from test_autograd.py.
Test Plan: Imported from OSS
Differential Revision: D16600992
fbshipit-source-id: 0cb8bfbcff315111dc4936e837ff859d0a1e251d
Summary:
Feature includes
- Log message if bind(2) fail
- Make collective work with single process context
- Use hipStreamCreateWithFlags instead of hipStreamCreateWithPriority
- Add RCCl support
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23400
Differential Revision: D16623110
Pulled By: bddppq
fbshipit-source-id: e75cd8d2e2cad551ad0b0a08667320d7036b78bd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23734
In the latest run on AI-PEP, there are 6 tests out of 342 which have more than 7% variation. Around 20 tests have variations between 4% and 7%. The rest are within 4%. This diff tries to further reduce the variation to 4% for all tests.
Each test has to run predefined_minimum_secs seconds before exiting. Increasing that value makes all tests run longer. Based on the experimental results, we will see what's the right value to use.
Reviewed By: hl475
Differential Revision: D16622361
fbshipit-source-id: d4c034f64b1d64e1cffd67ffbced7d8cd4449d69
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23465
We decided not to allow users to use qconfig_dict to do quantization,
since that API is not robust.
Differential Revision: D16611504
fbshipit-source-id: b0d1d311b32c990a165c480f50e9ce3d68b785b5
Summary:
MultiProcessTestCase will be useful for both c10d and rpc tests. So, this diff extracts that class and some common decorators to a separate file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23660
Reviewed By: pietern
Differential Revision: D16602865
Pulled By: mrshenli
fbshipit-source-id: 85ad47dfb8ba187b7debeb3edeea5df08ef690c7
Summary:
Adds new people and reorders sections to make more sense
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23693
Differential Revision: D16618230
Pulled By: dzhulgakov
fbshipit-source-id: 74191b50c6603309a9e6d14960b7c666eec6abdd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23692
Before, the tests took ~40s to finish; with this change it's ~2s.
Test Plan: Imported from OSS
Differential Revision: D16611479
Pulled By: ZolotukhinM
fbshipit-source-id: 391235483029d2ab860fcc4597ce84f4964025f1
Summary:
Move CPU implementation of the `addcmul` operator to Aten ( https://github.com/pytorch/pytorch/issues/22797 )
### before
```python
In [11]: timeit x.addcmul(a, b)
1.31 ms ± 18.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
### after
```python
In [9]: timeit x.addcmul(a, b)
588 µs ± 22.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
Adding custom code for the case when `value == 1` doesn't provide a significant performance gain.
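For context, `torch.addcmul` is an element-wise fused multiply-add; a minimal sketch of the benchmarked call (the shapes here are illustrative, not the ones timed above):
```python
import torch

x = torch.rand(1000, 1000)
a = torch.rand(1000, 1000)
b = torch.rand(1000, 1000)

# out = x + value * a * b, element-wise; value defaults to 1
out = torch.addcmul(x, a, b)
assert torch.allclose(out, x + a * b)
```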
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22874
Differential Revision: D16359348
Pulled By: VitalyFedyunin
fbshipit-source-id: 941ead835672fca78a1fcc762da052e64308b111
Summary:
Add setup metadata to help PyPI flesh out content on the package page.
Apparently this might also help populate the "Used By" feature, according to driazati.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22085
Differential Revision: D16604703
Pulled By: soumith
fbshipit-source-id: ddb4f7ba7c24fdf718260aed28cc7bc9afb46de9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23654
Default configuration at time of writing is CUDA 10 (but
with 10.1 coming soon)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D16601097
Pulled By: ezyang
fbshipit-source-id: c8368355ce1521c01b0ab2a14b1cd0287f554e66
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23611
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D16601098
Pulled By: ezyang
fbshipit-source-id: febb5a822854b91d5b3d942e6bf71b4ae9f1f15c
Summary:
Currently, once a user has set `USE_NATIVE_ARCH` to OFF, they can never turn it back on for MKLDNN simply by changing `USE_NATIVE_ARCH`. This commit fixes that.
Following up 09ba4df031ed51e05724bb490d4d6fc52b3b1ac6
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23608
Differential Revision: D16599600
Pulled By: ezyang
fbshipit-source-id: 88bbec1b1504b5deba63e56f78632937d003a1f6
Summary:
We need this to be able to register them with the c10 dispatcher.
The overload names are based on one-letter-per-argument-type.
Script used to change native_functions.yaml and derivatives.yaml: P75630718
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23532
ghstack-source-id: 87539687
Differential Revision: D16553437
fbshipit-source-id: a1d0f10c42d284eba07e2a40641f71baa4f82ecf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23618
For example: `save_for_backward({Variable(), x, Variable()})` should be allowed, so that this is consistent with the python API behaviour.
Test Plan: Added a test similar to the python test `test_save_none_for_backward` from test_autograd.py.
Differential Revision: D16589402
fbshipit-source-id: 847544ad8fc10772954d8629ad5a62bfdc1a66c1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23630
This is temporary, won't be needed with the new serialization format.
But for now, since the main module gets its name from the archive name,
we need this for safety; otherwise something like
`torch.jit.save("torch.pt")` will break things.
Test Plan: Imported from OSS
Reviewed By: jamesr66a
Differential Revision: D16592404
Pulled By: suo
fbshipit-source-id: b538dc3438a80ea7bca14d84591ecd63f4b1289f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23557
As title states this enables any tensors defined by the user to be outputs, including activations
Reviewed By: yinghai
Differential Revision: D16362993
fbshipit-source-id: b7dc8412c88c46fcf67a3b3953dc4e4c2db8c4aa
Summary:
`is_pinned` was moved to native_functions.yaml, disabling it for named
tensors. This PR re-enables its usage for named tensors.
I wrote a name inference rule for torch.clone(), but something happened
to it. It is disabled for now so we can get the namedtensor CI green.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23597
Test Plan: - run tests [namedtensor ci]
Differential Revision: D16581771
Pulled By: zou3519
fbshipit-source-id: 498018cdc55e269bec80634b8c0a63ba5c72914b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23572
### **(The stack from #23020 was moved into this PR)**
Adding API for custom autograd operations, with user defined forward and backward, [like in python](https://pytorch.org/docs/stable/notes/extending.html#extending-torch-autograd).
The custom operation should be a subclass of Function, with static forward and backward functions. `forward()` can accept any arguments similar to the Python API and `backward()` should accept a variable list as an argument.
Both `forward()` and `backward()` accept an `AutogradContext*` which can be used to share data between them.
Variables can be saved in the context using `save_for_backward()` and other data can be saved in the map `save` in the form of `<std::string, at::IValue>` pairs. Variables saved in forward can be accessed with `get_saved_variables()`.
Example usage:
```
class MyFunction : public Function<MyFunction> {
 public:
  static variable_list forward(AutogradContext *ctx, int n, Variable var) {
    // Save data for backward in context
    ctx->saved_data["n"] = n;
    return {var};
  }

  static variable_list backward(AutogradContext *ctx, variable_list grad_output) {
    // Use data saved in forward
    auto n = ctx->saved_data["n"].toInt();
    return {grad_output[0] * n};
  }
};
```
Then, it can be used with:
```
Variable x;
MyFunction::apply(6, x);
```
AutogradContext also has methods to mark outputs as non-differentiable and mark inputs as dirty, similar to the [Python API](ff23a02ac4/torch/autograd/function.py (L26)).
Test Plan: Added tests for the custom autograd function API based on test_autograd.py. Currently only the tests for the basic functionality have been added. More tests will be added later.
Differential Revision: D16583428
fbshipit-source-id: 0bd42f19ce37bcd99d3080d16195ad74d40d0413
Summary:
### Summary
The iOS build was broken after this PR 👉 [23195](https://github.com/pytorch/pytorch/pull/23195/files) was merged, as two files still depend on ONNX.
- `test.cpp` in `test/cpp/jit`
- `export.cpp` in `torch/csrc/jit`
This PR is to remove ONNX completely from mobile build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23546
Test Plan:
- The `build_ios.sh` finished successfully.
- The `libtorch.a` can be compiled and run on iOS devices
Differential Revision: D16558236
Pulled By: xta0
fbshipit-source-id: b7ff1db750698cfd5a72d5cb0b9f2f378e315077
Summary:
Changelog:
- Add batching for det / logdet / slogdet operations
- Update derivative computation to support batched inputs (and consequently batched outputs)
- Update docs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22909
Test Plan:
- Add a `test_det_logdet_slogdet_batched` method in `test_torch.py` to test `torch.det`, `torch.logdet` and `torch.slogdet` on batched inputs. This relies on the correctness of `torch.det` on single matrices (tested by `test_det_logdet_slogdet`). A port of this test is added to `test_cuda.py`
- Add autograd tests for batched inputs
Differential Revision: D16580988
Pulled By: ezyang
fbshipit-source-id: b76c87212fbe621f42a847e3b809b5e60cfcdb7a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23376
This uses master version of sphinxcontrib-katex as it only
recently got prerender support.
Fixes #20984
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D16582064
Pulled By: ezyang
fbshipit-source-id: 9ef24c5788c19572515ded2db2e8ebfb7a5ed44d
Summary:
Changelog:
- Use narrow instead of narrow_copy while returning
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23591
Test Plan:
- All tests should pass to ensure that the change is correct
Fixes https://github.com/pytorch/pytorch/issues/23580
Differential Revision: D16581174
Pulled By: ezyang
fbshipit-source-id: 1b6bf7d338ddd138ea4c6aa6901834dd202ec79c
Summary:
This accidentally calls clone, but what we want is to create an empty tensor and set its storage.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23452
ghstack-source-id: 87438096
Differential Revision: D16442756
fbshipit-source-id: 6d5663f82c9bd4e9de8fc846c52992477843af6a
Summary:
Previously these were left out which would lead to confusing messages,
now it looks something like:
```
torch.jit.frontend.UnsupportedNodeError: import statements aren't
supported
:
at ../test.py:13:9
def bad_fn(self):
import pdb
~~~~~~ <--- HERE
'__torch__.X' is being compiled since it was called from 'fn'
at ../test.py:16:12
def fn(x):
return X(10)
~~~~ <--- HERE
```
Fixes #23453
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23454
Pulled By: driazati
Differential Revision: D16567930
fbshipit-source-id: 251b6f91f37a2816e06bb4c803f9bc172fa1d91b
Summary:
API operators are now routed to `at::native::resize_as_*_` and `at::native::clone` accordingly.
The internal `THTensor_(resizeAs)`, `THCTensor_(resizeAs)`, `THTensor_(newClone)` and `THCTensor_(newClone)` remain to support older TH code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23027
Differential Revision: D16362304
Pulled By: VitalyFedyunin
fbshipit-source-id: 4c1e8516da685f3fdea632ff791d143f27aeebeb
Summary:
Changelog:
- Rename `gels` to `lstsq`
- Fix all callsites
- Rename all tests
- Create a tentative alias for `lstsq` under the name `gels` and add a deprecation warning to not promote usage.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23460
Test Plan: - All tests should pass to confirm that the patch is correct
Differential Revision: D16547834
Pulled By: colesbury
fbshipit-source-id: b3bdb8f4c5d14c7716c3d9528e40324cc544e496
Summary:
Only check for cmake dependencies we directly depend on (e.g., hipsparse but not rocsparse)
Use cmake targets for ROCm where possible.
While there, update the docker CI build infrastructure to only pull in packages by name we directly depend on (anticipating the demise of, e.g., miopengemm). I do not anticipate a docker rebuild to be necessary at this stage as the changes are somewhat cosmetic.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23527
Differential Revision: D16561010
Pulled By: ezyang
fbshipit-source-id: 87cd9d8a15a74caf9baca85a3e840e9d19ad5d9f
Summary:
Syncing worker requirement mismatches to improve remote build time.
Created actions:
LARGE: 66
MEDIUM: 649
XLARGE: 1
Updated actions:
From LARGE to MEDIUM: 18
From LARGE to XLARGE: 2
From MEDIUM to LARGE: 20
From XLARGE to LARGE: 1
Differential Revision: D16559356
fbshipit-source-id: a51ef034265649314661ab0e283089a069a20437
Summary:
When a user tries to change metadata of a tensor created from `.data` or `.detach()`, we currently show the error message "<function_name> is not allowed on Tensor created from .data or .detach()". However, this error message doesn't suggest what the right fix should look like. This PR improves the error message.
Closes https://github.com/pytorch/pytorch/issues/23393.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23504
Differential Revision: D16547415
Pulled By: yf225
fbshipit-source-id: 37f4a0385442e2b0966386fb14d3d938ecf4230c
Summary:
Previously these were left out which would lead to confusing messages,
now it looks something like:
```
torch.jit.frontend.UnsupportedNodeError: import statements aren't
supported
:
at ../test.py:13:9
def bad_fn(self):
import pdb
~~~~~~ <--- HERE
'__torch__.X' is being compiled since it was called from 'fn'
at ../test.py:16:12
def fn(x):
return X(10)
~~~~ <--- HERE
```
Fixes #23453
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23454
Pulled By: driazati
Differential Revision: D16526027
fbshipit-source-id: 109f2968430dbf51ee91b1b3409badfd557d19a4
Summary:
Use the recursive script API in the existing docs
TODO:
* Migration guide for 1.1 -> 1.2
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21612
Pulled By: driazati
Differential Revision: D16553734
fbshipit-source-id: fb6be81a950224390bd5d19b9b3de2d97b3dc515
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23521
The non-fbgemm path should have the same arguments as the fbgemm path.
Reviewed By: jianyuh
Differential Revision: D16547637
fbshipit-source-id: bb00d725fb968cbee32defb8facd2799a7e79bb4
Summary:
This resolves two issues in one shot:
- sub shouldn't be available for bool type.
- When sub is applied to an unsupported type, the current error message
shows "add_cpu/add_cuda is not implemented for [type]". It should say
"sub_cpu/sub_cuda" instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23519
Differential Revision: D16548770
Pulled By: izdeby
fbshipit-source-id: fe404a2a97b8d11bd180ec41364bf8e68414fb15
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23417
Test Plan:
cd docs; make html
Imported from OSS
Differential Revision: D16523781
Pulled By: ilia-cher
fbshipit-source-id: d6c09e8a85d39e6185bbdc4b312fea44fcdfff06
Summary:
No real change on the CI since currently the default latest is 0.4.0. houseroad bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23517
Differential Revision: D16550375
Pulled By: bddppq
fbshipit-source-id: a669b8af678c79c4d6909300b28458fe6b7cd30c
Summary:
There is an internal fbcode assert that fails if I do not add these checks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23511
Differential Revision: D16545606
Pulled By: eellison
fbshipit-source-id: cd3a799850bae8f052f9d81c1e4a2678fda19317
Summary:
The PyTorch test suite sets a policy() method to assertLeaksNoCudaTensors.
Whenever a test is run, assertLeaksNoCudaTensors is called, which in turn
calls CudaMemoryLeakCheck, which in turn calls initialize_cuda_context_rng,
which executes torch.randn on each device, launching a kernel on each device.
Since the kernel may not have finished on device 1, the assertion
`self.assertTrue(s1.query())` fails.
The fix is to insert `torch.cuda.synchronize(d0)` and `torch.cuda.synchronize(d1)`
at the beginning of the test so that previously launched kernels finish before the real
test begins.
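A minimal sketch of the fix described above (the device handles are placeholders; in the real test they come from the test fixture):
```python
import torch

def drain_pending_kernels(d0=torch.device('cuda:0'), d1=torch.device('cuda:1')):
    # Wait for kernels launched earlier (e.g. by the leak checker's RNG
    # initialization) so that subsequent stream.query() calls only reflect
    # this test's own work.
    torch.cuda.synchronize(d0)
    torch.cuda.synchronize(d1)
```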
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23520
Differential Revision: D16547701
Pulled By: soumith
fbshipit-source-id: 42ad369f909d534e15555493d08e9bb99dd64b6a
Summary:
Add a sorting policy to ChunkDataset.
This is considered an advanced parameter for developers who want to apply a 'sorting policy' to the chunk data before it is sampled into minibatches.
Unlike the collate method, this policy is applied at the chunk level instead of the minibatch level. When a chunk of data is loaded (multiple chunks if cross_chunk_shuffle_count_ is greater than 1), this policy applies to the full loaded data. It is useful if developers want to perform some pre-processing (like bucketing) on the chunk data before the example sampler samples it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23053
Differential Revision: D16537692
Pulled By: colesbury
fbshipit-source-id: cd21ed40ab787a18b8c6dd304e5b806a7a45e6ba
Summary:
Thanks adefazio for the feedback, adding a note to the Contribution guide so that folks don't start working on code without checking with the maintainers.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23513
Differential Revision: D16546685
Pulled By: soumith
fbshipit-source-id: 1ee8ade963703c88374aedecb8c9e5ed39d7722d
Summary:
This modernizes distributions code by replacing a few uses of `.contiguous().view()` with `.reshape()`, fixing a sample bug in the `Categorical` distribution.
The bug is exercised by the following test:
```py
batch_shape = (1, 2, 1, 3, 1)
sample_shape = (4,)
cardinality = 2
logits = torch.randn(batch_shape + (cardinality,))
dist.Categorical(logits=logits).sample(sample_shape)
# RuntimeError: invalid argument 2: view size is not compatible with
# input tensor's size and stride (at least one dimension spans across
# two contiguous subspaces). Call .contiguous() before .view().
# at ../aten/src/TH/generic/THTensor.cpp:203
```
I have verified this works locally, but I have not added this as a regression test because it is unlikely to regress (the code is now simpler).
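For illustration, a small sketch (independent of the distributions code) of why `.reshape()` is the safer spelling:
```python
import torch

t = torch.randn(2, 3).t()   # the transpose is non-contiguous
# t.view(-1) would raise the error quoted above; reshape() returns a view
# when it can and silently copies when it has to.
flat = t.reshape(-1)
assert flat.shape == (6,)
```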
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23328
Differential Revision: D16510678
Pulled By: colesbury
fbshipit-source-id: c125c1a37d21d185132e8e8b65241c86ad8ad04b
Summary:
Currently there is no way to build MKLDNN with optimizations beyond sse4. This commit lets the MKLDNN build respect USE_NATIVE_ARCH.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23445
Differential Revision: D16542275
Pulled By: ezyang
fbshipit-source-id: 550976531d6a52db9128c0e3d4589a33715feee2
Summary:
- MSVC_Z7_OVERRIDE is already handled in CMakeLists.txt. There is no need to process it once more in the Python scripts.
- Option MSVC_Z7_OVERRIDE should be visible to the user only if MSVC is used.
- Move the setting of "/EHa" flag to CMakeLists.txt, where other MSVC-specific flags are processed. This also further prepares the removal of redundant cflags setup in Python build scripts.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23455
Differential Revision: D16542274
Pulled By: ezyang
fbshipit-source-id: 4d3b8b07161478bbba8a21feb6ea24c9024e21ac
Summary:
Closes gh-16955.
Closes https://github.com/pytorch/vision/issues/977
On Linux both `lib64` and `lib` may be present (symlinked). The reports
seem to all be about macOS, but it seems like this is also possibly more
robust on Linux and can't hurt. So not treating platforms differently.
Note that Eigen has a similar check in its CMake:
```
if(CUDA_64_BIT_DEVICE_CODE AND (EXISTS "${CUDA_TOOLKIT_ROOT_DIR}/lib64"))
  link_directories("${CUDA_TOOLKIT_ROOT_DIR}/lib64")
else()
  link_directories("${CUDA_TOOLKIT_ROOT_DIR}/lib")
endif()
```
There may be other issues for building from source on macOS, can't test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23491
Differential Revision: D16538973
Pulled By: soumith
fbshipit-source-id: cc309347b7d16e718e06878d3824d0a6e40b1019
Summary:
Currently set_rng_state and get_rng_state do not accept strings as their device parameter. This commit lets them accept strings.
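A minimal sketch of the intended usage, assuming the CUDA variants and a string device spec:
```python
import torch

if torch.cuda.is_available():
    state = torch.cuda.get_rng_state('cuda:0')   # previously required an int or torch.device
    torch.cuda.set_rng_state(state, 'cuda:0')
```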
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23448
Differential Revision: D16527172
Pulled By: soumith
fbshipit-source-id: 8f9a2129979706e16877cc110f104770fbbe952c
Summary:
Syncing worker requirement mismatches to improve remote build time.
Created actions:
MEDIUM: 981
LARGE: 56
Updated actions:
From MEDIUM to LARGE: 10
From LARGE to MEDIUM: 3
From LARGE to XLARGE: 1
Differential Revision: D16532427
fbshipit-source-id: c58bf59e6c571627b3994f8cdfa79758fb85892b
Summary:
(1) Add `COMMON_MSVC_FLAGS` to the flags in the ninja codepath
(2) Add `/EHsc` to `COMMON_MSVC_FLAG`
(3) Remove `-fPIC` and `-std=c++11` from the flags in the windows codepath
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23472
Differential Revision: D16532993
Pulled By: soumith
fbshipit-source-id: bc2d983f5f8b4eae9c7385bf170f155679e92e87
Summary:
Add `sorted` keyword to JIT for lists and dicts. This desugars to a list copy and a call to `list.sort()`. Since we don't have interfaces yet I implement it in terms of `list.sort()`. When we do we can re-visit implementing this op in a different manner.
The test fails because of a fix to specialized lists which is landing here: https://github.com/pytorch/pytorch/pull/23267
Ignore the first commit because it is just formatting changes (please use clang_format).
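A minimal TorchScript sketch of the new keyword (the element type is arbitrary):
```python
import torch
from typing import List

@torch.jit.script
def sort_copy(xs: List[int]) -> List[int]:
    # Desugars to copying the list and calling list.sort() on the copy
    return sorted(xs)

print(sort_copy([3, 1, 2]))  # [1, 2, 3]
```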
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23274
Differential Revision: D16527323
Pulled By: eellison
fbshipit-source-id: aed8faef23cb790b9af036cd6c1b9b1d7066345d
Summary:
Scatter is unnecessary when only one device is used, and it breaks on some custom data structures like namedtuple, so we would like to avoid it. :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22384
Differential Revision: D16428208
Pulled By: soumith
fbshipit-source-id: eaa3876b2b95c1006ccaaacdb62f54c5280e730c
Summary:
This is part of the effort to shrink OSS libtorch mobile build size.
We shouldn't need Module::save function on mobile - it depends on
csrc/jit/export.cpp which then depends on ONNX. By gating these two
methods we can avoid these dependencies for libtorch mobile.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23415
ghstack-source-id: 87288228
Reviewed By: dreiss
Differential Revision: D16511143
fbshipit-source-id: fd031f91fcf9b7be54cbe1436506965af94ab537
Summary:
Add early returns to JIT with minimal changes to compiler.cpp and an IR->IR pass that will transform the graph so that there is only one return value.
In compiler.cpp, record when a block will exit so that the following example will work:
```
if cond:
    a = torch.zeros([2])
else:
    return 2
a += 2
...
```
To match block outputs with values that will not be used, like in the above example with `a`, I add a Bottom Type that subtypes everything else. This allows shape propagation to continue to work, and makes it so that we don't need many extra nodes filling up the graph.
The IR transform currently doesn't work on Loops, I didn't add that to this PR to avoid too much complexity, but will add it as a stack (and it should be very little extra code). the IR transform is commented at the top of the file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19179
Differential Revision: D16519819
Pulled By: eellison
fbshipit-source-id: 322a27f69966d1fd074ebe723c3e948b458b0e68
Summary:
Adds qtensor specific fields to the proto file so that they get serialized into the model.json
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23356
ghstack-source-id: 87263428
Differential Revision: D16473237
fbshipit-source-id: bf5b51d0863d036d30a1644a3c3b74516468224b
Summary:
As pointed out by SsnL in https://github.com/pytorch/pytorch/issues/20910, when clone destination is different from the module's device,
`Cloneable` currently calls `clone()` and then `to()` on every parameter and buffer, where the first clone is unnecessary.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20995
Differential Revision: D15517353
Pulled By: mrshenli
fbshipit-source-id: 6b6dc01560540a63845663f863dea0a948021fa5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23442
Rename the argument from `operator` to `operators`, which can take a list of operators to test.
Reviewed By: hl475
Differential Revision: D16520779
fbshipit-source-id: 94284a87c64471793e319f5bd3143f89b9a192bb
Summary:
When an exception occurs in one of the modules passed to `parallel_apply()`, it is caught and re-raised in the main thread. This preserves the original exception type and message, but has the traceback point at the position where it's re-raised, rather than the original point of failure.
This PR saves the exception information required to generate the traceback, and includes the original traceback in the message of the exception raised in the main thread.
Before:
```
...
File ".../torch/nn/parallel/data_parallel.py", line 153, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File ".../torch/nn/parallel/parallel_apply.py", line 84, in parallel_apply
raise output
RuntimeError: expected type torch.FloatTensor but got torch.cuda.FloatTensor
```
After:
```
...
File ".../torch/nn/parallel/data_parallel.py", line 153, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File ".../torch/nn/parallel/parallel_apply.py", line 88, in parallel_apply
''.join(traceback.format_exception(*exc_info)))
RuntimeError: Caught exception in replica 0. Original traceback and message:
Traceback (most recent call last):
...
File "../models/foo.py", line 319, in bar
baz = asdf / ghij[:, np.newaxis]
RuntimeError: expected type torch.FloatTensor but got torch.cuda.FloatTensor
```
I took care to raise an exception of the original type (in case the main code checks for that), but replaced the message. It helped me find a bug that did not occur outside `data_parallel()`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18055
Differential Revision: D16444972
Pulled By: zhangguanheng66
fbshipit-source-id: ec436c9d4677fad18106a8046cfa835a20a101ce
Summary:
Don't automatically unwrap the top-layer DataParallel for users. Instead, we provide useful error information and tell users what action to take.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23365
Reviewed By: zrphercule
Differential Revision: D16514273
Pulled By: houseroad
fbshipit-source-id: f552de5c53fb44807e9d9ad62126c98873ed106e
Summary:
The conda compiler are gcc/c++ 7.3.0, but have custom version strings
for clarity:
x86_64-conda_cos6-linux-gnu-cc
x86_64-conda_cos6-linux-gnu-c++
Using these compilers to build a C++ or CUDA extension now gives this warning (unnecessarily):
```
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (/home/rgommers/anaconda3/envs/pytorch-nightly/bin/x86_64-conda_cos6-linux-gnu-c++) is not compatible with the compiler Pytorch was
built with for this platform, which is g++ on linux.
...
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23396
Differential Revision: D16500637
Pulled By: soumith
fbshipit-source-id: 5b2fc3593e22e9a7d07dc2c0456dbb4934ffddb2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23104
ghstack-source-id: 87247148
As suggested in https://github.com/pytorch/pytorch/pull/22891, we will add an overload for `torch.fbgemm_linear_int8_weight` (the dynamically quantized version of the linear function) that takes PackedLinearWeight as input and has pretty much the same signature as regular aten::linear.
Differential Revision: D16381552
fbshipit-source-id: 1ccc4174fd02c546eee328940ac4b0da48fc85e8
Summary:
adding qconv+relu and qlinear+relu modules in nn/_intrinsic/quantized
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23410
Test Plan:
Extended tests to test these new modules as well
buck test mode/dev caffe2/test:quantized -- 'test_linear_api' --print-passing-details
```
Running 1 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/2251799820197379
✓ caffe2/test:quantized - test_linear_api (test_nn_quantized.ModuleAPITest) 4.055 1/1 (passed)
Test output:
> test_linear_api (test_nn_quantized.ModuleAPITest)
> test API functionality for nn.quantized.linear and nn._intrinsic.quantized.linear_relu ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 4.056s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/2251799820197379
Summary (total time 10.66s):
PASS: 1
FAIL: 0
SKIP: 0
FATAL: 0
TIMEOUT: 0
OMIT: 0
```
buck test mode/dev caffe2/test:quantized -- 'test_conv_api' --print-passing-details
```
Running 2 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/4785074607089664
✓ caffe2/test:quantized - test_conv_api (test_quantized_conv.QuantizedConvTest) 5.195 1/2 (passed)
Test output:
> test_conv_api (test_quantized_conv.QuantizedConvTest)
> Tests the correctness of the conv functional. ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 5.195s
>
> OK
✓ caffe2/test:quantized - test_conv_api (test_nn_quantized.ModuleAPITest) 10.616 2/2 (passed)
Test output:
> test_conv_api (test_nn_quantized.ModuleAPITest)
> Tests the correctness of the conv module. ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 10.616s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/4785074607089664
Summary (total time 17.31s):
PASS: 2
FAIL: 0
SKIP: 0
FATAL: 0
TIMEOUT: 0
OMIT: 0
```
Differential Revision: D16505333
Pulled By: dskhudia
fbshipit-source-id: 04f45cd0e76dc55f4694d558b913ab2958b7d727
Summary:
This is still a work in progress.
There are several more items to add to complete this doc, including
- [x] LHS indexing, index assignments.
- [x] Tensor List.
- [x] ~Shape/Type propagation.~
- [x] FAQs
Please review and share your thoughts; feel free to add anything that you think should be included as well. houseroad spandantiwari lara-hdr neginraoof
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23185
Differential Revision: D16459647
Pulled By: houseroad
fbshipit-source-id: b401c005f848d957541ba3b00e00c93ac2f4609b
Summary:
They should be forwarded by their actual type, not their rvalue reference.
This looked like perfect forwarding but actually wasn't.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23412
ghstack-source-id: 87214575
Reviewed By: dzhulgakov
Differential Revision: D16507872
fbshipit-source-id: 2b20a37df83067dd53e917fe87407ad687bb147c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21211
There are cases where the `init` method used to create inputs can exit with error. When this happens, that specific input should be skipped.
Reviewed By: zheng-xq
Differential Revision: D15466410
fbshipit-source-id: 55e86764b2ec56f7730349ff1df6e50efc0239d7
Summary:
Align the Argument's operator<< with parser,
additional support:
1) List size
2) real default value
3) Alias information
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23203
ghstack-source-id: 87118985
Reviewed By: zrphercule
Differential Revision: D16433188
fbshipit-source-id: aea5711f93feacd94d1732e2f0d61218a31a0c5c
Summary:
The builder pattern doesn't seem to work well with return-value-optimization.
This saves ~100 ns in the construction of TensorIterator::binary_op.
```
import torch
x = torch.rand(1)
y = torch.rand(1)
z = torch.rand(1)
%timeit torch.add(x, y, out=z) # ~1.76 us vs ~1.88 us on my machine
```
cc resistor zheng-xq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23329
Differential Revision: D16495070
Pulled By: VitalyFedyunin
fbshipit-source-id: 8ce116075fa4c7149dabfcdfa25885c1187c8e2f
Summary:
The legacy iOS build script (`build_ios.sh`) still works, but the output is Caffe2, not PyTorch. To enable the PyTorch iOS build, we can set `BUILD_CAFFE2_MOBILE` to `NO` and turn on another cmake arg, `INTERN_BUILD_MOBILE`, which ljk53 created for Android.
There is a trivial issue in `used_kernel.cpp` that causes a compilation error when running `build_ios.sh`, as it uses a `system` API that has been deprecated since iOS 11. The fix below is to bypass this file, since it's not needed by mobile.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23293
Test Plan:
The `build_ios.sh` completed successfully, and all the generated static libraries can be compiled and linked successfully on iOS devices.
### Build script
```shell
./scripts/build_ios.sh \
-DBUILD_CAFFE2_MOBILE=OFF \
-DCMAKE_PREFIX_PATH=$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())') \
-DPYTHON_EXECUTABLE=$(python -c 'import sys; print(sys.executable)')
```
Differential Revision: D16456100
Pulled By: xta0
fbshipit-source-id: 38c73e1e3a0c219a38ddc28b31acc181690f34e8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22175
- Rename AliasAnalysisKind::DEFAULT to AliasAnalysisKind::CONSERVATIVE
- Introduce AliasAnalysisKind::FROM_SCHEMA that means the alias annotations of the schema should be honored
- Introduce AliasAnalysisKind::INTERNAL_SPECIAL_CASE to be able to run assertions that internal special cased ops are treated correctly
- aten:: and prim:: ops are not treated as special cases anymore, but just use AliasAnalysisKind::FROM_SCHEMA
- There's a set of assertions to ensure that aten:: and prim:: ops are all correctly set up to use AliasAnalysisKind::FROM_SCHEMA. Once this PR lands and passes all tests, we will remove those assertions and open up for the possibility of different AliasAnalysisKind settings for aten:: and prim:: ops
Differential Revision: D15929595
fbshipit-source-id: 7c6a9d4d29e13b8c9a856062cd6fb3f8a46a2e0d
Summary:
torch::List recently received some polishing that is now also done for Dict. This should land before the PyTorch 1.2 release because of backwards compatibility.
- Dict is just a reference type, so "const Dict" should have the same capabilities as "Dict", constness is not guaranteed in any way.
- DictIterator gets comparison operators <, <=, >, >=
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23344
ghstack-source-id: 87170304
Differential Revision: D16468800
fbshipit-source-id: 2978c3b9cdcfb2cfb3f26516b15bd455d9a48ba9
Summary:
This check is not needed. Even if it were, the assignment is clobbered anyway.
Closes #23300.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23370
ghstack-source-id: 87157671
Differential Revision: D16485329
fbshipit-source-id: 8ccac79e81f5e0d0d20099d550411c161f58c233
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22808
- Use `size_to_dim_`.
- `mod` is not in scope; it should be `module`.
Reviewed By: mingzhe09088
Differential Revision: D16225799
fbshipit-source-id: 9a263227d2d508eefdfddfee15fd0822819de946
Summary:
All cases should be prim ops, but let's support it. It will expect the variadic return schema to be prim::PythonOp(...) -> ...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23199
ghstack-source-id: 87113845
Differential Revision: D16431635
fbshipit-source-id: 798b6957ce5d800f7fcf981c86fdcb009cd77a78
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/22833
grad_sum_to_size does not commute with AutogradAdd after all because it turns the broadcasting AutogradAdd into a broadcasting add.
Chillee actually did most of the work tracking this down to the fusion of grad_sum_to_size, and pinged me when he found the cause. Thank you!
About the choice of removing the fusion completely instead of being more precise:
- We do have grad_sum_to_size elimination, which works for cases where broadcasting does not actually happen in the forward, so the cases where fusing grad_sum_to_size is actually beneficial are much fewer than when it was initially proposed.
- There will be less fusion, in terms of the tests, IOU stops being fully fused. I vaguely think that it is a case we could handle with refined logic.
- Keeping it would add complexity in checking when to merge fusion groups to the complexities that this PR removes.
- The future of fusion probably lies more in more complete solutions including reductions (TVM or KeOps or our own or ...).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23372
Differential Revision: D16489930
Pulled By: soumith
fbshipit-source-id: bc0431b0d3eda264c401b634675872c4ce46f0f4
Summary:
Instead, defer its default value to CMakeLists.txt.
NO_FBGEMM has already been handled in tools/setup_helpers/env.py
(although it is deprecated).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23314
Differential Revision: D16493580
Pulled By: ezyang
fbshipit-source-id: 7255eb1df5e8a6dd0362507d68da0986a9ed46e2
Summary:
This is a small fix on top of gh-23348, which fixed the libtorch
nightly build timeouts.
For the latest nighly build (25 July), see
https://circleci.com/workflow-run/33d0a24a-b77c-4a8f-9ecd-5646146ce684
The only failures are these uploads, which is because `aws s3 cp`
can only deal with one file at a time. The only way to make it do
multiple files at once is:
```
aws s3 cp . "$s3_dir" --exclude "*" --include "libtorch-*.zip" --recursive --acl public-read
```
which is much more verbose. Executing one `cp` per file should be fine,
and this is also what's done in `binary_macos_upload.sh`.
Closes gh-23039
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23368
Differential Revision: D16488853
Pulled By: soumith
fbshipit-source-id: 6dc04b4de2f6cd2de5ae9ad57a6e980f56896498
Summary:
With this change you can now list multiple interfaces separated by
comma. ProcessGroupGloo creates a single Gloo context for every device
in the list (a context represents a connection to every other
rank). For every collective that is called, it will select the context
in a round robin fashion. The number of worker threads responsible for
executing the collectives is set to be twice the number of devices.
If you have a single physical interface, and wish to employ increased
parallelism, you can also specify
`GLOO_SOCKET_IFNAME=eth0,eth0,eth0,eth0`. This makes ProcessGroupGloo
use 4 connections per rank, 4 I/O threads, and 8 worker threads
responsible for executing the collectives.
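A hedged sketch of the single-interface trick described above (the interface name, host, port, and ranks are placeholders):
```python
import os
import torch.distributed as dist

# Reuse one physical NIC four times: four Gloo contexts per rank, four I/O
# threads, and eight worker threads executing collectives.
os.environ["GLOO_SOCKET_IFNAME"] = "eth0,eth0,eth0,eth0"

dist.init_process_group(
    backend="gloo",
    init_method="tcp://127.0.0.1:29500",
    rank=int(os.environ.get("RANK", "0")),
    world_size=int(os.environ.get("WORLD_SIZE", "1")),
)
```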
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22978
ghstack-source-id: 87006270
Differential Revision: D16339962
fbshipit-source-id: 9aa1dc93d8e131c1714db349b0cbe57e9e7266f1
Summary:
An illegal instruction is encountered in the pre-built MKL-DNN package. https://github.com/pytorch/pytorch/issues/23231
To avoid such binary compatibility issues, the HostOpts option in MKL-DNN is disabled in order to build MKL-DNN for a generic arch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23292
Differential Revision: D16488773
Pulled By: soumith
fbshipit-source-id: 9e13c76fb9cb9338103cb767d7463c10891d294a
Summary:
This is step 1 in trying to get rid of constants that are set prior to
executing the test runner. All setup logic should be concentrated in
the setupClass() function of the TestCase subclass.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23223
ghstack-source-id: 87005260
Reviewed By: zhaojuanmao
Differential Revision: D16439147
fbshipit-source-id: 7a929ad4b1c8e368e33d1165becbd4d91220882c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23347
This diff replaces uint8 with int8 to match the underlying kernel implementation. When we do int8 quantization, we compute with uint8 (input activation) * int8 (weight) -> uint8 (output activation). The weight is quantized into int8.
Reviewed By: jianyuh
Differential Revision: D16469435
fbshipit-source-id: a697655b0e97833fc601e5980970aec4dba53c39
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23354
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D16474254
Pulled By: ezyang
fbshipit-source-id: 0dd7ce02e1aa1a42a24d2af066ebd0ac5206c9a0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23325
Fixes #19990
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D16473826
Pulled By: ezyang
fbshipit-source-id: 466db2c22fabd7b574f0a08aec67a18318ddb431
Summary:
Proposed PR for
https://github.com/pytorch/pytorch/issues/23342
Disables execution of QNNPACK tests if IS_PPC.
This parallels the existing skipping of these tests for IS_WINDOWS.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23343
Differential Revision: D16469218
Pulled By: soumith
fbshipit-source-id: 80b651d00e5d413e359cf418f79e20d74cd9c8e1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23317
Print out the kind type when export fails.
Reviewed By: zrphercule
Differential Revision: D16462641
fbshipit-source-id: 27157c0bd597362f90ac8cfb33e1808bac0ec48b
Summary:
fix https://github.com/pytorch/pytorch/issues/21044
Bicubic interpolation can cause overshoot.
OpenCV keeps the result dtype aligned with the input dtype:
- If the input is uint8, the result is clamped to [0, 255]
- If the input is float, the result is unclamped.
In PyTorch's case, we only accept float input, so we keep the result unclamped and add a note so that users can explicitly call `torch.clamp()` when necessary.
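A small sketch of the workaround mentioned above:
```python
import torch
import torch.nn.functional as F

x = torch.rand(1, 3, 8, 8)  # values in [0, 1]
y = F.interpolate(x, scale_factor=2, mode='bicubic', align_corners=False)
# Bicubic interpolation can overshoot the input range; clamp explicitly when
# the output must stay within [0, 1].
y = y.clamp(0.0, 1.0)
```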
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23321
Differential Revision: D16464796
Pulled By: ailzhang
fbshipit-source-id: 177915e525d1f54c2209e277cf73e40699ed1acd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23257
Overal context: open-source BlackBoxPredictor as the entry
point for inference in Caffe2 (thread safe abstraction for Caffe2
inference). This should be used in ThroughputBenchmark for the purpose
of framework comparison
This specific diff:
There should be no harm in moving transformation code to
OSS. On the advantages side we will be able to compare production
Caffe2 setup with PyTorch in the fairest way via
ThroughputBenchmark. This approach avoids any complicated
transformation registries. Building those properly would be a significant
engineering effort as well as a production risk. In the past we had SEVs
related to transforms being turned off due to various refactors. Given
that we don't plan to build any other significant investments into
transformation logic except existing ones (like TVM and Glow), and
those also relate to open-source technologies, I came up to the
conclusion of moving to OSS the whole thing.
Reviewed By: zrphercule
Differential Revision: D16428124
fbshipit-source-id: b35deada5c015cd97b91ae12a7ea4aac53bd14b8
Summary:
Covering fleet-wide profiling, api logging, etc.
It's my first time writing rst, so suggestions are definitely welcomed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23010
Differential Revision: D16456721
Pulled By: dzhulgakov
fbshipit-source-id: 3d3018f41499d04db0dca865bb3a9652d8cdf90a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23291
This diff implements LSTM with FP16 weights based on FBGEMM.
At a high level, here are the steps:
1. Quantize and pack weight in every layer of LSTM
2. Pass the weights from step 1 to the ATen `quantized_lstm` function, which does the matrix multiplication with FP16 weights. The dtypes of the variables used in the MM are:
Y (fp32) = X (fp32) * W (fp16) + B (fp32)
Reviewed By: jianyuh
Differential Revision: D16389595
fbshipit-source-id: c26ae4e153c667a941f4af64e9d07fc251403cee
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22733
This refactor changes the conv module to avoid the usage of the functional ops.
Reviewed By: jerryzh168
Differential Revision: D15835572
fbshipit-source-id: f2294cd708fbe8372eb3a15cc60d83777d4f7029
Summary:
It used to be run with comm_size=8, which causes flaky results in a
stress run. The flakiness was caused by too many listening sockets
being created by Gloo context initialization (8 processes times 7
sockets times 20-way concurrency, plus TIME_WAIT).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23221
ghstack-source-id: 86995596
Reviewed By: d4l3k
Differential Revision: D16437834
fbshipit-source-id: 998d0e2b087c0ab15eca64e308059c35e1b51e7b
Summary:
I manually went through all functions in `torch.*` and corrected any mismatch between the arguments mentioned in doc and the ones actually taken by the function. This fixes https://github.com/pytorch/pytorch/issues/8698.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22973
Differential Revision: D16419602
Pulled By: yf225
fbshipit-source-id: 5562c9b0b95a0759abee41f967c45efacf2267c2
Summary:
Currently the build type is decided by the environment variable DEBUG
and REL_WITH_DEB_INFO. This commit also lets CMAKE_BUILD_TYPE be
effective. This makes the interface more consistent with CMake. This
also prepares https://github.com/pytorch/pytorch/issues/22776.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22875
Differential Revision: D16281663
Pulled By: ezyang
fbshipit-source-id: 952f92aad85ff59f1c7abe8256eca8a4a0936026
Summary:
Rehash of https://github.com/pytorch/pytorch/issues/22322.
Given that Python 2.7 will be EOL'd on Jan 1, 2020 and we have models depending on Python 3.5+, we'd like to update the ROCm CI across the board to Python 3.6.
This PR adds the test skips and some semantic changes for PyTorch.
Compared to #22322, it adds a pattern-match skip for anything but the ROCm CI for the Python find step in the PyTorch build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23088
Differential Revision: D16448261
Pulled By: bddppq
fbshipit-source-id: 69ece1a213418d9abf1444c496dce1c190ee07c8
Summary:
There are a lot of formatting changes which make other diffs to these PRs noisy and hard to read.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23283
Differential Revision: D16453590
Pulled By: eellison
fbshipit-source-id: 97b4bf1dbbbfb09c44c57402f61ea27287060044
Summary:
In Python, `register_module` / `register_parameter` / `register_buffer` method in `nn.Module` is public. This PR makes those APIs public for C++ `nn::Module` as well. Closes https://github.com/pytorch/pytorch/issues/23140.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23196
Differential Revision: D16440239
Pulled By: yf225
fbshipit-source-id: e0eff6e1db592961fba891ec417dc74fa765e968
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23272
We see significant performance improvements by limiting concurrency
at caffe2 level on mobile. This diff enables setting the number of caffe2
workspaces used during rnn inference.
Reviewed By: akyrola
Differential Revision: D16448611
fbshipit-source-id: 28abaddb4ea60bacb084ceb28cb7a4d1e67ccc17
Summary:
Support exporting
* Standard tensor indexing like
```
x = torch.ones(4, 5)
ind = torch.tensor([0, 1])
return x[ind]
```
* [Advanced indexing](https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#advanced-indexing) like
```
x = torch.ones(4,5,6,7,8)
ind1 = torch.tensor([0, 1])
ind2 = torch.tensor([[3], [2]])
ind3 = torch.tensor([[2, 2], [4, 5]])
return x[2:4, ind1, None, ind2, ind3, :]
```
It would be ideal if ONNX could natively support indexing in future opsets, but for opset <= 10 it will always need this kind of workaround.
There are still various limitations, such as no support for advanced indexing with negative indices or for mask indices of rank > 1, etc. My feeling is that these are less common cases that would require great effort to support with the current opset, and it's better not to make the index export more cumbersome than it already is.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21716
Reviewed By: zrphercule
Differential Revision: D15902199
Pulled By: houseroad
fbshipit-source-id: 5f1cc687fc9f97da18732f6a2c9dfe8f6fdb34a6
Summary:
Previously we weren't specializing the list returned from `dict.keys()`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23267
Differential Revision: D16448512
Pulled By: eellison
fbshipit-source-id: fcd2a37ac680bdf90219b099a94aa36a80f4067c
Summary:
Overal context: open-source BlackBoxPredictor as the entry
point for inference in Caffe2 (thread safe abstraction for Caffe2
inference). This should be used in ThroughputBenchmark for the purpose
of framework comparison
This specific diff:
There should be no harm in moving transformation code to
OSS. On the advantages side we will be able to compare production
Caffe2 setup with PyTorch in the fairest way via
ThroughputBenchmark. This approach avoids any complicated
transformation registries. Building those properly would be a significant
engineering effort as well as a production risk. In the past we had SEVs
related to transforms being turned off due to various refactors. Given
that we don't plan to build any other significant investments into
transformation logic except existing ones (like TVM and Glow), and
those also relate to open-source technologies, I came up to the
conclusion of moving to OSS the whole thing.
Reviewed By: bertmaher
Differential Revision: D16367134
fbshipit-source-id: fc6bacc1be3ff6336beb57cdad58168d3a2b8c28
Summary:
Per https://github.com/pytorch/pytorch/issues/22260, by default the number of OpenMP threads spawned equals the number of available cores. For multi-process data-parallel cases, too many threads may be spawned and could overload the CPU, resulting in a performance regression.
So, by default, set OMP_NUM_THREADS = number of CPU processors / number of processes, to neither overload nor waste CPU threads.
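A rough sketch of the default being applied (nproc_per_node is an illustrative value; the real logic lives in torch/distributed/launch.py):
```python
import multiprocessing
import os

nproc_per_node = 2  # hypothetical launch setting
if "OMP_NUM_THREADS" not in os.environ:
    # Give each worker process an equal share of the cores, never fewer than one.
    os.environ["OMP_NUM_THREADS"] = str(
        max(1, multiprocessing.cpu_count() // nproc_per_node)
    )
```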
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22501
Test Plan:
1. without and with this change, example codes result in same result
python ~/local/fbsource-fbcode/fbcode/caffe2/torch/distributed/launch.py --nproc_per_node=2 pytorch/examples/yanlizhao/distributed_launch_example.py
Setting OMP_NUM_THREADS environment variable for each process to be: 24, which
is max(1, num_cpus / num_processes), you can further tune the variable for optimal performance in your application if needed.
final loss = tensor(0.5211, device='cuda:0', grad_fn=<MseLossBackward>)
Differential Revision: D16092225
Pulled By: zhaojuanmao
fbshipit-source-id: b792a4c27a7ffae40e4a59e96669209c6a85e27f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23003
torch.quantization.fuse_module and torch.nn._intrinsic convRelu and LinearRelu
Fusion function to combine specific modules: (conv,bn) and (conv,bn,relu).
In all cases, replace modules in place. The first module is replaced with the _intrinsic fused module and the remaining modules are replaced by nn.Identity.
Support both training and eval. For training, the modules are "fused" with a sequential container. This is to allow for further module swaps for quantization aware training.
Also add: torch.nn._intrinsic for convRelu and LinearRelu.
TODO: Add tests for _intrinsic modules.
Conv BN fusion code is based on DsKhudia's implementation
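A minimal sketch of eval-mode fusion, assuming the torch.quantization.fuse_modules spelling and an illustrative model:
```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, 3)
        self.bn = nn.BatchNorm2d(16)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))

model = Net().eval()
# (conv, bn, relu) are fused in place: `conv` becomes the fused module and
# `bn` / `relu` are replaced with nn.Identity.
fused = torch.quantization.fuse_modules(model, [["conv", "bn", "relu"]])
```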
Differential Revision: D16199720
fbshipit-source-id: 95fb9ffe72b361d280313b2ec57de2acd4f9dda2
Summary:
This adds a replace_module method to the C++ api. This is needed to be able to replace modules.
The primary use case I am aware of is to enable finetuning of models.
Given that finetuning is fairly popular these days, I think it would be good to facilitate this in the C++ api as well.
This has been reported by Jean-Christophe Lombardo on the [forums](https://discuss.pytorch.org/t/finetuning-a-model-on-multiple-gpu-in-c/49195).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22546
Differential Revision: D16440289
Pulled By: yf225
fbshipit-source-id: c136f914b8fc5c0f1975d877ea817fda5c851cda
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23022
Will be tested in later diffs.
Added a LinearReLU module for QAT, allowing conversion from torch.nn._intrinsic.LinearReLU to torch.nn._intrinsic.qat.LinearReLU.
Reviewed By: zafartahirov
Differential Revision: D16286800
fbshipit-source-id: 84cce3551d46e649781b9b6107d4076e10e51018
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23181
We can't run dead code elimination after erasing number types because dce relies on graph invariants that erase_number_types breaks.
Reviewed By: houseroad
Differential Revision: D16427819
fbshipit-source-id: d1b98a74d2558b14d4be692219691149689a93d8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23180
This pass needs to be run later because it breaks jit graph invariants and the lower_all_tuples pass still needs a valid jit graph.
Reviewed By: houseroad
Differential Revision: D16427680
fbshipit-source-id: 427c7e74c59a3d7d62f2855ed626cf6258107509
Summary:
Creating an untyped generic list is deprecated; we always want type information to be present.
This fixes test cases and removes one that used lists with ambiguous types.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23192
ghstack-source-id: 86972891
Differential Revision: D16431482
fbshipit-source-id: 4ca5cd142118a3f0a4dcb8cd77383127c54abb29
Summary:
---
How does the current code subsume all detections in the deleted `nccl.py`?
- The dependency of `USE_NCCL` on the OS and `USE_CUDA` is handled as dependency options in `CMakeLists.txt`.
- The main NCCL detection happens in [FindNCCL.cmake](8377d4b32c/cmake/Modules/FindNCCL.cmake), which is called by [nccl.cmake](8377d4b32c/cmake/External/nccl.cmake). When `USE_SYSTEM_NCCL` is false, the previous Python code deferred the detection to `find_package(NCCL)`. The change in `nccl.cmake` retains this.
- `USE_STATIC_NCCL` in the previous Python code simply changes the name of the detected library. This is done in `IF (USE_STATIC_NCCL)`.
- Now we only need to look at how the lines below line 20 in `nccl.cmake` are subsumed. These lines list paths to header and library directories that NCCL headers and libraries may reside in and try to search these directories for the key header and library files in turn. These are done by `find_path` for headers and `find_library` for the library files in `FindNCCL.cmake`.
* The call of [find_path](https://cmake.org/cmake/help/v3.8/command/find_path.html) (Search for `NO_DEFAULT_PATH` in the link) by default searches for headers in `<prefix>/include` for each `<prefix>` in `CMAKE_PREFIX_PATH` and `CMAKE_SYSTEM_PREFIX_PATH`. Like the Python code, this commit sets `CMAKE_PREFIX_PATH` to search for `<prefix>` in `NCCL_ROOT_DIR` and home to CUDA. `CMAKE_SYSTEM_PREFIX_PATH` includes the standard directories such as `/usr/local` and `/usr`. `NCCL_INCLUDE_DIR` is also specifically handled.
* Similarly, the call of [find_library](https://cmake.org/cmake/help/v3.8/command/find_library.html) (Search for `NO_DEFAULT_PATH` in the link) by default searches for libraries in directories including `<prefix>/lib` for each `<prefix>` in `CMAKE_PREFIX_PATH` and `CMAKE_SYSTEM_PREFIX_PATH`. But it also handles the edge cases intended to be solved in the Python code more properly:
- It only searches for `<prefix>/lib64` (and `<prefix>/lib32`) if it is appropriate on the system.
- It only searches for `<prefix>/lib/<arch>` for the right `<arch>`, unlike the Python code searches for `lib/<arch>` in a generic way (e.g., the Python code searches for `/usr/lib/x86_64-linux-gnu` but in reality systems have `/usr/lib/x86_64-some-customized-name-linux-gnu`, see https://unix.stackexchange.com/a/226180/38242 ).
---
Regarding for relevant issues:
- https://github.com/pytorch/pytorch/issues/12063 and https://github.com/pytorch/pytorch/issues/2877: These are properly handled, as explained in the updated comment.
- https://github.com/pytorch/pytorch/issues/2941 does not change NCCL detection specifically for Windows (it changed CUDA detection).
- b7e258f81ef61d19b884194cdbcd6c7089636d46 A versioned library detection is added, but the order is reversed: The unversioned library becomes preferred. This is because normally unversioned libraries are linked to versioned libraries and preferred by users, and local installation by users are often unversioned. Like the document of [find_library](https://cmake.org/cmake/help/v3.8/command/find_library.html) suggests:
> When using this to specify names with and without a version suffix, we recommend specifying the unversioned name first so that locally-built packages can be found before those provided by distributions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22930
Differential Revision: D16440275
Pulled By: ezyang
fbshipit-source-id: 11fe80743d4fe89b1ed6f96d5d996496e8ec01aa
Summary:
Some overlap with https://github.com/pytorch/pytorch/pull/21716 regarding caffe2 nonzero. Will rebase the other one accordingly whichever gets merged first.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22601
Reviewed By: zrphercule
Differential Revision: D16224660
Pulled By: houseroad
fbshipit-source-id: dbfd1b8776cb626601e0bf83b3fcca291806e653
Summary:
https://github.com/pytorch/pytorch/issues/20153
I believe you need 2 passes for this. Take this example
```python
@torch.jit.script
def f():
    x = torch.ones(10, 9, 8, 7, 6)
    return x[..., None, None].shape
```
which results in `[10, 9, 8, 7, 6, 1, 1]`
vs
```
torch.jit.script
def f():
x = torch.ones(10, 9, 8, 7, 6)
return x[..., None, None, :].shape
```
which results in `[10, 9, 8, 7, 1, 1, 6]`
After only processing `x[..., None, None` we don't know whether we should be creating a new dimension at the end of the dimension list or somewhere in the middle. What we do depends on the elements to the right of it.
Thus, I do 2 passes - one to collect all the dimensions that the index operations operate on, and another that executes the index operations.
This still doesn't work for an ellipsis index followed by a tensor index, but it wasn't working previously either.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22905
Differential Revision: D16433558
Pulled By: Chillee
fbshipit-source-id: c1b303cb97b1af8b6e405bad33495ef3b4c27c4a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23182
This fixes the issue seen in D16390551
Changing the load op to take in shapes vector needs changes in lots of places (almost all usages of load op).
Instead this is a small and safe change where the behavior is unchanged if we are loading multiple blobs and when loading a single blob without shape information.
If you are loading just one blob and the shape information is provided, then this returns the right shape info back.
For all other cases, behavior is unchanged as before we introduced the issue.
This fixes the issue reported by Andrey in D16229465
Reviewed By: boryiingsu
Differential Revision: D16428140
fbshipit-source-id: 8ef6705ab2efb346819489e1f166e23269f7ef8a
Summary:
fbgemm requires AVX512, which requires a more recent compiler, so this also switches all the nightlies from devtoolset3 to devtoolset7. Since CUDA 9.0 doesn't support devtoolset7, we also switch from CUDA 9.0 to CUDA 9.2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22784
Differential Revision: D16428165
Pulled By: pjh5
fbshipit-source-id: c1af3729d8edce88a96fa9069d4c5a1808c25f99
Summary:
We need a way to get a complete list of features that are used in training a model. One way to do this is to make it possible to get the list of features used in each Model Layer. Then once the model is complete we can go through the layers and aggregate the features.
I've introduced a function to expose that information here, get_accessed_features, and implemented it in the FeatureSparseToDense layer to start with.
I've tried to include the minimum amount of information to make this useful, while making it easy to integrate into the variety of model layers. This is, for example, why AccessedFeatures does not contain feature_names, which are not always present in a model layer. I debated whether or not to include feature_type, but I think that's useful enough, and easy enough to figure out in a model layer, that it's worth including.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23036
Test Plan:
Added a unit test to verify the behavior of get_accessed_features in FeatureSparseToDense.
aml_dper2-fblearner-flow-integration-tests failed due to a known issue D16355865
aml_dper3-fblearner-flow-integration-tests failed due to a known issue T47197113
I verified that no tests in the integration tests failed due to issues other than those known ones.
DPER2 canaries: https://fburl.com/fblearner/1217voga
Reviewed By: volkhin
Differential Revision: D16365380
Pulled By: kevinwilfong
fbshipit-source-id: 2dbb4d832628180336533f29f7d917cbad171950
Summary:
I ran into the following error when trying to pass a Python int as an arg to `torch::jit::createStackForSchema`, and I think it is due to the missing support for `NumberType` in [toIValue](https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/pybind_utils.h#L448).
> RuntimeError: Missing cases in toIValue for type: Scalar! File a bug report. (toIValue at ../torch/csrc/jit/pybind_utils.h:449)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22817
Differential Revision: D16276006
Pulled By: mrshenli
fbshipit-source-id: 7f63519bb37219445e836ec1f51ca4f98bf52c44
Summary:
Bumping up the producer_version in exported ONNX models in view of the next release. Updating tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23120
Reviewed By: zrphercule
Differential Revision: D16420917
Pulled By: houseroad
fbshipit-source-id: 6686b10523c102e924ecaf96fd3231240b4219a9
Summary:
`pickle` supports this and a lot of the quantized use cases for get/set
state follow this pattern
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23119
Pulled By: driazati
Differential Revision: D16391234
fbshipit-source-id: 9f63e0a1679daa61b17aa64b5995e2be23b07b50
Summary:
Previously we looked at the stack frame of the function that called
`script` to resolve variables. This doesn't work if someone calls script
with a function defined somewhere else that references captured
variables. We already have a mechanism to look at the closed over
variables for a function, so this changes the `rcb` to use that.
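For illustration, a minimal sketch of the case this enables (the helper name `make_fn` is made up for the example):
```python
import torch

def make_fn():
    scale = 2  # captured (closed-over) variable

    def scaled_add(x, y):
        return x + y * scale

    return scaled_add

fn = make_fn()
# The resolution callback now looks at fn's closure rather than the caller's
# stack frame, so `scale` is resolved even though it is not visible here.
scripted = torch.jit.script(fn)
print(scripted(torch.ones(2), torch.ones(2)))  # tensor([3., 3.])
```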
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22270
Pulled By: driazati
Differential Revision: D16391346
fbshipit-source-id: ad9b314ae86c249251b106079e76a5d7cf6c04c2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23166
Changing the load op to take in shapes vector needs changes in lots of places (almost all usages of load op).
Instead this is a small and safe change where the behavior is unchanged if we are loading multiple blobs and when loading a single blob without shape information.
If you are loading just one blob and the shape information is provided, then this returns the right shape info back.
For all other cases, behavior is unchanged as before we introduced the issue.
This fixes the issue reported by Andrey in D16229465
Reviewed By: boryiingsu
Differential Revision: D16390551
fbshipit-source-id: 1055b481a7a9e83021209e59f38a7cc0b49003cf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23077
Although the difference between running from Python and this is not much if the forward method's loop is long enough (like 1000 iterations in this case).
Reviewed By: mingzhe09088
Differential Revision: D16122343
fbshipit-source-id: 5c1d1b98ae82c996baf9d42bcd04995e2ba60c78
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23076
Tracing-based and non-tracing-based versions added
Reviewed By: mingzhe09088
Differential Revision: D16097280
fbshipit-source-id: 3a137092f7ccc3dd2d29d95e10178ec89d3ce892
Summary:
Update the ScatterWeightedSum op for the case where there is only one weighted X to update a slice of Y, which is usually the case when the op is used for gradient updates. The changes remove the copy overhead and show significant operator performance improvements:
- 25 - 50% improvement on CUDA based on input configuration
- ~50% improvement on ROCm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23087
Differential Revision: D16385194
Pulled By: bddppq
fbshipit-source-id: 3189e892940fb9c26305269eb0d47479b9b71af0
Summary:
This is a small patch to avoid overwriting unchanged files, to help a bit with building.
It is not as incremental as one might like, given that one has to pass `--out-of-place-only` to avoid running into the patching step.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23112
Differential Revision: D16402623
Pulled By: bddppq
fbshipit-source-id: 531ce0078bc716ae31bd92c5248080ef02a065b9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22765
The pooling signature is the same as the non-quantized one, so it is added to native_functions.yaml.
Reviewed By: jerryzh168
Differential Revision: D16102608
fbshipit-source-id: 7627ad8f02a231f488b74d1a245b853f89d9c419
Summary:
USE_{C11,MSC,GCC}_ATOMICS are not used in PyTorch or submodules. Now we remove their underlying detection code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23089
Differential Revision: D16402750
Pulled By: ezyang
fbshipit-source-id: fde84b958eb0b5b4d3f0406acefa92ab30ea43be
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21749
This is the first version without "requantization"
Reviewed By: jerryzh168
Differential Revision: D15807940
fbshipit-source-id: 19bb0482abed8ed9d1521a3fa1f15bda8e6a6a7c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23096
Nets can have state that depends on the rest of the state in the Workspace. Hence, they should be destructed first.
Reviewed By: ajyu
Differential Revision: D16382987
fbshipit-source-id: 3fd030ba206e2d0e897abb9e31c95bdaeb9482b7
Summary:
Add support for quantization aware training in eager mode
Modifications to Post training flow:
## Prepare
* Fusion: e.g. (Conv, Bn) → ConvBn (float)
* Swapping: To insert fake_quant for the weight, we need to swap the float modules that have weights with the corresponding qat modules, e.g. Conv → torch.nn.qat.Conv , ConvBn → torch.nn._intrinsic.qat.ConvBn
```
* previously we were thinking about modifying the weight in a forward_pre_hook and changing it back in a forward_hook:

def forward_pre_hook(self, input):
    self.float_weight = self.weight
    self.weight = self.fake_quantize(self.float_weight)

def forward_hook(self, input):
    self.weight = self.float_weight
```
* Assignments to self.weight are needed because we can't change the forward function, and the forward function uses self.weight.
* But we will need to keep two copies of weight in this case, so it’s probably better to just swap the module
* So we want to just swap Conv to torch.nn.qat.Conv and Linear to torch.nn.qat.Linear
* qat modules will have fake_quant for output and weights inserted in forward function
## Convert
* flow should be identical to ptq, but the swapping dictionary is slightly different since modules are changed in prepare step.
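As a rough usage sketch of the eager-mode flow described above (the entry points `get_default_qat_qconfig`, `prepare_qat`, and `convert` are the names the eager-mode API later shipped with, assumed here rather than taken from this diff):
```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU())
model.train()
model.qconfig = torch.quantization.get_default_qat_qconfig('fbgemm')

# Prepare: swap float modules for qat modules that fake-quantize weights/activations.
torch.quantization.prepare_qat(model, inplace=True)

# Stand-in for a training loop; forward passes populate the observers.
model(torch.randn(1, 3, 8, 8))

# Convert: swap qat modules for actual quantized modules.
model.eval()
quantized = torch.quantization.convert(model)
```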
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23082
ghstack-source-id: 86824650
Differential Revision: D16379374
fbshipit-source-id: 7d16d1acd87025065a24942ff92abf18e9fc8070
Summary:
Overall context: open-source BlackBoxPredictor as the entry
point for inference in Caffe2 (a thread-safe abstraction for Caffe2
inference). This should be used in ThroughputBenchmark for the purpose
of framework comparison.
This specific diff:
There should be no harm in moving the transformation code to
OSS. On the plus side, we will be able to compare the production
Caffe2 setup with PyTorch in the fairest way via
ThroughputBenchmark. This approach avoids any complicated
transformation registries. Building those properly would be a significant
engineering effort as well as a production risk. In the past we had SEVs
related to transforms being turned off due to various refactors. Given
that we don't plan to make any other significant investments in
transformation logic beyond the existing ones (like TVM and Glow), and
those also relate to open-source technologies, I came to the
conclusion of moving the whole thing to OSS.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22877
Test Plan:
did a bunch of unit tests locally and now
waitforsandcastle
AdFinder canary:
https://our.intern.facebook.com/intern/ads/canary/419623727275650390
adindexer:
https://our.intern.facebook.com/intern/ads/canary/419623750891549182
prospector:
https://our.intern.facebook.com/intern/ads/canary/419644899887610977
https://our.intern.facebook.com/intern/ads/canary/419645123742738405
Differential Revision: D16267765
Pulled By: salexspb
fbshipit-source-id: 776a1cd5415e0695eae28254b3f155e7a9bd8c2b
Summary:
1. Fix out of range memory access for reduction on all dimensions for non-packed
tensor.
2. Enabling launch config that maps block width to reduction on fastest striding
dimension. This mapping was previously only active when reducing on fastest
striding dimension of packed tensor, which is not necessary.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22827
Differential Revision: D16271897
Pulled By: zdevito
fbshipit-source-id: 20763f6cf9a58e44ffc0e7ec27724dfec8fe2c5d
Summary:
Fixes https://github.com/pytorch/pytorch/issues/22389
In most cases we only import `PIL` methods when we need them, but we missed a spot.
cc lanpa natalialunova sanekmelnikov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23023
Reviewed By: sanekmelnikov
Differential Revision: D16373492
Pulled By: orionr
fbshipit-source-id: b08bf8a9b5a861390eadf62eda21ac055777180f
Summary:
This PR fixes the invalid `None` return when calling get_all_math_dtype(device='cuda').
The issue came from the `append` method, which has no return value, being used in `return dtypes.append(...)`.
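A simplified illustration of the bug pattern (not the actual helper implementation):
```python
import torch

def get_dtypes_buggy(device):
    dtypes = [torch.float32, torch.float64]
    if device == 'cuda':
        return dtypes.append(torch.float16)  # list.append returns None!
    return dtypes

def get_dtypes_fixed(device):
    dtypes = [torch.float32, torch.float64]
    if device == 'cuda':
        dtypes.append(torch.float16)
    return dtypes

print(get_dtypes_buggy('cuda'))  # None
print(get_dtypes_fixed('cuda'))  # [torch.float32, torch.float64, torch.float16]
```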
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23028
Differential Revision: D16362732
Pulled By: colesbury
fbshipit-source-id: 0bbc30a0c663749d768159f1bc37b99f7263297b
Summary:
This PR aims at improving BERT performance on CPU by using `mkldnn` inner product for `nn.Linear()`.
The current logic is to use `mkldnn` only when the `input` tensor is of mkldnn layout. This PR loosens this condition: `mkldnn` will be used for `nn.Linear()` when the `input` tensor is of dense layout. The aten tensor is viewed in place in `mkldnn` without an additional memory copy, as sketched below:
1. when `input.dim() >= 3`, it is viewed as a 2D tensor, e.g. `[T, N, C]` is treated as `[TN, C]`;
2. when `input` is not contiguous, it is copied so as to be contiguous; the `mkldnn` inner product can't handle non-contiguous memory.
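A simplified sketch of this shape handling (illustrative only, not the actual mkldnn code path):
```python
import torch

x = torch.randn(128, 32, 768)              # [T, N, C] input to nn.Linear
x2d = x.contiguous().view(-1, x.size(-1))  # treated as a 2D [T*N, C] matrix for the inner product
```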
With this PR, BERT on `glue/MRPC` inference (batch size = 1) on Xeon 6148 single socket (20 cores@2.5GHz) improves by `44%`:
1. before (unit: iterations/sec):
```bash
408/408 [00:24<00:00, 16.69it/s]
```
2. after (unit: iterations/sec):
```bash
408/408 [00:16<00:00, 24.06it/s]
```
The latency reduces from `59.92 ms` to `41.56 ms` correspondingly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21851
Differential Revision: D16056334
Pulled By: dzhulgakov
fbshipit-source-id: 9b70ed58323b5e2f3f4e3ebacc766a74a8b68a8a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22732
Add support for quantization aware training in eager mode
Modifications to Post training flow:
## Prepare
* Fusion: e.g. (Conv, Bn) → ConvBn (float)
* Swapping: To insert fake_quant for the weight, we need to swap the float modules that have weights with the corresponding qat modules, e.g. Conv → torch.nn.qat.Conv , ConvBn → torch.nn._intrinsic.qat.ConvBn
```
* previously we were thinking about modifying the weight in a forward_pre_hook and changing it back in a forward_hook:

def forward_pre_hook(self, input):
    self.float_weight = self.weight
    self.weight = self.fake_quantize(self.float_weight)

def forward_hook(self, input):
    self.weight = self.float_weight
```
* Assignments to self.weight are needed because we can't change the forward function, and the forward function uses self.weight.
* But we will need to keep two copies of weight in this case, so it’s probably better to just swap the module
* So we want to just swap Conv to torch.nn.qat.Conv and Linear to torch.nn.qat.Linear
* qat modules will have fake_quant for output and weights inserted in forward function
## Convert
* flow should be identical to ptq, but the swapping dictionary is slightly different since modules are changed in prepare step.
Reviewed By: zafartahirov
Differential Revision: D16199356
fbshipit-source-id: 62aeaf47c12c62a87d9cac208f25f7592e245d6c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22714
We need this module to add fake_quant for the weight.
Reviewed By: zafartahirov
Differential Revision: D16193585
fbshipit-source-id: ed6c04ecf574ca1fe1dcded22c225da05976f7a3
Summary:
When working on https://github.com/pytorch/pytorch/pull/22762, we discovered that we hadn't actually deprecated legacy autograd functions. This PR puts up the deprecation warning for 1.2, with the goal of removing legacy function support completely in the near future.
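For context, a minimal sketch of the new-style (static method) `torch.autograd.Function` pattern that replaces the legacy one:
```python
import torch

class Square(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    def backward(ctx, grad_out):
        x, = ctx.saved_tensors
        return 2 * x * grad_out

x = torch.ones(3, requires_grad=True)
Square.apply(x).sum().backward()
print(x.grad)  # tensor([2., 2., 2.])
```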
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22922
Differential Revision: D16363916
Pulled By: yf225
fbshipit-source-id: 4b554010a3d1f87a3fa45cc1aa29d019c8f1033c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22950
Print a quantized tensor by first dequantizing it and then printing. Also print the scale, zero_point, size and type of the tensor.
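For example (using the current per-tensor quantization op name, which is an assumption here, not taken from this diff):
```python
import torch

x = torch.quantize_per_tensor(torch.rand(2, 2), scale=0.1, zero_point=0, dtype=torch.quint8)
print(x)  # shows the dequantized values plus scale, zero_point, size and dtype
```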
Reviewed By: jerryzh168
Differential Revision: D16286397
fbshipit-source-id: 2d6fb1796e5b329a77c022b18af0a39f6edde0d7
Summary:
We are planning to put up a deprecation warning for legacy autograd function in 1.2: https://github.com/pytorch/pytorch/pull/22922. This PR removes all usage of legacy function in PyTorch core and test suite, to prepare for the eventual removal of legacy function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22925
Differential Revision: D16344834
Pulled By: yf225
fbshipit-source-id: 8bf4cca740398835a08b7a290f3058c3e46781ba
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22316
Adding the quantized ReLU to native_functions.yaml, as it has the same signature as the non-quantized relu
Reviewed By: jerryzh168
Differential Revision: D16038441
fbshipit-source-id: 1cfbb594eb9bca1b7ec49ca486defcf1908b0d26
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22966
We want to implement "trimmed lasso" for feature selection with learnable and regularizable weights. Trimmed lasso is a simple yet powerful improved version from traditional lasso. More reference can be found at https://arxiv.org/abs/1708.04527 and http://proceedings.mlr.press/v97/yun19a.html. For quick and necessary intro, please refer to P1-3 of the paper at https://arxiv.org/abs/1708.04527.
Given n weights, traditional lasso sums up all weights' l1 norms. The trimmed lasso takes an input integer k (how many weights you want to select from n) and only sums over the smallest n - k weights. Given lambda as the regularization constant, the penalty term is only on the smallest n - k weights, but not other larger weights. If lambda becomes larger than certain threshold, the smallest n - k weights are shrunk to zero. That means we have those weights "dropped". With this property, the number k is the number of weights left after lasso, which we can easily control.
Meanwhile, we further support all available regularization in a single interface. Current supported regularizers on weights include no reg, l1, l2, elastic, trimmed l1, elastic with trimmed l1, group l1, and logbarrier.
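A minimal sketch of the trimmed-l1 idea in plain PyTorch (illustrative only; not the Caffe2 regularizer implementation):
```python
import torch

def trimmed_l1(w, k, lam):
    # Penalize only the n - k smallest-magnitude weights; the k largest are free.
    abs_w = w.abs().flatten()
    smallest, _ = torch.topk(abs_w, abs_w.numel() - k, largest=False)
    return lam * smallest.sum()

w = torch.randn(10, requires_grad=True)
penalty = trimmed_l1(w, k=3, lam=0.01)
penalty.backward()  # gradients flow only through the n - k smallest weights
```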
Differential Revision: D16326492
fbshipit-source-id: 6e1fd75606005d9bc09d6650435c96a7984ba69c
Summary:
Given that python 2.7 will be EOL'd on Jan 1, 2020 and we have models depending on python3.5+, we'd like to update the ROCm CI across the board to python3.6.
This PR adds the skip tests and some semantic changes for PyTorch.
Open tasks/questions:
* RoiAlignTest.CheckCPUGPUEqual fails in the Caffe2 unit tests. Is this something expected / can it be skipped?
* for testing, I've used update-alternatives on CentOS/Ubuntu to select python == python 3.6. Is this the preferred way?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22322
Differential Revision: D16199862
Pulled By: ezyang
fbshipit-source-id: 46ca6029a232f7d23f3fdb5efc33ae39a379fca8
Summary:
Fixes https://github.com/pytorch/pytorch/issues/21935 by using the integer floor division that was introduced for convolution shapes in https://github.com/pytorch/pytorch/issues/9640. Without this fix, the pooling operators can produce a 1-element output in cases they shouldn't.
Disclaimer: I couldn't properly test it locally (it's not picking up the modified version for some reason). I'm marking this WIP until I checked what the CI tools say...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22304
Differential Revision: D16181955
Pulled By: ezyang
fbshipit-source-id: a2405372753572548b40616d1206848b527c8121
Summary:
This cleans up the `torch.utils.tensorboard` API to remove all kwargs usage (which isn't clear to the user) and removes the "experimental" warning in prep for our 1.2 release.
We also don't need the additional PyTorch version checks now that we are in the codebase itself.
cc ezyang lanpa natalialunova
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21786
Reviewed By: natalialunova
Differential Revision: D15854892
Pulled By: orionr
fbshipit-source-id: 06b8498826946e578824d4b15c910edb3c2c20c6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22958
When we use `extension_loader.DlopenGuard()` to dyndep or import modules, it sets the `RTLD_GLOBAL` flag and restores the original flags after the `yield`. However, if the module is not there, the yield will fail, and the flags won't be restored, creating all kinds of symbol conflict problems.
Reviewed By: bddppq
Differential Revision: D16311949
fbshipit-source-id: 7b9ec6d60423ec5e78cae694b66c2f17493840b0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22830
Separating the tensor generation and the generation of the quantization parameters
- Introducing hypothesis filter `assume_not_overflowing`, which makes sure that the generated tensor and qparams play well with each other. **Note: This is an expensive filter!**
- `qtensor` -> Renamed to `tensor`
- `qtensor_conv` -> Renamed to `tensor_conv2d`
- The tensors don't return the quantization parameters anymore, use `qparams` for it
- The `dtypes` argument is just a quantized dtype now.
- The enforcement for zero_point is predefined as before. As before, if set to `None` the zero_point will be sampled. However, if `None`, you can override sampling with `zero_point_min` and `zero_point_max`
- Scale sampling can also be overriden using `scale_min` and `scale_max`
Reviewed By: jerryzh168
Differential Revision: D16234314
fbshipit-source-id: 5b538a5aa9772b7add4f2ce5eff6fd0decd48f8e
Summary:
ONNX uses virtualenv and PyTorch doesn't, so the --user flag is causing problems in the ONNX CI.
Fix it by moving the flag to PyTorch-only scripts; ninja will be installed separately in the ONNX CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22946
Reviewed By: bddppq
Differential Revision: D16297781
Pulled By: houseroad
fbshipit-source-id: 52991abac61beaf3cfbcc99af5bb1cd27b790485
Summary:
…te argument in macro
Changelog:
- Update note about tensors on CPU for the following MAGMA functions
- magma_(d/s)getrf_gpu and magma_getrf_nopiv_gpu require tensors on CPU for pivots
- magma_(d/s)geqrf2_gpu requires tensors on CPU for elementary reflectors
- magma_(d/s)syevd_gpu requires tensors on CPU for eigenvalues
- Remove dummy tensor in ALLOCATE_ARRAY MACRO
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22618
Test Plan:
- All existing tests should pass to verify that the patch is correct
This PR has been proposed to eliminate confusion due to the previous comments, as indicated in https://github.com/pytorch/pytorch/issues/22573
Differential Revision: D16286198
Pulled By: zou3519
fbshipit-source-id: a5a6ec829084bdb752ca6006b8795227cbaf63b1
Summary:
This fixes up the test suite (mostly just adding `ignore` decorations
to tests that need to call Python function) so that it all passes with
recursive script enabled.
The main user-facing result of this change is that Python functions are
compiled without any decorators, so non-TorchScriptable code must be
decorated with `torch.jit.ignore` (or
`torch.jit.ignore(drop_on_export=True)` to maintain the functionality of
the current `ignore`)
Details can be found in #20939
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22887
Pulled By: driazati
Differential Revision: D16277608
fbshipit-source-id: 0abd0dc4291cf40651a1719bff813abb2b559640
Summary:
Motivation:
The forward method of MultiheadAttention has a kwarg key_padding_mask. This mask is of shape (N,S) where N is batch and S is sequence length. This mask is applied prior to the attention softmax, where True values in the mask are set to float('-inf'). This allows you to mask position j from attention for all positions i in the input sequence. It's typically used to mask padded inputs. So for a sample in a batch we will be able to make sure no encoder outputs depend on padding inputs. Currently the Transformer, TransformerEncoder, and TransformerEncoderLayer do not have this kwarg, and only have options for (S,S), (T,T), and (S,T) masks which are applied equally across the batch for source input, target output, and target-source memory respectively. These masks can't be used for padding and are instead used for things like subsequent masking in language modeling, by masking the attention of position i to position j.
This diff exposes the key_padding_mask to Transformer, TransformerEncoder, and TransformerEncoderLayer forward methods which is ultimately passed to MultiheadAttention forward.
Open question: should we also allow a key_padding_mask for the decoder layer? As padding is usually at the end of each sentence in a batch and sentences are usually decoded from left to right, people usually deal with padding on decoded outputs by just masking those outputs at the loss layer. There might be some scenarios where it's needed, though I don't think it would be common. People can also still just subclass and override the layers. We could also pass the input key_padding_mask to the memory <> decoder attention layer. Not sure if that's necessary though, because the output of position i from each attention encoder layer won't depend on any masked positions in the input (even if position i is a masked position itself), so there's not really any point in masking position i again.
Adds the key_padding_mask kwarg to Transformer, TransformerEncoder, and TransformerEncoderLayer forward methods.
The standard TransformerEncoderLayer uses a MultiheadAttention layer as self_attn. MultiheadAttention forward method has a key_padding_mask kwarg that allows for masking of values such as padding per sequence in a batch, in contrast to the attn_mask kwarg which is usually of shape (S,S) and applied equally across the batch.
MultiheadAttention calls functional.multi_head_attention_forward, which has the same key_padding_mask kwarg of shape (N,S). Masked (True) values are set to float('-inf').
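A usage sketch of the new kwarg (the encoder-side name `src_key_padding_mask` is assumed here, matching the API as it later shipped):
```python
import torch
import torch.nn as nn

layer = nn.TransformerEncoderLayer(d_model=16, nhead=4)
encoder = nn.TransformerEncoder(layer, num_layers=2)

src = torch.randn(5, 2, 16)  # (S, N, E)
# (N, S): True marks padded positions that should not be attended to.
key_padding_mask = torch.tensor([[False, False, False, True, True],
                                 [False, False, True, True, True]])
out = encoder(src, src_key_padding_mask=key_padding_mask)  # (S, N, E)
```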
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22588
Test Plan:
buck test mode/dev caffe2/test:nn -- 'test_transformerencoderlayer \(test_nn\.TestNN\)'
buck test mode/dev caffe2/test:nn -- 'test_Transformer_cell \(test_nn\.TestNN\)'
buck test mode/dev caffe2/test:nn -- 'test_transformer_args_check \(test_nn\.TestNN\)'
Differential Revision: D16112263
Pulled By: lucasgadams
fbshipit-source-id: dc4147dd1f89b55a4c94e8c701f16f0ffdc1d1a2
Summary:
Asterisks start emphasis in rst. We should either escape them or put them as interpreted text.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22896
Differential Revision: D16282869
Pulled By: zou3519
fbshipit-source-id: 15ec4286434db55fb8357b1a12e6f70ef54f8c66
Summary:
The sccache wrapping strategy causes problems for at-runtime kernel
compilation of MIOpen kernels. We therefore - after the builds of
caffe2/pytorch are complete - unwrap sccache again by moving the clang-9
actual binary back into its original place.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22743
Differential Revision: D16283329
Pulled By: bddppq
fbshipit-source-id: 4fcdc92be295d5ea9aba75c30e39af1a18a80c13
Summary:
This is achieved by using `cuDevicePrimaryCtxGetState` as a way to check whether a primary context exists on a device. It is not too slow, from this benchmark of a single call to it on CUDA 10.1, Titan Xp, driver 415.27:
```
---------------------------------------------------------------------
Benchmark Time CPU Iterations
---------------------------------------------------------------------
BM_cuDevicePrimaryCtxGetState 301 ns 301 ns 2319746
```
Commits:
1. Add `CUDAHooks::getDeviceWithPrimaryContext` which returns a device index with primary context (if exists).
Link `c10/cuda` against `libcuda` for device API calls.
2. Use `getDeviceWithPrimaryContext` to check primary context in `pin_memory`.
Fix `OptionalDeviceGuard` doc.
3. Refactor `test_cuda_primary_ctx.py` to support multiple tests.
Add test for this in that file.
Fixes https://github.com/pytorch/pytorch/issues/21081.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22229
Differential Revision: D16170194
Pulled By: zou3519
fbshipit-source-id: 485a45f211b7844c9e69c63f3b3b75194a796c5d
Summary:
…te argument in macro
Changelog:
- Update note about tensors on CPU for the following MAGMA functions
- magma_(d/s)getrf_gpu and magma_getrf_nopiv_gpu require tensors on CPU for pivots
- magma_(d/s)geqrf2_gpu requires tensors on CPU for elementary reflectors
- magma_(d/s)syevd_gpu requires tensors on CPU for eigenvalues
- Remove dummy tensor in ALLOCATE_ARRAY MACRO
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22618
Test Plan:
- All existing tests should pass to verify that the patch is correct
This PR has been proposed to eliminate confusion due to the previous comments, as indicated in https://github.com/pytorch/pytorch/issues/22573
Differential Revision: D16227440
Pulled By: zou3519
fbshipit-source-id: 97d5537c5da98c0ed3edc4668a09294794fc426b
Summary:
…rides
Changelog:
- Fix behavior of `torch.triu` / `torch.tril` on certain unsqueezed tensors that lead to uninitialized values on CPU
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22730
Test Plan:
- Add tests for these cases in test_triu_tril in test_torch
Fixes https://github.com/pytorch/pytorch/issues/22581
Differential Revision: D16222897
Pulled By: zou3519
fbshipit-source-id: b86b060187797e5cd2a7731421dff1ba2b5c9596
Summary:
Align the behavior of `torch.utils.cpp_extension.CUDA_HOME` with that of `tools.setup_helpers.cuda.CUDA_HOME`.
Specifically, I swapped the positions of guess 2 and guess 3 in `torch.utils.cpp_extension.CUDA_HOME`.
Fixing issue https://github.com/pytorch/pytorch/issues/22844
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22845
Differential Revision: D16276241
Pulled By: zou3519
fbshipit-source-id: 3b62b439b2f794a6f3637a5fee58991f430985fe
Summary:
We introduced RTTI in a recent change: https://github.com/pytorch/pytorch/pull/21613
For the internal mobile build we don't enable '-frtti' yet. This diff replaces
RTTI with an alternative approach.
According to dzhulgakov, we can compare two tensors' type_id directly in most cases -
which is stricter than comparing the TensorImpl subclass type, as the TensorImpl -> type_id
mapping is 1-to-n, but it's more appropriate for this use case.
The only two cases where we can relax direct type comparison (for legacy reason) are:
1. CPUTensor <-> CUDATensor;
2. SparseCPUTensor <-> SparseCUDATensor;
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22773
Differential Revision: D16277696
Pulled By: ljk53
fbshipit-source-id: 043e264fbacc37b7a11af2046983c70ddb62a599
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22892
Think of num_runs as manually running the binary <num_runs> times. Each run runs the operator for many iterations.
Reviewed By: hl475
Differential Revision: D16271597
fbshipit-source-id: b6f509ee0332c70f85bec0d447b84940c5c0cecd
Summary:
Since recursive script creates a ScriptModule from an `nn.Module`,
there's no ties to the original module to pull a type name from, so we
have to explicitly pass it in.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22873
Pulled By: driazati
Differential Revision: D16268547
fbshipit-source-id: 902a30e6e36427c6ba7033ded027a29d9dcbc1ee
Summary:
Changelog:
- Port SVD TH implementation to ATen/native/BatchLinearAlgebra.cpp
- Port SVD THC implementation to ATen/native/cuda/BatchLinearAlgebra.cu
- Allow batches of matrices as arguments to `torch.svd` (see the example after this list)
- Remove existing implementations in TH and THC
- Update doc string
- Update derivatives to support batching
- Modify nuclear norm implementation to use at::svd instead of _batch_svd
- Remove _batch_svd as it is redundant
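A quick illustration of the batched support (shapes assume the default `some=True`):
```python
import torch

A = torch.randn(4, 5, 3)          # a batch of four 5 x 3 matrices
U, S, V = torch.svd(A)
print(U.shape, S.shape, V.shape)  # torch.Size([4, 5, 3]) torch.Size([4, 3]) torch.Size([4, 3, 3])
```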
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21588
Test Plan:
- Add new test suite for SVD in test_torch.py with port to test_cuda.py
- Add tests in common_methods_invocations.py for derivative testing
Differential Revision: D16266115
Pulled By: nairbv
fbshipit-source-id: e89bb0dbd8f2d58bd758b7830d2389c477aa61fb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22517
Force anybody creating an untyped Dict to call c10::impl::deprecatedUntypedDict().
This should hopefully make it clear that this is not public API and prevent people from using it.
Reviewed By: dzhulgakov
Differential Revision: D16115214
fbshipit-source-id: 2c8d0e4e375339c699d583995f79c05c59693c3e
Summary:
Introduce Azure Pipelines for the linting checks. This is meant to be equivalent to the existing Travis linting phase.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22839
Differential Revision: D16260376
Pulled By: ezyang
fbshipit-source-id: 1e535c3096358be67a0dad4cd920a92082b2d18e
Summary:
As part of the Variable/Tensor merge, `variable.tensor_data()` should be removed in favor of `variable.detach()`. This PR removes `tensor_data()` call sites in Python `Variable()` and `nn.Parameter()` constructor paths.
Note that this PR is BC-breaking in the following way:
- For Python `Variable()` constructor:
Previously, in-place updating a tensor after it's been used to create a Variable does not bump the Variable's version counter, which causes the following problem:
```python
t = torch.ones(2, 3)
v = torch.autograd.Variable(t).requires_grad_()
y = v * v
t.add_(1) # This bumps version counter of `t`
y.sum().backward() # This computes `v`'s gradient incorrectly before this patch, and throws error after this patch
```
After this patch, in-place updating a tensor after it's been used to create a Variable will also bump the Variable's version counter, thus preserving the correctness of the Variable's version counter.
- For Python `nn.Parameter()` constructor:
Previously, in-place updating a tensor after it's been used to create an nn.Parameter does not bump the nn.Parameter's version counter, which causes the following problem:
```python
t = torch.ones(2, 3)
v = torch.nn.Parameter(t)
y = v * v
t.add_(1) # This bumps version counter of `t`
y.sum().backward() # This computes `v`'s gradient incorrectly before this patch, and throws error after this patch
```
After this patch, in-place updating a tensor after it's been used to create an nn.Parameter will also bump the nn.Parameter's version counter, thus preserving the correctness of the nn.Parameter's version counter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22821
Differential Revision: D16258030
Pulled By: yf225
fbshipit-source-id: 9a6d68cea1864893193dbefbb6ef0c1d5ca12d78
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22829
Sending out the caffe2 load op changes separately since we want to pick them into open source.
This change is needed because the shape information of the blobs is determined from the load operator and that shape information is needed in our download_group.
Reviewed By: boryiingsu
Differential Revision: D16229465
fbshipit-source-id: f78b2df9a7f26968d70eca68dde75cd11ab6f7a2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22323
This diff adds an interface to use quantized Linear op in JIT.
Reviewed By: jamesr66a
Differential Revision: D16040724
fbshipit-source-id: 90e90aff9973c96ea076ed6a21ae02c349ee2bcf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22023
This diff implements the Linear operation with fp16 weights based on FBGEMM. At a high level, we want to perform the following operation:
Y = X * W + B with dtypes:
(fp32, fp32, fp16, fp32)
To do that, three steps are needed:
1. Quantize weights from fp32 to fp16, this is done using `PackedGemmMatrixFP16` in the `fbgemm_pack_gemm_matrix_fp16`
2. Conduct matrix multiplication with quantized weights using `cblas_gemm_compute` in `fbgemm_linear_fp16_weight`
3. Add bias to the result from step2 and return the final Y
Reviewed By: jianyuh
Differential Revision: D15921768
fbshipit-source-id: dc4e5b366f846ce9d58975876940a9b3372b8b8d
Summary:
Add support for breaks and continues in the jit. We do with a Graph transform pre-SSA.
A graph of the form
```
def test():
    while i < 5:
        if i == 3:
            break
        i += 1
        print(i)
```
has the body of the loop transformed to
```
if i == 3:
    did_break = True
else:
    did_break = False
if did_break:
    loop_exit = True
else:
    i += 1
    print(i)
    loop_exit = i < 5
```
I am going to add more tests but I think it is ready for review now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21692
Differential Revision: D16215807
Pulled By: eellison
fbshipit-source-id: 365102f42de4861d9323caaeb39a96de7619a667
Summary:
This is an extension to the original PR https://github.com/pytorch/pytorch/pull/21765
1. Increase the coverage of different opsets support, comments, and blacklisting.
2. Adding backend tests for both caffe2 and onnxruntime on opset 7 and opset 8.
3. Reusing onnx model tests in caffe2 for onnxruntime.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22421
Reviewed By: zrphercule
Differential Revision: D16225518
Pulled By: houseroad
fbshipit-source-id: 01ae3eed85111a83a0124e9e95512b80109d6aee
Summary:
Using PMCTest (https://www.agner.org/optimize/) to measure
TensorIterator construction, this results in ~600 fewer instructions
retired (~300 fewer cycles) for constructing TensorIterator on a 1D
tensor. (Should be roughly ~100 ns, but it's hard to measure that
precisely end-to-end).
```
Before:
Clock Core cyc Instruct Uops L1D Miss
5082 2768 5690 7644 3
After:
Clock Core cyc Instruct Uops L1D Miss
4518 2437 5109 6992 0
```
Note that Instruct is reliable, Core cyc is a little noisy, and Clock
is a little more noisy.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22756
Differential Revision: D16207777
Pulled By: VitalyFedyunin
fbshipit-source-id: bcc453a90472d9951a1c123bcb1b7a243fde70ac
Summary:
Speeds up the common case where Tensor is a torch.Tensor (not a
subclass). This reduces the number of executed instructions for a
torch.add(tensor1, tensor2) by ~328 (should be ~65 ns faster).
Note that most of the PythonArgs accessors are too large to be inlined.
We should move most of them to the cpp file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22782
Differential Revision: D16223592
Pulled By: colesbury
fbshipit-source-id: cc20f8989944389d5a5e3fab033cdd70d581ffb1
Summary:
This PR aims at improving `topk()` performance on CPU. This is useful when computing **beam search** during `Transformer` and `BERT`.
Given a tensor x of size `[N, C]` to which we want to apply `x.topk(K)`, the current logic is to **sequentially** loop over the dimension of `N` and do a **quick select** on the dimension of `C` so as to find the top K elements.
Performance can be further improved from:
- On the dimension of `N`, it can be paralleled
- Maybe a faster sorting algorithm for `topk`. (After a bunch of experimenting, `std::partial_sort` seems to be the most promising)
So i compared 3 versions:
1. vanilla: sequential + quick select
2. reference PR https://github.com/pytorch/pytorch/issues/19737: parallel + quick select
3. this PR: parallel + partial sort
with the following benchmark, on `Xeon 8180, 2*28 cores@2.5 GHz`:
```python
import torch
from time import time

num_iters = 1000

def bench_topk(N=8, C=168560, k=10):
    a = torch.randn(N, C)
    # warm up
    for i in range(100):
        torch.topk(a, k)
    t = 0
    for i in range(num_iters):
        a = torch.randn(N, C)
        start = time()
        value, indice = torch.topk(a, k)
        t += time() - start
    print("#[%d, %d] times: %f ms" % (N, C, t / num_iters * 1000))

Ns = [10, 20, 30]
Cs = [10000, 20000, 40000, 80000, 160000, 320000]
for n in Ns:
    for c in Cs:
        bench_topk(N=n, C=c)
```
### vanilla: sequential + quick select
```
#[10, 10000] times: 0.746740 ms
#[10, 20000] times: 1.437399 ms
#[10, 40000] times: 2.832455 ms
#[10, 80000] times: 5.649426 ms
#[10, 160000] times: 11.309466 ms
#[10, 320000] times: 22.798765 ms
#[20, 10000] times: 1.511303 ms
#[20, 20000] times: 2.822024 ms
#[20, 40000] times: 5.564770 ms
#[20, 80000] times: 11.443044 ms
#[20, 160000] times: 22.747731 ms
#[20, 320000] times: 46.234449 ms
#[30, 10000] times: 2.214045 ms
#[30, 20000] times: 4.236179 ms
#[30, 40000] times: 8.418577 ms
#[30, 80000] times: 17.067578 ms
#[30, 160000] times: 33.826214 ms
#[30, 320000] times: 68.109420 ms
```
### reference PR: parallel + quick select
```
#[10, 10000] times: 0.271649 ms
#[10, 20000] times: 0.593016 ms
#[10, 40000] times: 1.133518 ms
#[10, 80000] times: 2.082355 ms
#[10, 160000] times: 4.049928 ms
#[10, 320000] times: 7.321285 ms
#[20, 10000] times: 0.315255 ms
#[20, 20000] times: 0.539054 ms
#[20, 40000] times: 1.000675 ms
#[20, 80000] times: 1.914586 ms
#[20, 160000] times: 4.437122 ms
#[20, 320000] times: 8.822445 ms
#[30, 10000] times: 0.347209 ms
#[30, 20000] times: 0.589947 ms
#[30, 40000] times: 1.102814 ms
#[30, 80000] times: 2.112201 ms
#[30, 160000] times: 5.186837 ms
#[30, 320000] times: 10.523023 ms
```
### this PR: parallel + partial sort
```
#[10, 10000] times: 0.150284 ms
#[10, 20000] times: 0.220089 ms
#[10, 40000] times: 0.521875 ms
#[10, 80000] times: 0.965593 ms
#[10, 160000] times: 2.312356 ms
#[10, 320000] times: 4.759422 ms
#[20, 10000] times: 0.167630 ms
#[20, 20000] times: 0.265607 ms
#[20, 40000] times: 0.471477 ms
#[20, 80000] times: 0.974572 ms
#[20, 160000] times: 3.269645 ms
#[20, 320000] times: 6.538608 ms
#[30, 10000] times: 0.204976 ms
#[30, 20000] times: 0.342833 ms
#[30, 40000] times: 0.589381 ms
#[30, 80000] times: 1.398579 ms
#[30, 160000] times: 3.904077 ms
#[30, 320000] times: 9.681224 ms
```
In summary, `2` is **5x** faster than `vanilla` on average and `3` is **8.6x** faster than `vanilla`.
On `Fairseq Transformer`, the default parameter on dataset `wmt14` would have a `topk` size of `[8, 168560]`, and this operator gets `3x` faster with this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19736
Differential Revision: D16204820
Pulled By: VitalyFedyunin
fbshipit-source-id: ea70562c9149a0d832cf5872a891042ebd74fc63
Summary:
For three 1-D operands, compute_strides now takes 298 instructions instead
of 480. (Saves ~36 ns). We'll want to make Tensor::sizes(), strides(), and
element_size() trivially inlinable to speed this up more.
(Using PMCTest from https://www.agner.org/optimize/ to measure instructions retired)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22779
Differential Revision: D16223595
Pulled By: colesbury
fbshipit-source-id: e4730755f29a0aea9cbc82c2d376a8e6a0c7bce8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22781
The custom op is required to make the op benchmark work with JIT. Run `python setup.py install` in the pt_extension directory to install it; it is required.
Reviewed By: hl475
Differential Revision: D16214430
fbshipit-source-id: c9221c532011f9cf0d5453ac8535a6cde65e8376
Summary:
Currently ONNX constant folding (`do_constant_folding=True` arg in `torch.onnx.export` API) supports only opset 9 of ONNX. For opset 10, it is a no-op. This change enables ONNX constant folding for opset 10. Specifically there are three main changes:
1) Turn on constant folding ONNX pass for opset 10.
2) Update support for opset 10 version of `onnx::Slice` op for backend computation during constant folding.
3) Enable constant folding tests in `test/onnx/test_utility_funs.py` for multiple opsets (9 and 10).
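A usage sketch of the flag described above (the model and file name are placeholders):
```python
import torch

model = torch.nn.Linear(4, 2)
dummy = torch.randn(1, 4)
torch.onnx.export(model, dummy, "model.onnx",
                  opset_version=10,
                  do_constant_folding=True)  # now effective for opset 10 as well
```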
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22515
Reviewed By: zrphercule
Differential Revision: D16189336
Pulled By: houseroad
fbshipit-source-id: 3e2e748a06e4228b69a18c5458ca71491bd13875
Summary:
1. update on restricting block.z <= 64, compliant to CUDA maximum z-dimension of
a block;
2. clang-format
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22602
Differential Revision: D16203857
Pulled By: ezyang
fbshipit-source-id: 567719ae175681a48eb0f818ca0aba409dca2550
Summary:
Some other environment variables can be added to speed things up for development.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22736
Differential Revision: D16200904
Pulled By: soumith
fbshipit-source-id: 797ef91a863a244a6c96e0adf64d9f9b4c9a9582
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22706
Moved the models used for quantization test from the test_quantization.py file to common_quantization.py
Reviewed By: jerryzh168
Differential Revision: D16189865
fbshipit-source-id: 409b43454b6b3fe278ac16b1affb9085d6ed6835
Summary:
Previously in tracing when we called a script function we would inline the graph and set the graph inputs equal to the types the graph was invoked with.
This breaks for optional arguments invoked with None since we rely on None being set to Optional[T] in schema matching.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22686
Differential Revision: D16186372
Pulled By: eellison
fbshipit-source-id: e25c807c63527bf442eb8b31122d50689c7822f5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22694
Move quantization and quantized utility functions for testing to common_quantized.py and common_quantization.py. Additionally, add a quantized test case base class which contains common methods for checking the results of quantization on modules. As a consequence of the move, fixed the imports at the top of test_quantized.py and test_quantization to use the new utilities.
Reviewed By: jerryzh168
Differential Revision: D16172012
fbshipit-source-id: 329166af5555fc829f26bf1383d682c25c01a7d9
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22631
Test Plan:
test suite
Imported from OSS
Differential Revision: D16185040
fbshipit-source-id: 9b83749f6c9cd05d13f54a3bb4801e263293252b
Summary:
After converting BN layers to SyncBN layers, the function will set all `requires_grad = True` regardless of the original requires_grad states. I think it is a bug and have fixed it in this PR.
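A minimal sketch of the behavior this fixes (assuming the `torch.nn.SyncBatchNorm.convert_sync_batchnorm` helper):
```python
import torch

model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.BatchNorm2d(8))
model[1].weight.requires_grad_(False)  # freeze the BN affine weight
sync_model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
print(sync_model[1].weight.requires_grad)  # with this fix: False (previously reset to True)
```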
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22569
Differential Revision: D16151647
Pulled By: zou3519
fbshipit-source-id: e2ad1886c94d8882485e7fb8be51ad76469ecc67
Summary:
Addressing potential dependency issue by adding forward declaration for OutputArchive/InputArchive.
This change follows the same pattern in base.h in 'torch/csrc/api/include/torch/data/samplers/base.h'
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22562
Differential Revision: D16161524
Pulled By: soumith
fbshipit-source-id: d03f8a2ece5629762f9fa8a27b15b0d037e8f07b
Summary:
Also revert the change of cmake.py in
c97829d7011bd59d662f6af9c3a0ec302e7e75fc . The comments are added to
prevent similar incidents in the future (which have occurred a couple of times in the past).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22641
Differential Revision: D16171763
Pulled By: ezyang
fbshipit-source-id: 5a65f9fbb3c1c798ebd25521932bfde0ad3d16fc
Summary:
No need to `clone` if the expanded size matches the original size.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22634
Differential Revision: D16171091
Pulled By: ezyang
fbshipit-source-id: 3d8f116398f02952488e321c0ee0ff2868768a0c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21209
This diff introduces a new interface to add a list of operators. Here are the steps to add ops using this interface:
- create op_list:
```
unary_ops_list = op_bench.op_list(
    attr_names=["op_name", "op_function"],
    attrs=[
        ["abs", torch.abs],
        ["abs_", torch.abs_],
    ],
)
```
- create a bench class:
```
class UnaryOpBenchmark(op_bench.TorchBenchmarkBase):
    def init(self, M, N, op_function):
        self.input_one = torch.rand(M, N)
        self.op_func = op_function

    def forward(self):
        return self.op_func(self.input_one)
```
- register those ops:
```
op_bench.generate_pt_tests_from_list(unary_ops_list, unary_ops_configs, UnaryOpBenchmark)
```
Reviewed By: zheng-xq
Differential Revision: D15514188
fbshipit-source-id: f09b359cab8175eeb8d51b3ad7bbbcfbc9f6430f
Summary:
The error for `test_error_stack_module`:
```
Traceback (most recent call last):
File "../test.py", line 35, in <module>
scripted = torch.jit.script(M())
File "/home/davidriazati/other/pytorch/torch/jit/__init__.py", line 1119, in script
return _convert_to_script_module(obj)
File "/home/davidriazati/other/pytorch/torch/jit/__init__.py", line 1825, in _convert_to_script_module
raise e
RuntimeError:
d(int x) -> int:
Expected a value of type 'int' for argument 'x' but instead found type 'str'.
:
at ../test.py:11:12
def c(x):
return d("hello") + d(x)
~ <--- HERE
'c' is being compiled since it was called from 'b'
at ../test.py:14:12
def b(x):
return c(x)
~~~ <--- HERE
'b' is being compiled since it was called from 'forward'
at ../test.py:22:16
def forward(self, x):
return b(x)
~~~ <--- HERE
'forward' is being compiled since it was called from 'forward'
at ../test.py:31:20
def forward(self, x):
return x + self.submodule(x)
~~~~~~~~~~~~~~~~ <--- HERE
```
This also unifies our error reporting in the front end with `ErrorReport`
TODO
* Include module names in message, #22207 should make this easy
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22280
Pulled By: driazati
Differential Revision: D16060781
fbshipit-source-id: c42968b53aaddb774ac69d5abbf7e60c23df8eed
Summary:
Some of my qpth users have told me that updating to the latest version of PyTorch and replacing the btrifact/btrisolve calls with the LU ones wasn't working and I didn't believe them until I tried it myself :)
These updates have broken unpivoted LU factorizations/solves on CUDA. The LU factorization code used to return the identity permutation when pivoting wasn't used but now returns all zeros as the pivots. This PR reverts it back to return the identity permutation. I've not yet tested this code as I'm having some trouble compiling PyTorch with this and am hitting https://github.com/pytorch/pytorch/issues/21700 and am not sure how to disable that option.
Here's a MWE to reproduce the broken behavior, and my fix.
```python
import torch

torch.manual_seed(0)
n = 4
L = torch.randn(n,n)
A = L.mm(L.t()).unsqueeze(0)
b = torch.randn(1, n)
A_lu_cpu = torch.lu(A)
A_lu_cuda_nopivot = torch.lu(A.cuda(), pivot=False)
A_lu_cuda_pivot = torch.lu(A.cuda(), pivot=True)
print('A_lu_cuda_nopivot\n', A_lu_cuda_nopivot)
print('-----\nA_lu_cuda_pivot\n', A_lu_cuda_nopivot)
x_cpu = b.lu_solve(*A_lu_cpu)
x_cuda_nopivot = b.cuda().lu_solve(*A_lu_cuda_nopivot)
x_cuda_nopivot_fixed = b.cuda().lu_solve(
A_lu_cuda_nopivot[0], torch.arange(1, n+1, device='cuda:0').int())
x_cuda_pivot = b.cuda().lu_solve(*A_lu_cuda_pivot)
print(x_cpu, x_cuda_nopivot, x_cuda_nopivot_fixed, x_cuda_pivot)
```
Output:
```
A_lu_cuda_nopivot
(tensor([[[ 2.8465, -0.7560, 0.8716, -1.7337],
[-0.2656, 5.5724, -1.1316, 0.6678],
[ 0.3062, -0.2031, 1.4206, -0.5438],
[-0.6091, 0.1198, -0.3828, 1.5103]]], device='cuda:0'), tensor([[0, 0, 0, 0]], device='cuda:0', dtype=torch.int32))
-----
A_lu_cuda_pivot
(tensor([[[ 2.8465, -0.7560, 0.8716, -1.7337],
[-0.2656, 5.5724, -1.1316, 0.6678],
[ 0.3062, -0.2031, 1.4206, -0.5438],
[-0.6091, 0.1198, -0.3828, 1.5103]]], device='cuda:0'), tensor([[0, 0, 0, 0]], device='cuda:0', dtype=torch.int32))
(tensor([[-0.3121, -0.1673, -0.4450, -0.2483]]),
tensor([[-0.1661, -0.1875, -0.5694, -0.4772]], device='cuda:0'),
tensor([[-0.3121, -0.1673, -0.4450, -0.2483]], device='cuda:0'),
tensor([[-0.3121, -0.1673, -0.4450, -0.2483]], device='cuda:0'))
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22242
Differential Revision: D16049334
Pulled By: ezyang
fbshipit-source-id: 7eacae810d87ffbdf8e07159bbbc03866dd9979d
Summary:
This PR activates faster depthwise convolution kernels for Volta and Turing GPUs using cudnn >= 7600.
The script to benchmark the current PyTorch master branch and this PR branch can be found [here](https://gist.github.com/ptrblck/4590cf20721d8f43296c9903abd4a774).
(50 warmup iterations, 1000 iterations for timing)
I've used https://github.com/pytorch/pytorch/issues/3265 to create a similar benchmark and added a few additional setups.
Since the results are quite long, I've uploaded them in a spreadsheet [here](https://docs.google.com/spreadsheets/d/13ByXcqg7LQUr3DVG3XpLwnJ-CXg3GUZJ3puyTMw9n2I/edit?usp=sharing).
Times are given in ms per iteration.
We've benchmarked this PR on a DGX1 using V100 GPUs.
The current workload check in `check_cudnn_depthwise_workload` is quite long and can be moved to another file, if wanted.
CC ngimel (Thanks for the support while benchmarking it ;) )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22302
Differential Revision: D16115057
Pulled By: ezyang
fbshipit-source-id: bad184658518e73b4d6b849d77e408f5a7a757de
Summary:
Having the NVRTC stub in ATen is necessary to call driver APIs in ATen. This is currently blocking https://github.com/pytorch/pytorch/pull/22229.
`DynamicLibrary` is also moved as it is used in the stub code, and seems general enough.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22362
Differential Revision: D16131787
Pulled By: ezyang
fbshipit-source-id: add2ee8a8865229578aa00001a00d5a6671e0e73
Summary:
Syncing worker requirement mismatches to improve remote build time.
Created actions:
MEDIUM: 488
LARGE: 29
XXLARGE: 2
Updated actions:
From MEDIUM to LARGE: 227
From XLARGE to MEDIUM: 1
From XLARGE to LARGE: 1
From XLARGE to XXLARGE: 1
From LARGE to MEDIUM: 2
From LARGE to XLARGE: 2
Differential Revision: D16161669
fbshipit-source-id: 67a4e0d883ca3f1ca3185a8285903c0961537757
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22143
Like the Conv DNNLOWP operator, allow FC to run the slow path to debug numerical issues caused by Intel's int8 instruction that does horizontal addition of 2 int8 multiplication results in 16 bits
Reviewed By: hx89
Differential Revision: D15966885
fbshipit-source-id: c6726376a3e39d341fd8aeb0e54e0450d2af8920
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22174
This is a preliminary change outlining the approach we plan to follow to integrate QNNPACK operators into the pytorch backend. The operators will not be made visible to the user in the python world, so ultimately we will have a function that calls the qnnpack backend based on the environment it is running on.
The goal of the project is to integrate QNNPACK library with PyTorch to achieve good performance for quantized mobile models.
Reviewed By: ljk53
Differential Revision: D15806325
fbshipit-source-id: c14e1d864ac94570333a7b14031ea231d095c2ae
Summary:
Some duplicated code is removed. It also becomes clear that there is only one special case `div_kernel_cuda` is handling.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22555
Differential Revision: D16152091
Pulled By: zou3519
fbshipit-source-id: bb875370077c1f84efe4b766b3e1acc461e73e6c
Summary:
Fix a grammatical error of the comment in line 233.
change from " Returns an `OrderedDict` of he submodules of this `Module`"
to " Returns an `OrderedDict` of the submodules of this `Module`"
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22548
Differential Revision: D16134534
Pulled By: zou3519
fbshipit-source-id: 33b1dd0fbc3a24bef99b6e0192566e2839292842
Summary:
As part of the Variable/Tensor merge, we want to be able to pass Variables into Caffe2 without doing extra shallow copy, to improve performance and also allow for in-place mutations in Caffe2 ops. There are a few approaches outlined in https://github.com/pytorch/pytorch/pull/22418, and this PR is the chosen approach.
Specifically, we can have the assumption that we won't be connecting autograd to C2 gradients at any point (as it's too tricky and not that useful). Therefore, we can pass Variable into Caffe2 ops by requiring that all Variables in Caffe2 don't require grad. For code paths in Caffe2 that might potentially track gradients (e.g. `ScriptModuleOp` and `call_caffe2_op_from_c10`), we use the `torch::NoGradGuard` to make sure gradients are not tracked.
This supersedes https://github.com/pytorch/pytorch/pull/22418.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22473
Differential Revision: D16099042
Pulled By: yf225
fbshipit-source-id: 57efc3c7cfb3048d9abe90e63759acc14ebd2972
Summary:
Forgot to mirror the `nn/__init__.py` semantics in the new `nn` type stub.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22411
Differential Revision: D16149798
Pulled By: ezyang
fbshipit-source-id: 0ffa256fbdc5e5383a7b9c9c3ae61acd11de1dba
Summary:
`addcmul_out` overwrote the samples, which led to constant values being output by `torch.normal`.
Changelog:
- Replace the `addcmul_out` calls with a combination of in-place `mul` and `add`, with justification for this change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22533
Test Plan:
- Enable tests for test_normal on all devices
Fixes https://github.com/pytorch/pytorch/issues/22529
Differential Revision: D16141337
Pulled By: ezyang
fbshipit-source-id: 567a399042e0adcd154582f362318ce95a244c62
Summary:
Currently, specifying different build options with respect to the "USE_"
series is in quite a state of disarray. There are a lot of build options that
accept three variants: USE_OPTION, WITH_OPTION, and NO_OPTION. Some
build options only accept USE_ and NO_ variant. Some accept only USE_.
This inconsistency is quite confusing and hard to maintain.
To resolve this inconsistency, we can either let all these build options
support all three variants, or we only support the USE_ variant.
This commit takes a step toward the latter choice, i.e., deprecates and sets
a date for removing the NO_ and WITH_ variants and keeps only the
USE_ variant. This is likely better than the former solution because:
- NO_ and WITH_ variants are not documented.
- CMakeLists.txt only has the USE_ variants for relevant build options
defined. It would be a surprise for users to pass these variables to
CMake during a rebuild and find them ineffective.
- Multiple variants are difficult to maintain.
- The behavior is confusing if more than one variant is passed. For
example, what to be expected if one sets "NO_CUDA=1 USE_CUDA=1"?
The downside is that this will break backward compatibility for existing
build scripts in the future (if they used the undocumented build
options).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22474
Differential Revision: D16149396
Pulled By: ezyang
fbshipit-source-id: 7145b88ad195db2051772b9665dd708dfcf50b7d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22477
There is actually no use of an uninitialized variable, but some compilers are not smart enough to reason that the two if branches are always taken together.
Reviewed By: hx89
Differential Revision: D16100211
fbshipit-source-id: 25f01d668063603d7aaa776451afe8a10415d2ea
Summary:
After the Variable/Tensor merge, code paths in ATen need to be able to check whether a tensor requires gradient, and throw errors in places where a `requires_grad=true` tensor is not allowed (such as https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/Utils.h#L76-L78 and https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/SparseTensorImpl.cpp#L86). Since the `GradMode` thread-local variable controls whether a tensor should accumulate gradients, we need to be able to check this variable from ATen when we determine whether a tensor requires gradient, hence the PR to move `GradMode` / `AutoGradMode` / `NoGradGuard` to ATen.
Note that we intentionally don't merge `at::GradMode` and `at::NonVariableTypeMode`, with the following reasoning:
Semantically, `at::GradMode` and `at::NonVariableTypeMode` actually mean different things: `at::GradMode` controls whether a tensor should accumulate gradients, and `at::NonVariableTypeMode` controls whether a Variable should be treated as a non-Variable tensor in type dispatches. There are places whether we *don't* want the tensor to accumulate gradients, but *still* want the Variable to be treated as a Variable. Here is one example:
```python
# torch/tensor.py
with torch.no_grad():
    ...
    new_tensor = self.new() # `at::GradMode` is false at this point
    ...
```
```cpp
// tools/autograd/templates/python_variable_methods.cpp
static PyObject * THPVariable_new(PyObject* self, PyObject* args, PyObject* kwargs)
{
  ...
  // if we merge `at::GradMode` and `at::NonVariableTypeMode`, since `at::GradMode` is false and `self_.type()` checks `at::GradMode` to decide whether to return non-Variable type, it will return a non-Variable type here, which is not what we want (and throws a "Tensor that was converted to Variable was not actually a Variable" error)
  return THPVariable_Wrap(torch::utils::legacy_tensor_new(self_.type(), args, kwargs));
  ...
}
```
For the above reason, we cannot merge `at::GradMode` and `at::NonVariableTypeMode`, as they have different purposes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18573
Differential Revision: D16134413
Pulled By: yf225
fbshipit-source-id: 6140347e78bc54206506499c264818eb693cdb8a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22479
In some cases, for example when training on CTR data, we would like to start training from old samples and finish on recent samples.
This diff adds an option to disable shuffling in DistributedSampler to accommodate this use case.
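A minimal usage sketch, assuming the new `shuffle` keyword on `DistributedSampler` (the dataset and sizes below are illustrative):
```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

dataset = TensorDataset(torch.arange(100).float())

# Passing num_replicas/rank explicitly avoids needing an initialized
# process group for this sketch; shuffle=False keeps samples in their
# original (old -> new) order every epoch.
sampler = DistributedSampler(dataset, num_replicas=1, rank=0, shuffle=False)
loader = DataLoader(dataset, batch_size=10, sampler=sampler)

for (batch,) in loader:
    pass  # batches arrive in chronological order
```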
Reviewed By: soumith
Differential Revision: D16100388
fbshipit-source-id: 35566581f5250040b2db5ec408a63037b47a9f5d
Summary:
Replaces https://github.com/pytorch/pytorch/pull/21501 because ghimport had errors when I tried to import the stack that I couldn't figure out :'(
This PR has the two commits that were previously accepted plus the merge commit.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22561
Differential Revision: D16135743
Pulled By: eellison
fbshipit-source-id: f0a98842ccb334c7ceab04d1437e09dc76be0eb1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22516
Force anybody creating an untyped Dict to call c10::impl::deprecatedUntypedDict().
This should hopefully make it clear that this is not public API and prevent people from using it.
Differential Revision: D16115215
fbshipit-source-id: 2ef4cb443da1cdf4ebf5b99851f69de0be730b97
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22005
When a Dict or List is created with type information, it will remember that.
If at any point later, this list is instantiated to a List<T> with a concrete type, it will assert that T is the correct type.
Differential Revision: D15914462
fbshipit-source-id: a8c3d91cb6d28d0c1ac0b57a4c4c6ac137153ff7
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22551
Test Plan:
ran test locally
Imported from OSS
Differential Revision: D16132182
fbshipit-source-id: 5b9efbf883efa66c4d8b7c400bdb804ac668a631
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22510
Added a new function to implement the clone operation on quantized tensors, along with a test case that can be exercised as shown in the test plan.
This change is required to be able to call torch.jit.trace on quantized models.
Clone implementation calls copy_ on QTensor internally.
Differential Revision: D16059576
fbshipit-source-id: 226918cd475521b664ed72ee336a3da8212ddcdc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22397
Test Plan:
Added test for reentrant backwards with checkpoint and a test for a recursive backwards function (which should fail if we run all the reentrant tasks recursively in the same thread) and for testing priority of reentrant tasks.
~~Will add a test for priority of reentrant tasks in future pr.~~
Imported from OSS
Differential Revision: D16131955
fbshipit-source-id: 18301d45c1ec9fbeb566b1016dbaf7a84a09c7ac
Summary:
Currently, the **stream** parameter is not set when launching these two kernels: softmax_warp_forward() and softmax_warp_backward(), i.e. the kernels are always put on the default stream, which may fail to respect the stream that was set previously. Add **at::cuda::getCurrentCUDAStream()** as a launch argument to fix this issue.
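A Python-side illustration of the behavior this fixes (a sketch, assuming a CUDA device is available): with the kernels launched on the current stream, softmax work enqueued inside a `torch.cuda.stream` context stays on that stream.
```python
import torch

if torch.cuda.is_available():
    x = torch.randn(128, 1024, device="cuda", dtype=torch.float16)
    side_stream = torch.cuda.Stream()
    with torch.cuda.stream(side_stream):
        # With the fix, the warp softmax kernels are launched on the
        # current stream (side_stream) rather than the default stream.
        y = torch.nn.functional.softmax(x, dim=-1)
    side_stream.synchronize()
```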
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22470
Differential Revision: D16115051
Pulled By: izdeby
fbshipit-source-id: 38b27e768bb5fcecc1a06143ab5d63b0e68a279e
Summary:
re-apply changes reverted in:
https://github.com/pytorch/pytorch/pull/22412
Also change log_softmax to take positional arguments. Long-term we do want the kwarg-only interface, but seems to currently be incompatible with jit serialization.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22456
Differential Revision: D16097159
Pulled By: nairbv
fbshipit-source-id: 8cb73e9ca18fc66b35b873cf4a574b167a578b3d
Summary:
* Deletes all weak script decorators / associated data structures / methods
* In order to keep supporting the standard library in script, this enables recursive script on any function defined in `torch.nn`
* Most changes in `torch/nn` are the result of `ag -Q "weak" torch/nn/ -l | xargs sed -i '/weak/d'`, only `rnn.py` needed manual editing to use the `ignore` and `export` to continue supporting the overloaded `forward` methods
* `Sequential`/`ModuleList` no longer need to be added to constants since they are compiled on demand
This should also fix https://github.com/pytorch/pytorch/issues/22212
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22212
Differential Revision: D15988346
Pulled By: driazati
fbshipit-source-id: af223e3ad0580be895377312949997a70e988e4f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22309
This diff enables PT operators to run with JIT mode. Users can control eager and JIT mode using the `use_jit` flag.
In this diff, we are putting operators in a loop and passing it to JIT. One extra step which wraps the operator with the `_consume` op is introduced to avoid JIT's dead code elimination optimization. With that, the reported time includes the real operator execution time plus the `_consume` op (which directly returns the input; nothing else happens inside).
Reviewed By: zheng-xq
Differential Revision: D16033082
fbshipit-source-id: e03be89fd5a505e44e81015dfc63db9cd76fb8a1
Summary:
- Fix typo in ```torch/onnx/utils.py``` when looking up registered custom ops.
- Add a simple test case
1. Register custom op with ```TorchScript``` using ```cpp_extension.load_inline```.
2. Register custom op with ```torch.onnx.symbolic``` using ```register_custom_op_symbolic``` (see the sketch after this list).
3. Export model with custom op, and verify with Caffe2 backend.
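A hedged sketch of step 2, assuming a custom op named `mynamespace::custom_relu` has already been registered with TorchScript; the op name and its mapping to ONNX `Relu` are purely illustrative:
```python
import torch
import torch.onnx

def custom_relu_symbolic(g, input):
    # Map the (hypothetical) custom op onto a standard ONNX Relu node.
    return g.op("Relu", input)

# Associate the TorchScript op name with its ONNX symbolic for opset 9,
# so export can translate the node instead of failing the lookup.
torch.onnx.register_custom_op_symbolic(
    "mynamespace::custom_relu", custom_relu_symbolic, 9)
```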
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21321
Differential Revision: D16101097
Pulled By: houseroad
fbshipit-source-id: 084f8b55e230e1cb6e9bd7bd52d7946cefda8e33
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21432
This diff introduces a new interface to generate tests based on the metadata of operators.
Reviewed By: ajauhri
Differential Revision: D15675542
fbshipit-source-id: ba60e803ea553d8b9eb6cb2bcdc6a0368ef62b1c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22499
Another place where onnx export is running dead code elimination after making the jit graph invalid. Fixing it.
Reviewed By: houseroad
Differential Revision: D16111969
fbshipit-source-id: 5ba80340c06d091988858077f142ea4e3da0638c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22348
This is the last step of LRU hash eviction weight re-init. This diff checks whether there are evicted values in sparse_lookup and, if so, calls the op created in D15709866 to re-init the values for the indices in evicted_values. Also created a gradient op for the operator; the gradient op just passes the output gradient through as the input gradient.
Reviewed By: itomatik
Differential Revision: D16044736
fbshipit-source-id: 9afb85209b0de1038c5153bcb7dfc5f52e0b2abb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22476
Dead code elimination assumes a valid jit graph because it checks if operators have side effects.
The onnx export path destroys the jit graph right before calling dead code elimination, but it actually doesn't care about side effects.
We can just call dead code elimination and disable side effect lookup and things should work.
Reviewed By: houseroad
Differential Revision: D16100172
fbshipit-source-id: 8c790055e0d76c4227394cafa93b07d1310f2cea
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22441
This include doesn't seem to be needed. Remove it to simplify mobile build dependency.
Reviewed By: dreiss
Differential Revision: D16088224
fbshipit-source-id: f6aec21655e259726412e26a006d785912436c2a
Summary:
This has been requested in https://github.com/pytorch/pytorch/issues/20323
(It is still not exactly the same as NumPy, which allows you to pass tensors as mean/std and broadcast them with size, but the present PR is extremely simple and does the main thing people are asking for.)
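A minimal sketch of the requested overload, assuming scalar mean/std plus an explicit size argument:
```python
import torch

# Draw a 3x4 tensor of samples from N(0, 1) in one call, instead of first
# materializing mean/std tensors of the target shape.
samples = torch.normal(0.0, 1.0, (3, 4))
print(samples.shape)  # torch.Size([3, 4])
```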
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20545
Differential Revision: D15358736
Pulled By: zhangguanheng66
fbshipit-source-id: 762ea5eab5b8667afbac2df0137df017ba6e413c
Summary:
The changes include:
1. Allow key/value to have a different number of features than query. This supports the case where key and value have different feature dimensions (see the sketch below).
2. Support three separate proj_weights, in addition to a single in_proj_weight. The proj_weight of key and value may have different dimensions than that of query, so three separate proj_weights are necessary. In case key and value have the same dimension as query, it is preferred to use a single large proj_weight for performance reasons. However, it should be noted that choosing a single large weight or three separate weights is a size-dependent decision.
3. Give an option to use static k and v in the multihead_attn operator (see saved_k and saved_v). Those static key/value tensors can now be re-used when training the model.
4. Add more test cases to cover the arguments.
Note: current users should not be affected by the changes.
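A minimal sketch of items 1 and 2 at the module level, assuming the `kdim`/`vdim` constructor arguments that select separate projection weights:
```python
import torch

# Query has 16 features; key and value come from sources with 24 and 32
# features, so three separate projection weights are used internally.
mha = torch.nn.MultiheadAttention(embed_dim=16, num_heads=4, kdim=24, vdim=32)

q = torch.randn(5, 2, 16)  # (target_len, batch, embed_dim)
k = torch.randn(7, 2, 24)  # (source_len, batch, kdim)
v = torch.randn(7, 2, 32)  # (source_len, batch, vdim)

out, attn_weights = mha(q, k, v)
print(out.shape)  # torch.Size([5, 2, 16])
```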
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21288
Differential Revision: D15738808
Pulled By: zhangguanheng66
fbshipit-source-id: 288b995787ad55fba374184b3d15b5c6fe9abb5c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21927
Add `OUTPUT_PROB` output to CTCBeamSearchDecoderOp to return a probability for each sequence.
Add argument to output top-k instead of top-1 decoded sequences.
Reviewed By: SuperIRabbit
Differential Revision: D15797371
fbshipit-source-id: 737ca5cc4f90a0bcc3660ac9f58519a175977b69
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22461
We shouldn't call dead code elimination after EraseNumberTypes because dead code elimination assumes a valid jit graph which EraseNumberTypes just broke.
Let's have it clean up after itself instead.
Reviewed By: houseroad
Differential Revision: D16094656
fbshipit-source-id: f2752277d764e78ab276c57d56b2724b872b136f
Summary:
It's always set to equal USE_NCCL; we made Gloo depend on the Caffe2 NCCL
build. See 30da84fbe1614138d6d9968c1475cb7dc459cd4b
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22467
Differential Revision: D16098581
Pulled By: ezyang
fbshipit-source-id: f706ec7cebc2e6315bafca013b669f5a72e04815
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22279
This new operator is used for embedding table weight re-init. After we get the evicted indices, they identify the rows that need resetting in the embedding table. Then we can create a 1d tensor with default values and apply this operator to copy that tensor into all evicted rows of the embedding table.
A gradient op will be added in the next diff.
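A conceptual PyTorch sketch of what the operator does; the actual Caffe2 op works on workspace blobs, and the names here are illustrative:
```python
import torch

# Embedding table and the rows evicted by the LRU hash.
table = torch.randn(10, 4)
evicted_rows = torch.tensor([2, 5, 7])

# 1-D tensor of default values, copied into every evicted row.
defaults = torch.zeros(4)
table[evicted_rows] = defaults
```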
Reviewed By: itomatik
Differential Revision: D15709866
fbshipit-source-id: 2297b70a7326591524d0be09c73a588da245cc08
Summary:
The sgemm in cuBLAS 9.0 has some issues with sizes above 2M on Maxwell and Pascal architectures. Warn in this case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22034
Differential Revision: D15949930
Pulled By: zhangguanheng66
fbshipit-source-id: 0af977ec7900c76328d23898071de9c23778ff8b
Summary:
ROCm is already detected in cmake/public/LoadHIP.cmake. No need to
detect it twice. Plus, the Python script reads the environment variable
ROCM_HOME, but what is really used in the CMake scripts is ROCM_PATH -- a
user must set both environment variables correctly. Since ROCM_HOME is
undocumented, this commit completely eradicates it.
---
ezyang A remake of https://github.com/pytorch/pytorch/issues/22228 because its dependency has been dismissed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22464
Differential Revision: D16096833
Pulled By: bddppq
fbshipit-source-id: fea461e80ee61ec77fa3a7b476f7aec4fc453d5d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22425
Currently, in bound_shape_inference.cc: InferBoundShapeAndType, we first infer ops in order and then infer the inputs of concat in reverse order. In the ctr_instagram_model tiny version, concat is right before FC, so we can infer the inputs for concat. But in the production version, we found there are some ops between concat and FC (or other ops whose shape we know), so the shapes of these ops cannot be inferred.
This diff is a temporary solution for this problem: infer shapes in order and in reverse order repeatedly until there are no more changes.
Reviewed By: yinghai, ipiszy
Differential Revision: D16082521
fbshipit-source-id: d5066509368029c6736dce156030adf5c38653d7
Summary:
MKL-DNN is the main library for computation when we use ideep device. It can use kernels implemented by different algorithms (including JIT, CBLAS, etc.) for computation. We add the "USE_MKLDNN_CBLAS" (default OFF) build option so that users can decide whether to use CBLAS computation methods or not.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19014
Differential Revision: D16094090
Pulled By: ezyang
fbshipit-source-id: 3f0b1d1a59a327ea0d1456e2752f2edd78d96ccc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22004
In the future, we want all dicts/lists to store information about the types they contain.
This is only possible if the creation API doesn't allow creating lists/dicts without type information.
This diff takes some call sites that don't specify type information and makes them specify it.
Reviewed By: dzhulgakov
Differential Revision: D15906387
fbshipit-source-id: 64766a2534b52c221e8a5501a85eaad13812e7bd
Summary:
Currently the build system accepts USE_NAMEDTENSOR from the environment
variable and turns it into NAMEDTENSOR_ENABLED when passing to CMake.
This discrepancy does not seem necessary and complicates the build
system. The naming of this build option is also semantically incorrect
("BUILD_" vis-a-vis "USE_"). This commit eradicate this issue before it
is made into a stable release.
The support of NO_NAMEDTENSOR is also removed, since PyTorch has been
quite inconsistent about "NO_*" build options.
---
Note: All environment variables with their names starting with `BUILD_` are currently automatically passed to CMake with no need of an additional wrapper.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22360
Differential Revision: D16074509
Pulled By: zou3519
fbshipit-source-id: dc316287e26192118f3c99b945454bc50535b2ae
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21389
As titled. To do weight re-init on evicted rows in the embedding table, we need to pass the info about the evicted hashed values to SparseLookup, which is the layer model responsible for constructing the embedding table and doing pooling.
To pass evicted values, we need to adjust the output record of lru_sparse_hash to include the evicted values, and add an optional input to all processors that need to take in a sparse segment. For SparseLookup to get the evicted values, its input record needs to be adjusted. Now the input record can have type IdList/IdScoreList, or be a struct of feature + evicted values.
Reviewed By: itomatik
Differential Revision: D15590307
fbshipit-source-id: e493881909830d5ca5806a743a2a713198c100c2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22241
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20387
glibc has a non-standard function, feenableexcept, that makes floating-point exceptions trigger a trap. Compared to feclearexcept + fetestexcept, this approach allows us to see precisely where the exception is raised from the stack trace.
Reviewed By: jspark1105
Differential Revision: D15301095
fbshipit-source-id: 94f6e72456b2280f78d7d01c2ee069ae46d609bb
Summary:
empty_like uses the tensor options of `self`, rather than the passed in tensor options. This means it messes up variable/tensor types, and ignores specifications like different dtypes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21978
Differential Revision: D15903948
Pulled By: gchanan
fbshipit-source-id: f29946be01c543f888daef2e99fe928e7b7d9d74
Summary:
# What is this?
This is an implementation of the AdamW optimizer as implemented in [the fastai library](803894051b/fastai/callback.py) and as initially introduced in the paper [Decoupled Weight Decay Regularization](https://arxiv.org/abs/1711.05101). It decouples the weight decay regularization step from the optimization step during training.
There have already been several abortive attempts to push this into pytorch in some form or fashion: https://github.com/pytorch/pytorch/pull/17468, https://github.com/pytorch/pytorch/pull/10866, https://github.com/pytorch/pytorch/pull/3740, https://github.com/pytorch/pytorch/pull/4429. Hopefully this one goes through.
# Why is this important?
Via a simple reparameterization, it can be shown that L2 regularization has a weight decay effect in the case of SGD optimization. Because of this, L2 regularization became synonymous with the concept of weight decay. However, it can be shown that the equivalence of L2 regularization and weight decay breaks down for more complex adaptive optimization schemes. It was shown in the paper [Decoupled Weight Decay Regularization](https://arxiv.org/abs/1711.05101) that this is the reason why models trained with SGD achieve better generalization than those trained with Adam. Weight decay is a very effective regularizer. L2 regularization, in and of itself, is much less effective. By explicitly decaying the weights, we can achieve state-of-the-art results while also taking advantage of the quick convergence properties that adaptive optimization schemes have.
# How was this tested?
There were test cases added to `test_optim.py` and I also ran a [little experiment](https://gist.github.com/mjacar/0c9809b96513daff84fe3d9938f08638) to validate that this implementation is equivalent to the fastai implementation.
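A minimal usage sketch, assuming the optimizer lands as `torch.optim.AdamW` with a decoupled `weight_decay` argument:
```python
import torch

model = torch.nn.Linear(10, 1)
# weight_decay here is applied as true decoupled weight decay, not as an
# L2 penalty folded into the gradient.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)

x, y = torch.randn(16, 10), torch.randn(16, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```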
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21250
Differential Revision: D16060339
Pulled By: vincentqb
fbshipit-source-id: ded7cc9cfd3fde81f655b9ffb3e3d6b3543a4709
Summary:
Address the issue raised in https://github.com/pytorch/pytorch/issues/22377.
The PR https://github.com/pytorch/pytorch/issues/22016 introduces a temporary tensor of weights `grad_weight_per_segment` of the same dtype as the end result, which can be a problem when using `float16`.
In this PR, it now uses a `float32` temporary tensor when the input is `float16`.
ngimel, can I get you to review? I think I have fixed the issues you pointed out.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22401
Differential Revision: D16077319
Pulled By: mrshenli
fbshipit-source-id: 7cfad7f40b4d41a244052baa2982ab51bbbd7309
Summary:
The CMake modifications include removal of some unnecessary paths
(e.g. find_package(CUDA) and friends) that are no longer used since
c10d is always part of the larger torch build. The macro
`C10D_USE_...` was ambiguous and is now removed in favor of only
having top level `USE_...`. The c10d test suite is changed to include
skip annotations for the tests that depend on Gloo as well.
Now, if you compile with `USE_DISTRIBUTED=1` and `USE_GLOO=0` you get
a functioning build for which the tests actually pass.
Closes https://github.com/pytorch/pytorch/issues/18851.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22257
Differential Revision: D16087993
Pulled By: pietern
fbshipit-source-id: 0cea66bd5cbd9736b06fa1d45ee13a18cab88adb
Summary:
The `assert False` lint error has been causing CI to fail:
./torch/utils/throughput_benchmark.py:14:13: B011 Do not call assert False since python -O removes these calls. Instead callers should raise AssertionError().
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22424
Differential Revision: D16083464
Pulled By: bddppq
fbshipit-source-id: 6d96e36c8fcbb391d071b75fe79c22d526c1ba3c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22429
Android NDK r20 removes the guard `(__ANDROID_API__ <= __ANDROID_API_O_MR1__)`, so we do it here also. There is insufficient reason to keep these decls undefined for earlier API levels. NDK r15 and earlier don't even define `__ANDROID_API_O_MR1__`, so the preprocessor defaults it to 0 and the guard evaluates as TRUE.
Reviewed By: smeenai, hlu1
Differential Revision: D16084105
fbshipit-source-id: f0857b3eb0573fe219f0d6c5e6583f89e2b5518f
Summary:
This change adds advanced support for cross-chunk shuffling.
For training with a static dataset, the default configuration is at the user's disposal. However, in some use cases, new data is added to the current dataset over each epoch, so the dataset's size is dynamically changing/increasing. In order to mix the new data and the old data for better random sampling, one approach is to shuffle examples from more than one chunk. This feature is supported with this change. By specifying `cross_chunk_shuffle_count_` at construction, advanced users can specify how many chunks to shuffle examples from.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22347
Differential Revision: D16081378
Pulled By: zhangguanheng66
fbshipit-source-id: fd001dfb9e66947839adecfb9893156fbbce80d0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22413
_jit_pass_erase_number_types invalidates the jit graph but parts of _jit_pass_onnx rely on having a valid jit graph.
This splits _jit_pass_onnx into _jit_pass_onnx_remove_print and _jit_pass_onnx_preprocess_caffe2 (which rely on the valid jit graph), runs these before _jit_pass_erase_number_types,
and then runs the rest of _jit_pass_onnx after _jit_pass_erase_number_types
Reviewed By: houseroad
Differential Revision: D16079890
fbshipit-source-id: ae68b87dced077f76cbf1335ef3bf89984413224
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22334
Improve the function signatures of save_to_db and load_from_db in predictor_exporter.
Reviewed By: akyrola
Differential Revision: D16047208
fbshipit-source-id: a4e947f86e00ef3b3dd32c57efe58f76a38fcec7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22293
Just wrapping the C class with a nicer Python interface, which you can now
just print directly to get all the data. Later we can add various
visualizations there.
Differential Revision: D16023999
fbshipit-source-id: 8436e37e36965821a690035617784dcdc352dcd1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22292
As we do an atomic fetch_add to decide whether a thread should
finish, we should not take the last iteration into account. As a
result, the total number of iterations should be exactly the same as the user
sets via config.num_iters.
Now, when running a unit test, I see the exact number of iterations reported.
Differential Revision: D16023963
fbshipit-source-id: 3b12ee17276628ecd7b0979f28cd6deb777a1543
Summary:
As part of the Variable/Tensor merge, one invariant for tensor libraries such as ATen / Caffe2 / XLA is that they should only deal with Tensors, not Variables. However, currently in `variable_factories.h` we are potentially passing Variables into those tensor libraries without the `at::AutoNonVariableTypeMode` guard, which will cause those libraries to treat those Variables as Variables (i.e. their `is_variable()` is true), not Tensors.
Consider the following example for `full_like`:
```cpp
inline at::Tensor full_like(const at::Tensor & self, at::Scalar fill_value) {
  ...
  // Both ATen and XLA rely on `at::full_like` to dispatch to library specific implementations.
  //
  // When `self` is a Variable, since we are not using `at::AutoNonVariableTypeMode`,
  // `at::full_like` will also use `self` as a Variable (and it will see that `self.is_variable()` is true),
  // which breaks the invariant that ATen / XLA should never deal with Variables.
  at::Tensor tensor = at::full_like(self, fill_value, self.options().is_variable(false));
  at::Tensor result =
      autograd::make_variable_consuming(std::move(tensor), /*requires_grad=*/false);
  ...
  return result;
}
```
Instead, the invariant-preserving implementation would be:
```cpp
inline at::Tensor full_like(const at::Tensor & self, at::Scalar fill_value) {
  ...
  at::Tensor tensor = ([&]() {
    at::AutoNonVariableTypeMode non_var_type_mode(true);
    // Both ATen and XLA rely on `at::full_like` to dispatch to library specific implementations.
    //
    // When `self` is a Variable, since we have `at::AutoNonVariableTypeMode` in the scope,
    // `at::full_like` will use `self` as a Tensor (and it will see that `self.is_variable()` is false),
    // which preserves the invariant that ATen / XLA should only deal with Tensors.
    return at::full_like(self, fill_value, self.options().is_variable(false));
  })();
  at::Tensor result =
      autograd::make_variable_consuming(std::move(tensor), /*requires_grad=*/false);
  ...
  return result;
}
```
This PR makes the suggested change for all variable factory functions.
cc. ailzhang This should allow us to remove all `tensor_data()` calls in the XLA codebase.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22364
Differential Revision: D16074862
Pulled By: yf225
fbshipit-source-id: 3deba94b90bec92a757041ec05d604401a30c353
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22285
Previously, forward hooks were expected to return None. This PR adds support for overwriting the input and output in `forward_pre_hook` and `forward_hook`; this is used to insert quant/dequant function calls around forward functions.
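A minimal sketch of the new behavior: a hook that returns a value replaces the module's input or output (scaling/shifting here stands in for quant/dequant calls):
```python
import torch

lin = torch.nn.Linear(4, 4)

def pre_hook(module, inputs):
    # Returning a tuple overwrites the inputs passed to forward().
    return (inputs[0] * 2,)

def post_hook(module, inputs, output):
    # Returning a value overwrites the output of forward().
    return output + 1

lin.register_forward_pre_hook(pre_hook)
lin.register_forward_hook(post_hook)

y = lin(torch.randn(2, 4))  # computed as lin(x * 2) + 1
```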
Differential Revision: D16022491
fbshipit-source-id: 02340080745f22c8ea8a2f80c2c08e3a88e37253
Summary:
As per attached tasks, these are noops and are being deprecated/removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22113
Reviewed By: philipjameson
Differential Revision: D15901131
fbshipit-source-id: 3acf12208f692548afe4844be13717a49d74af32
Summary:
Saying `I` in an err msg is too subjective to be used in a framework.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22369
Differential Revision: D16067712
Pulled By: soumith
fbshipit-source-id: 2a390646bd5b15674c99f65e3c460a7272f508b6
Summary:
`setup.py` recommends setting `USE_QNNPACK=0` and `USE_NNPACK=0` to disable building QNNPACK and NNPACK respectively. However this wasn't reflected correctly because we were looking for `NO_QNNPACK` and `NO_NNPACK`. This PR fixes it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22367
Differential Revision: D16067393
Pulled By: soumith
fbshipit-source-id: 6491865ade9a6d41b7a79d68fd586a7854051f28
Summary:
Say the user inputs reduction=False. Of course, we can't add a bool and a string, so the ValueError itself will error, which is more confusing to the user. Instead, we should use string formatting. I would use `f"{reduction} is not..."` but am unsure whether we are OK with using f"" strings.
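A small sketch of the suggested fix; the helper and the set of valid values are illustrative, the point is that `str.format` stringifies any input before building the message:
```python
def check_reduction(reduction):
    if reduction not in ("none", "mean", "sum"):
        # Works even for reduction=False, unlike "reduction + ' is not ...'".
        raise ValueError("{} is not a valid value for reduction".format(reduction))

check_reduction("mean")   # ok
# check_reduction(False)  # ValueError: False is not a valid value for reduction
```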
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22160
Differential Revision: D15981826
Pulled By: soumith
fbshipit-source-id: 279f34bb64a72578c36bdbabe2da83d2fa4b93d8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22319
The onnx pass replacing ints with Tensors produces an invalid JIT graph. It should only be called right before the onnx pass.
Also, it should only be called if we actually export to onnx.
Reviewed By: houseroad
Differential Revision: D16040374
fbshipit-source-id: e78849ee07850acd897fd9eba60b6401fdc4965b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22317
About to add an observer that is also statically initialized in a different
file, so we need to enforce initialization order.
Reviewed By: ilia-cher
Differential Revision: D16012275
fbshipit-source-id: f26e57149a5e326fd34cb51bde93ee99e65403c4
Summary:
Syncing worker requirement mismatches to improve remote build time.
Created actions:
MEDIUM: 445
LARGE: 354
Updated actions:
From MEDIUM to LARGE: 21
From LARGE to XLARGE: 34
From LARGE to MEDIUM: 9
From XLARGE to MEDIUM: 1
Differential Revision: D16047893
fbshipit-source-id: 7afab2ef879277f114d67fd1da9f5102ec04ed7f
Summary:
This does not occur in CUDA code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22271
Differential Revision: D16024605
Pulled By: bddppq
fbshipit-source-id: bb4f16bacbdc040faa59751fba97958f4c2d33cd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22307
The MSVC-specific pragma doesn't silence the warning about a throwing constructor, and therefore `clang-cl` fails to compile this file. This diff fixes the problem by adding an additional check for the `clang` compiler.
Reviewed By: smessmer
Differential Revision: D16032324
fbshipit-source-id: 6dbce0ebf0a533d3e42b476294720590b43a8448
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21921
Call FBGEMM kernels to implement quantized linear operator. This operator is used only for inference.
Differential Revision: D15375695
fbshipit-source-id: b9ca6c156fd60481fea83e55603b2897f7bfc3eb
Summary:
Reduction of gradients for unused parameters should happen as soon as
possible, because they potentially block reduction of gradients for
used parameters. This used to happen instantly when
`prepare_for_backward` was called and it found parameters that didn't
contribute. This meant that if you have a model with unused
parameters, and you want to discard the model output (i.e. not call
backward on some loss), reduction of the gradients of those unused
parameters would have been kicked off, and you'd see an error the next
time you called `forward`.
In this commit, this original approach is slightly changed to delay
reduction of the gradients of those unused parameters until the first
autograd hook is called. This means that you can now discard the model
output regardless of the model having unused parameters or not.
This is a prerequisite for making the `find_unused_parameters`
argument to DDP default to `True`.
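A single-process sketch of the resulting behavior, assuming a Gloo process group of world size 1 (address, port, and sizes are arbitrary):
```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("gloo", init_method="tcp://127.0.0.1:29500",
                        rank=0, world_size=1)

ddp = DDP(torch.nn.Linear(4, 2), find_unused_parameters=True)

# Discard the output of one forward pass without calling backward().
_ = ddp(torch.randn(3, 4))

# With reduction of unused-parameter gradients delayed to the first
# autograd hook, the next forward/backward still works.
out = ddp(torch.randn(3, 4))
out.sum().backward()
```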
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22219
Differential Revision: D16028698
Pulled By: pietern
fbshipit-source-id: c6aec2cd39c4a77746495d9cb1c9fb9c5ac61983
Summary:
This adds the rest of the `dict.???` methods that were missing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21979
Pulled By: driazati
Differential Revision: D16023573
fbshipit-source-id: 3ea9bd905090e2a176af654a8ca98c7d965ea679
Summary:
In talks with smessmer, we decided that it'd be better to put the logic in `list`, as optimal behavior requires knowing `.capacity()`
Results on my cpu (for the benchmark here: https://twitter.com/VahidK/status/1138674536679821312) now look like this:
```
Pytorch batch_gather took 0.018311 seconds.
Pytorch batch_gather jit took 0.013921 seconds.
Pytorch vectorized batch_gather took 0.001384 seconds.
```
Previously, `batch_gather jit` took 3x as long as `batch_gather`.
Some logic taken from https://github.com/pytorch/pytorch/pull/21690. Note that these two PR's are somewhat orthogonal. That PR handles this benchmark by looking at the alias analysis, while this PR specializes for `+=`.
Note that we can't jit the vectorized version as we think `torch.arange` returns a float tensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21896
Differential Revision: D15998628
Pulled By: Chillee
fbshipit-source-id: b0085960da4613578b94deb98ac62c0a4532a8c3
Summary:
This is yet another step to disentangle Python build scripts and CMake
and improve their integration (Let CMake handle more build environment
detections, and less by our handcrafted Python scripts).
The processor detection logic also changed a bit: Instead of detecting
whether the system processor is PPC or ARM, this PR changes to detect
Intel CPUs, because this is more precise as MKL only supports Intel
CPUs. The build option `USE_MKLDNN` will also not be presented to
users on non-Intel processors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22215
Differential Revision: D16005953
Pulled By: ezyang
fbshipit-source-id: bf3f74d53609b3f835e280f63a872ff3c9352763
Summary:
When dealing with a large-scale dataset, it is handy to be able to save the dataset status and resume later. Especially in cases where an unexpected crash happens, users don't need to start the whole dataset over from the beginning. Instead, they can reload it from the last checkpoint.
This change adds support for checkpoint save/load logic in ChunkDataset.
On ChunkDataset construction, the user can specify a file name from which to load the checkpoint. If it is empty, it defaults to starting fresh; otherwise the ChunkDataset will 'fast forward' the chunk sampler to the corresponding checkpoint.
The user can also call ChunkDataset::save() to serialize the current status to a file, which can be used later.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21889
Differential Revision: D16024582
Pulled By: ailzhang
fbshipit-source-id: 1862ab5116f94c9d29da174ce04a91041d06cad5
Summary:
`cmake/public/LoadHIP.cmake` calls `find_package(miopen)`, which uses the CMake module in MIOpen installation (It includes the line `set(miopen_DIR ${MIOPEN_PATH}/lib/cmake/miopen)`). `cmake/Modules/FindMIOpen.cmake` is not used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22244
Differential Revision: D16000771
Pulled By: bddppq
fbshipit-source-id: 07bb40fdf033521e8427fc351715d47e6e30ed34
Summary:
The original name `copy_tensor_data` could be confusing because users are not sure whether it deep-copies data in the tensor's storage or just copies the tensor's metadata. The renaming makes it more clear.
cc. ailzhang This might break XLA build, but I think the renaming makes it more clear why we use `copy_tensor_data` in XLATensorImpl's shallow-copy functions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22266
Differential Revision: D16014724
Pulled By: yf225
fbshipit-source-id: f6ee966927d4d65d828b68264b3253b2f8fd768d
Summary:
This adds the rest of the `dict.???` methods that were missing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21979
Pulled By: driazati
Differential Revision: D15999938
fbshipit-source-id: 7bc2a55e3f791015a0ff2e3731703075cf0770ee
Summary:
I learned from https://github.com/pytorch/pytorch/pull/22058 that `worker_kill` is just flaky, regardless of `hold_iter_reference`. So let's disable it altogether for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22208
Differential Revision: D15990307
Pulled By: soumith
fbshipit-source-id: d7d3f4fe7eaac4987f240cb8fd032c73a84157d7
Summary:
As part of the Variable/Tensor merge, we want to gradually remove call sites of `tensor_data()` and the API itself, and instead uses `variable_data()`. This PR removes the `tensor_data()` call in the tensor_to_numpy conversion path.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22214
Differential Revision: D15997397
Pulled By: yf225
fbshipit-source-id: 6fcab7b14e138824fc2adb5434512bcf868ca375
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22077
ghimport-source-id: 39cf0a2e66e7fa2b6866af72782a22a4bd025e4c
Test Plan:
- Compared the build/aten/src folder before and after this change
locally and verified they are identical (`diff -r`).
- Wait for CI + Also, [namedtensor ci]
Imported from OSS
Differential Revision: D15941967
Pulled By: zou3519
fbshipit-source-id: d8607df78f48325fba37e0d00fce0ecfbb78cb36
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20729
Currently there is no way to specify what scalar types each nn function will support.
This change will allow to specify supported scalar types for each function/backward function and device. By default each function will support Float, Double, Half.
If you want to specify any extra supported scalar types other than the default, you will need to change nn.yaml:
- name: _some_func(Tensor self)
  cname: SomeFunction
  CPU:
    forward_scalar_types: ['Float', 'Double', 'Long']
    backward_scalar_types: ['Float', 'Double']
Differential Revision: D15423752
fbshipit-source-id: b3c157316d6e629bc39c1b377a3b23c71b1656cf
Summary:
In `torch/csrc/autograd/function.h` we define `torch::autograd::Function`, a (the?) central autograd record-holding class. `Function` is declared public API (`TORCH_API`).
We also define a custom deleter `deleteFunction` which we use throughout PyTorch's own use of `Function`. This trivial PR declares the deleter public API as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22236
Differential Revision: D16001335
Pulled By: yf225
fbshipit-source-id: 6ef0a3630e8f82f277a0e6e26cc64455ef7ee43e
Summary:
we used to not print the device when a tensor is on xla. It's sometimes confusing as it looks the same as a cpu tensor...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22094
Differential Revision: D15975405
Pulled By: ailzhang
fbshipit-source-id: f19ceb9e26f5f2f6e7d659de12716f0dfe065f42
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22084
For DictPtr/ListPtr, default construction was disallowed because it was ambiguous whether it was supposed to create an empty list or a nullptr.
But since we renamed them to Dict/List, we can now allow default construction without ambiguity.
Differential Revision: D15948098
fbshipit-source-id: 942a9235b51608d1870ee4a2f2f0a5d0d45ec6e6
Summary:
This cleans up the `checkScript` API and some old tests that were hardcoding outputs. It also now runs the Python function when a string is passed in to verify the outputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22002
Differential Revision: D15924485
Pulled By: driazati
fbshipit-source-id: ee870c942d804596913601cb411adc31bd988558
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22157
This header uses the `std::swap_ranges` function, which is defined in the `<algorithm>` header (https://en.cppreference.com/w/cpp/algorithm/swap_ranges). Therefore this file isn't guaranteed to compile on all platforms.
This diff fixes the problem by adding the missing header.
Reviewed By: smessmer
Differential Revision: D15971425
fbshipit-source-id: e3edcec131f72d729161f5644ee152f66489201a
Summary:
Changelog:
- Port `symeig` from TH/THC to ATen
- Enable batching of matrix inputs for `symeig`
- Modify derivative computation based on batching
- Update docs to reflect the change
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21858
Test Plan: - Added additional tests in `test_torch.py` (with a port to `test_cuda.py`) and `common_methods_invocations.py` to test if both the port and batching work.
Differential Revision: D15981789
Pulled By: soumith
fbshipit-source-id: ab9af8361f8608db42318aabc8421bd99a1ca7ae
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21885
If a kernel is defined as a stateful lambda
static auto registry = torch::RegisterOperators().op("my::op", [some_closure] (Tensor a) {...});
this can have very unexpected behavior when kernels are instantiated. There is no guarantee that the state is kept.
In the options based API, state is already disallowed:
// this is a compiler error
static auto registry = torch::RegisterOperators().op("my::op", torch::RegisterOperators::options().kernel([some_closure] (Tensor a) {...}));
but we can't disallow it in the non-options-based API for backwards compatibility reasons.
We can, however, show a deprecation warning. This is what this diff introduces.
Differential Revision: D15867089
fbshipit-source-id: 300fa4772fad8e7d177eb7cb910063d360537a4a
Summary:
Re-implementation of the `embedding_dense_backward_cuda()` and the `embedding_bag_backward_cuda_sum_avg()` functions.
#### Performance
Running a [Mortgage Workflow](https://github.com/EvenOldridge/MortgageWorkflowA) with a block size of 100K on a DXG-2 (single GPU), we see a 270% speedup:
```
Original version: 370,168 example/s
Optimized version: 1,034,228 example/s
```
The original version is bounded by the `EmbeddingBag_accGradParametersKernel_sum_avg`, which takes 70% of the CUDA execution time. In the optimized version, the optimized kernel now takes only 17% of the time.
#### Greater Numerical Stability
An added benefit is greater numerical stability. Instead of doing a flat sum where a single variable is used to accumulate the weights, this code uses two steps, where each GPU thread computes a sub-result defined by `NROWS_PER_THREAD` before the final result is accumulated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22016
Differential Revision: D15944339
Pulled By: mrshenli
fbshipit-source-id: 398d5f48826a017fc4b31c24c3f8b56d01830bf0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22130
Optimize InstanceNormOp forward
For InstanceNormOp on CPU with order = NHWC, N = 128, C = 256, H = 56, W = 56: 183ms -> 115ms.
For InstanceNormOp on GPU with N = 256, C = 256, H = 112, W = 112:
NCHW: 1475ms -> 45ms
NHWC: 1597ms -> 79ms
Reviewed By: houseroad
Differential Revision: D15963711
fbshipit-source-id: 3fa03109326456b9f301514fecbefa7809438d3e
Summary:
In order to select the more important features in a dot product among a list of candidate sparse features, we can assign one learnable weight to each feature and reweight each feature by multiplying the weight onto its embedding before the dot product. We finally select features based on the weight magnitude after training.
We can perform L1 and/or L2 regularization on the weights. To summarize, the weights tend to shrink their values (avoiding overfitting) due to L2 regularization, and some weights will vanish to zero due to L1. To avoid sparse feature embeddings being ignored due to early collapse of the weights, a piecewise lr warm-up policy is used when optimizing the regularization term, such that regularization is weak in the first stage and gets stronger afterwards (a small lr constant in iters below threshold 1, a medium lr constant in stage 2, and a final, reasonably large lr constant in all iters after threshold 2). The features with nonzero and relatively large weights (in absolute value) will be selected for the module.
We can also apply softmax on the original weights to make them sum to 1. We can even boost the softmaxed weights by multiplying by the number of softmax components, which essentially makes them sum to the number of components and average to 1. With this idea, all the weights are positive and sum to a constant. Regularization is not a must, since we can count on the competition between the softmax weights themselves to achieve reasonable re-weighting. We expect those weights to be more dense compared with the sparse ones from L1 regularization, and we can select features based on the top K weights.
Overall, we aim to demonstrate that the selected feature set outperforms the current v0 feature set in experiments. Special acknowledgement goes to Shouyuan Chen, who initiated the work on regularizable weighting.
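The reweighting mechanics can be sketched in a few lines of PyTorch (a conceptual illustration only, not the Caffe2 layer-model code this diff touches):
```python
import torch

# Per-feature learnable weights that rescale each sparse feature's
# embedding before the dot product / interaction step.
num_features, dim = 8, 16
emb = torch.randn(4, num_features, dim)            # batch of embeddings
w = torch.nn.Parameter(torch.ones(num_features))   # one weight per feature

reweighted = emb * w.view(1, -1, 1)                # scale each feature
l1_penalty = 1e-3 * w.abs().sum()                  # drives weights to zero
loss = reweighted.sum() + l1_penalty
loss.backward()                                    # gradients flow into w
```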
---
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22176
The diff will export updates to Github repository, as stated below.
{F162787228}
Basically, the updates on the files are summarized as below:
- adding logger messages
`caffe2/python/layer_model_helper.py`
- add ElasticNet regularizer, which combines both L1 and L2 regularization
`caffe2/python/regularizer.py`
- implement piecewarmup, specifically warm up with three constant pieces
`caffe2/sgd/learning_rate_functors.h, caffe2/sgd/learning_rate_op.cc, caffe2/sgd/learning_rate_op.h`
Differential Revision: D15923430
fbshipit-source-id: ee18902cb88c23b1b7b367cc727d690a21e4cda9
Summary:
- PyCQA/flake8-bugbear#53 has been fixed (but not yet closed on their side) and a new version of flake8-bugbear has been released on Mar 28, 2019. Switch CI to use the latest stable version.
- Fix the new B011 errors that flake8-bugbear catches in the current codebase.
---
B011: Do not call assert False since python -O removes these calls. Instead callers should raise AssertionError().
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21944
Differential Revision: D15974842
Pulled By: soumith
fbshipit-source-id: de5c2c07015f7f1c50cb3904c651914b8c83bf5c
Summary:
Returning the result of an inplace `squeeze_` in `einsum` (which itself is traced) interacts badly with `autograd.Function`.
I must admit that I'm not 100% certain whether it should be necessary to change this, but I consider this a good change overall.
Fixes: https://github.com/pytorch/pytorch/issues/22072
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22111
Differential Revision: D15974990
Pulled By: soumith
fbshipit-source-id: 477e7f23833f02999085f665c175d062e7d32acd
Summary:
The current error message displays as:
`RuntimeError: index koccurs twice in output`
A whitespace is missing between the index and 'occurs'
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21904
Differential Revision: D15878941
Pulled By: colesbury
fbshipit-source-id: 163dda1829bf4956978cd01fd0e751673580722d
Summary:
The bug is that when target_length == 0, there is no preceding BLANK state, and the original implementation leads to an out-of-bounds pointer access.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21910
Differential Revision: D15960239
Pulled By: ezyang
fbshipit-source-id: 7bbbecb7bf91842735c14265612c7e5049c4d9b3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22088
This diff is similar to D14163001. We need to handle the edge case when add_axis=1.
Reviewed By: jspark1105
Differential Revision: D15949003
fbshipit-source-id: 328d1e07b78b69bde81eee78c9ff5a8fb81f629b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22037
This adds support for sparse gradients to the reducer as well as to
the DistributedDataParallel wrapper. Note that an out of band signal
is needed whether or not a dense parameter (e.g. an embedding) is
expected to receive a sparse gradient or not. This information is
passed to the bucket assignment computation routine and the reducer as
a vector of booleans. Every parameter for which we expect a sparse
gradient is assigned its own bucket, as we cannot easily group
multiple unrelated sparse tensors.
Reviewed By: mrshenli
Differential Revision: D15926383
fbshipit-source-id: 39c0d5dbd95bf0534314fdf4d44b2385d5321aaf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22036
Implemented only on ProcessGroupGloo, as an allgather of metadata
(sparse_dim, dense_dim, and nnz), followed by an allgather of indices,
followed by an allgather of values. Once these operations have
finished, all ranks locally compute a reduction over these sparse
tensors. Works for both CPU and CUDA tensors.
This surfaced a problem with the existing assumption of only modifying
tensors that are passed at the call site, because for sparse tensors
we don't know the dimensions of the output tensors before we run the
collective. To deal with this unknown, this commit adds a `result`
function to the `c10d::ProcessGroup::Work` class that returns a vector
of tensors.
It's a bit odd to have to retrieve the result through this function
only for operations on sparse tensors. To make this work irrespective
of tensor layout, we can create a follow-up commit to make all in
place operations make their results accessible through this function
as well. This doesn't break any existing contracts but does have the
potential to add interface ambiguity.
This is a resubmission of #19146.
Reviewed By: mrshenli
Differential Revision: D15926384
fbshipit-source-id: b6ee5d81606bfa8ed63c3d63a9e307613491e0ae
Summary:
This change is backwards incompatible in *C++ only* on mean(), sum(), and prod() interfaces that accepted either of:
```
Tensor sum(IntArrayRef dim, bool keepdim=false) const;
Tensor sum(IntArrayRef dim, ScalarType dtype) const;
```
but now to specify both the dim and dtype will require the keepdim parameter:
```
Tensor sum(IntArrayRef dim, bool keepdim=false, c10::optional<ScalarType> dtype=c10::nullopt) const;
```
[xla ci]
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21088
Reviewed By: ailzhang
Differential Revision: D15944971
Pulled By: nairbv
fbshipit-source-id: 53473c370813d9470b190aa82764d0aea767ed74
Summary:
Currently many build options are explicitly passed from Python build scripts to CMake. But this is unnecessary, at least for many of them. This commit removes the build options that have the same name in CMakeLists.txt and environment variables (e.g., `USE_REDIS`). Additionally, many build options that are not explicitly passed to CMake are lost.
For `ONNX_ML`, `ONNX_NAMESPACE`, and `BUILDING_WITH_TORCH_LIBS`, I changed their default values in CMake scripts (as consistent with what the `CMake.defines` call meant), to avoid their default values being redundantly set in the Python build scripts.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21877
Differential Revision: D15964996
Pulled By: ezyang
fbshipit-source-id: 127a46af7e2964885ffddce24e1a62995e0c5007
Summary:
This PR tackles issue https://github.com/pytorch/pytorch/issues/18352 .
Progress:
- [x] conv_dilated2d CPU
- [x] conv_dilated3d CPU
- [x] conv_dilated2d CUDA
- [x] conv_dilated3d CUDA
- [x] RocM port
- [x] Port of CUDA gemm and gemv
- [x] Refactored 2d and 3d functions as well as output and gradient computations into a single C++ template function
- [x] Cleanup
+ [x] eliminate forward functions
+ [x] eliminate buffers `columns` and `ones` from functions API
+ [x] eliminate out functions
+ [x] eliminate using `ones`
Note that col2im, im2col, col2vol, vol2col implementations are exposed in `ATen/native/im2col.h` and `ATen/native/vol2col.h`. The corresponding operators (not ported in this PR) should use these.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20983
Differential Revision: D15958088
Pulled By: ezyang
fbshipit-source-id: 1897f6e15abbf5710e9413cd1e443c2e1dc7d705
Summary:
This is useful for measuring inference performance of your
models. This is a very basic benchmark for now. We don't support
batching on the benchmark side; no inter- and intra-op parallelism is
supported yet, just caller-based parallelism.
The main philosophy here is that the user should be able to provide inputs
from Python and just stack them within the benchmark. The API should be
exactly the same as passing inputs to module.forward.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20766
Test Plan: Added a new unit test
Differential Revision: D15435461
Pulled By: salexspb
fbshipit-source-id: db08829dc3f4398bb1d8aa16cc4a58b6c72f16c6
Summary:
Previously any assert failures would leave the updated setting, making
the test suite semantics dependent on the order in which the tests are run.
The diff is large only due to the indentation change (might be good to review without whitespace changes).
cc yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22115
Differential Revision: D15960875
Pulled By: soumith
fbshipit-source-id: 9313695277fc2d968786f13371719e03fff18519
Summary:
Apply launch bounds annotations for ROCm as the maximum threads per
block (1024) is higher than the ROCm internal default (256).
Reduce the minBlocksPerMultiprocessor for ROCm to 8 from 16 as this
improves performance in some microbenchmarks by (statistically
significant) 4%.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22081
Differential Revision: D15947426
Pulled By: bddppq
fbshipit-source-id: b4b7015417f99e14dfdedb62639e4d837c38e4fd
Summary:
We can't really test these until we get Python 3.8 in the CI, but these all work locally and won't be invoked at all for Python 3.7 and lower so this should be pretty safe.
Fixes #21710
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22007
Pulled By: driazati
Differential Revision: D15914735
fbshipit-source-id: 83833cebe7e38b162719a4f53cbe52c3fc638edd
Summary:
This was originally introduced before at::Half overloaded a number of operators; since this isn't necessary anymore, get rid of it.
Note in many cases, these files still need THCNumerics.cuh (which was included by THCHalfAutoNumerics); I was not careful about isolating these usages.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21878
Differential Revision: D15941236
Pulled By: gchanan
fbshipit-source-id: 65f30a20089fcd618e8f3e9646cf03147a15ccba
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21753
- it accidentally didn't move non-IValue-based lists before. This is fixed now.
- it only needs to recreate a T() for IValue-based lists
Reviewed By: resistor
Differential Revision: D15809220
fbshipit-source-id: 944badf1920ee05f0969fff0d03284a641dae4a9
Summary:
Get benefit from the compile time vectorization and multi-threading.
Before:
```python
In [1]: import torch
In [2]: x = torch.randn(1000000)
In [3]: y = torch.randn(1000000)
In [4]: w = 0.7
In [5]: timeit torch.lerp(x, y, w)
2.29 ms ± 23.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
After:
```python
In [1]: import torch
In [2]: x = torch.randn(1000000)
In [3]: y = torch.randn(1000000)
In [4]: w = 0.7
In [5]: timeit torch.lerp(x, y, w)
452 µs ± 1.81 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
After, with multi-threading:
```python
In [1]: import torch
In [2]: x = torch.randn(1000000)
In [3]: y = torch.randn(1000000)
In [4]: w = 0.7
In [5]: timeit torch.lerp(x, y, w)
167 µs ± 48.8 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22038
Differential Revision: D15941468
Pulled By: VitalyFedyunin
fbshipit-source-id: fa8a5126187df4e6c849452e035b00b22be25739
Summary:
# Motivation
We allow overriding JIT module serialization with `__getstate__/__setstate__` in order to cover cases where parameters are not serializable. Use cases include MKLDNN integration (a388c78350/torch/utils/mkldnn.py (L18-L26))
and also fbgemm prepacked-format integration for quantized tensors.
However, many eager-mode scripts use the `torch.save(module.state_dict())` form of serialization. There are several ways to make it work:
* make packed_weight itself pickleable (e.g. by binding `__getstate__/__setstate__` at the C++ UDT level)
  * change: we'd need to allow module buffers to be of arbitrary, non-Tensor types
  * pro: no change to state_dict behavior
  * cons: might not be directly inspectable by a user calling .state_dict(), especially if packed weights represent several tensors fused together
* make packed_weight a proper Tensor layout
  * pro: no change to state_dict or buffers behavior
  * cons: adding new tensor layouts is pretty costly today
  * cons: doesn't work if multiple tensors are packed in one interleaved representation
* *[this approach]* allow Modules to override state_dict and return regular tensors
  * pro: most flexible and hackable
  * pro: maintains the semantic meaning of state_dict as all data necessary to represent the module's state
  * cons: complicates state_dict logic
  * cons: potential code duplication between `__getstate__/__setstate__`
Based on discussions with zdevito and gchanan we decided to pick the latter approach. Rationale: this behavior is fully opt-in and will impact only the modules that need it. For those modules the requirement listed above won't be true. But we do preserve the requirement that all elements of state_dict are tensors. (https://fburl.com/qgybrug4 for internal discussion)
In the future we might also implement one of the approaches above but those are more involved.
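As a rough illustration of the `__getstate__/__setstate__` pattern referenced above (a minimal hedged sketch, not the code from this PR; `PackedLinear` and its packing routine are made up), a module can keep its weight in a packed form and convert to/from plain tensors when serializing:
```python
import torch

class PackedLinear(torch.nn.Module):
    def __init__(self, weight):
        super().__init__()
        # stand-in for a real packing routine (e.g. MKLDNN or fbgemm prepacking)
        self.packed = weight.t().contiguous()

    def forward(self, x):
        return x.matmul(self.packed)

    def __getstate__(self):
        # serialize a plain tensor instead of the packed representation
        return {'weight': self.packed.t().contiguous()}

    def __setstate__(self, state):
        super().__init__()  # pickle skips __init__, so initialize Module state here
        self.packed = state['weight'].t().contiguous()
```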
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21933
Differential Revision: D15937678
Pulled By: dzhulgakov
fbshipit-source-id: 3cb5d1a8304d04def7aabc0969d0a2e7be182367
Summary:
This pull request adds the necessary Windows DLL code to be able to support JIT fusion for CUDA. CPU JIT Fusion isn't supported. This also adds all the non-CPU JIT tests back in on Windows.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21861
Differential Revision: D15940939
Pulled By: soumith
fbshipit-source-id: e11f6af1ac258fcfd3a077e6e2f2e6fa38be4ef1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22015
Previous fusion logic only works for operators back-to-back in the linear order of protobuf file.
This diff generalizes to work for any predecessor-successor operators in the graph without any "interfering" use/def of the related blobs.
Reviewed By: csummersea
Differential Revision: D15916709
fbshipit-source-id: 82fe4911a8250845a8bea3427d1b77ce2442c495
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21709
Change the return type from Scalar to double/int64_t so we don't need to do conversion when we call other quantize related aten functions
Differential Revision: D15793003
fbshipit-source-id: 510936c69fa17a4d67340a31ebb03415647feb04
Summary:
This is a modified version of https://github.com/pytorch/pytorch/pull/14705 since commit structure for that PR is quite messy.
1. Add `IterableDataset`.
2. So we now have two data loading modes: `Iterable` and `Map`.
   1. `Iterable` if the `dataset` is an instance of `IterableDataset`
   2. `Map` otherwise
3. Add better support for non-batch loading (i.e., `batch_size=None` and `batch_sampler=None`). This is useful for doing things like bulk loading.
4. Refactor `DataLoaderIter` into two classes, `_SingleProcessDataLoaderIter` and `_MultiProcessingDataLoaderIter`. Rename some methods to be more generic, e.g., `get_batch` -> `get_data`.
5. Add `torch.utils.data.get_worker_info`, which returns worker information in a worker process (e.g., worker id, dataset obj copy, etc.) and can be used in `IterableDataset.__iter__` and `worker_init_fn` to do per-worker configuration (see the sketch below).
6. Add `ChainDataset`, which is the analog of `ConcatDataset` for `IterableDataset`.
7. Import torch.utils.data in `torch/__init__.py`.
8. Add data loader examples and documentation.
9. Use `get_worker_info` to detect whether we are in a worker process in `default_collate`.
Closes https://github.com/pytorch/pytorch/issues/17909, https://github.com/pytorch/pytorch/issues/18096, https://github.com/pytorch/pytorch/issues/19946, and some of https://github.com/pytorch/pytorch/issues/13023
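A minimal usage sketch of the iterable-style path and `get_worker_info` (illustrative only; `RangeIterableDataset` is a made-up example class):
```python
import math
from torch.utils.data import IterableDataset, DataLoader, get_worker_info

class RangeIterableDataset(IterableDataset):
    def __init__(self, start, end):
        self.start, self.end = start, end

    def __iter__(self):
        info = get_worker_info()
        if info is None:  # single-process data loading
            return iter(range(self.start, self.end))
        # in a worker process: shard the range across workers
        per_worker = int(math.ceil((self.end - self.start) / float(info.num_workers)))
        lo = self.start + info.id * per_worker
        hi = min(lo + per_worker, self.end)
        return iter(range(lo, hi))

# batch_size=None uses the new non-batch loading path
loader = DataLoader(RangeIterableDataset(0, 8), num_workers=2, batch_size=None)
print(sorted(int(x) for x in loader))  # [0, 1, 2, 3, 4, 5, 6, 7]
```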
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19228
Reviewed By: bddppq
Differential Revision: D15058152
fbshipit-source-id: 9e081a901a071d7e4502b88054a34b450ab5ddde
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20673
Add option to bucket-weighted pooling to hash the bucket so that any cardinality score can be used.
Reviewed By: huginhuangfb
Differential Revision: D15003509
fbshipit-source-id: 575a149de395f18fd7759f3edb485619f8aa5363
Summary:
The first attempt and more discussions are available in https://github.com/pytorch/pytorch/issues/19577
#### Goal
Allow toggling DDP gradient synchronization across iterations. With this feature, users may accumulate grads in module variables, and only kick off the expensive grad synchronization every few iterations.
#### Concerns
Our first attempt in https://github.com/pytorch/pytorch/issues/19577 tries to do it using a variable or a function. But apaszke made a good point that this would be error prone, and favored a context manager instead.
#### Proposed Solution
Instead of providing an `accumulate_grads` variable/function/context, we provide a `DistributedDataParallel.no_sync()` context manager. It does exactly what the name suggests, i.e., it disables DDP grad synchronization within the context. Note that `accumulate_grads` means `no_sync` + no optimizer step, where the latter is not controlled by DDP.
It is true that users need to call another `model(input).backward()` after exiting the context, and this is indeed more verbose. But I think it is OK, as one major concern in the previous discussion was to prevent users from running into errors without knowing it. This API should reaffirm the expected behavior, and does not interfere with other use cases if accumulating grads is not required.
The application would then look like:
```python
with ddp.no_sync():
    for input in inputs:
        ddp(input).backward()
ddp(one_more_input).backward()
optimizer.step()
```
chenyangyu1988 myleott
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21736
Differential Revision: D15805215
Pulled By: mrshenli
fbshipit-source-id: 73405797d1e39965c52016af5cf45b15525ce21c
Summary:
There aren't any substantive changes aside from some test renames (e.g. `TestScript.test_dict_membership` -> `TestDict.test_membership`) and the addition of `TestDict.dict()`.
Adding the rest of the dict ops was making the tests a mess and `TestScript` is already > 10000 lines by itself, so breaking them up should make things cleaner
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22000
Pulled By: driazati
Differential Revision: D15911383
fbshipit-source-id: 614428e03fbc14252f0e9cde74ab9a707169a860
Summary:
The cppdocs build job (originally run on Chronos as a cron job) was frequently broken because it was not run on every PR. This PR moves it to CircleCI and enables it on every PR, so that we can get the build failure signal much earlier.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19768
Differential Revision: D15922289
Pulled By: yf225
fbshipit-source-id: e36ef59a2e42f78b7d759ee02f2d94dc90f88fff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19443
This adds support for sparse gradients to the reducer as well as to
the DistributedDataParallel wrapper. Note that an out-of-band signal
is needed to indicate whether a dense parameter (e.g. an embedding) is
expected to receive a sparse gradient. This information is
passed to the bucket assignment computation routine and the reducer as
a vector of booleans. Every parameter for which we expect a sparse
gradient is assigned its own bucket, as we cannot easily group
multiple unrelated sparse tensors.
Reviewed By: mrshenli
Differential Revision: D15007365
fbshipit-source-id: f298e83fd3ca828fae9e80739e1db89d045c99ac
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19146
Implemented only on ProcessGroupGloo, as an allgather of metadata
(sparse_dim, dense_dim, and nnz), followed by an allgather of indices,
followed by an allgather of values. Once these operations have
finished, all ranks locally compute a reduction over these sparse
tensors. Works for both CPU and CUDA tensors.
This surfaced a problem with the existing assumption of only modifying
tensors that are passed at the call site, because for sparse tensors
we don't know the dimensions of the output tensors before we run the
collective. To deal with this unknown, this commit adds a `result`
function to the `c10d::ProcessGroup::Work` class that returns a vector
of tensors.
It's a bit odd to have to retrieve the result through this function
only for operations on sparse tensors. To make this work irrespective
of tensor layout, we can create a follow-up commit to make all in
place operations make their results accessible through this function
as well. This doesn't break any existing contracts but does have the
potential to add interface ambiguity.
Reviewed By: mrshenli
Differential Revision: D14889547
fbshipit-source-id: 34f3de4d6a2e09c9eba368df47daad0dc11b333e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21938
After having changed all call sites, we can now remove the old naming scheme.
Reviewed By: zdevito
Differential Revision: D15892402
fbshipit-source-id: 1f5b53a12fa657f6307811e8657c2e14f6285d2f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21937
This changes call sites to use the new naming scheme
Reviewed By: zdevito
Differential Revision: D15892404
fbshipit-source-id: 8d32aa90a0ead1066688166478f299fde9c2c133
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21936
This introduces torch::List and torch::Dict as aliases to ListPtr/DictPtr.
After this lands, we can step by step change the call sites to the new naming
and finally remove the old spellings.
Reviewed By: zdevito
Differential Revision: D15892405
fbshipit-source-id: 67b38a6253c42364ff349a0d4049f90f03ca0d44
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21806
Dispatcher::findSchema(op_name) now uses a lookup table instead of iterating through the list of operators to find it.
This speeds up op lookup (as in finding the operator handle from the name, not as in finding a kernel when you already have the operator handle)
and it also speeds up op registration, since that needs to check whether an op with the same name already exists.
Differential Revision: D15834256
fbshipit-source-id: c3639d7b567e4ed5e3627c3ebfd01b7d08b55ac1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21809
Many error messages show dispatch keys, for example when the dispatcher didn't find a kernel to dispatch to.
Previously, this was a string like "CPU" or "CUDA" for known backends and just an arbitrary number for other backends.
Now, tensor type id registration also registers a name for the dispatch key and shows that in the error messages.
There is no API change, just the error messages are better now.
Differential Revision: D15835809
fbshipit-source-id: 4f0c9d0925c6708b02d79c653a2fae75b6623bb9
Summary:
https://github.com/pytorch/pytorch/pull/17072 breaks `model.to(xla_device)`, because moving `model` to XLA device involves changing its parameters' TensorImpl type, and the current implementation of `nn.Module.to()` doesn't support changing module parameters' TensorImpl type:
```python
# 6dc445e1a8/torch/nn/modules/module.py (L192-L208)
def _apply(self, fn):
...
for param in self._parameters.values():
if param is not None:
# Tensors stored in modules are graph leaves, and we don't
# want to create copy nodes, so we have to unpack the data.
param.data = fn(param.data) # NOTE: this doesn't allow changing `param.data`'s TensorImpl type
if param._grad is not None:
param._grad.data = fn(param._grad.data) # NOTE: this doesn't allow changing `param._grad.data`'s TensorImpl type
...
```
yf225 TODO: fix the description here when we finish the implementation
To fix this problem, we introduce a new API `model.to_()` that always assigns new tensors to the parameters (thus supporting changing the parameters to any TensorImpl type), and also bumps the version counter of the original parameters correctly so that they are invalidated in any autograd graph they participate in.
We also add a warning to the current `model.to()` API to inform users about the upcoming behavior change of `model.to()`: in future releases, it will create and return a new model instead of updating the current model in place.
This unblocks adding XLA to our CI test suite, which also allows XLA to catch up with other changes in our codebase, notably the c10 dispatcher.
[xla ci]
cc. resistor ailzhang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21613
Differential Revision: D15895387
Pulled By: yf225
fbshipit-source-id: b79f230fb06019122a37fdf0711bf2130a016fe6
Summary:
When we pass `fn` to `nn.Module._apply()` and `fn` is an in-place operation, the correct behavior should also include bumping the parameters' and their gradients' version counters. This PR fixes the old incorrect behavior and makes sure the new behavior is right.
Note that this PR is BC-breaking in the following way:
Previously, passing an in-place operation to `nn.Module._apply()` does not bump the module's parameters' and their gradients' version counters. After this PR, the module's parameters' and their gradients' version counters will be correctly bumped by the in-place operation, which will invalidate them in any autograd graph they previously participate in.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21865
Differential Revision: D15881952
Pulled By: yf225
fbshipit-source-id: 62f9244a4283a110147e9f20145ff232a5579fbd
Summary:
Added some extra tests for std_mean and var_mean for multiple dims.
Some refactoring of previously created tests based on PR comments: https://github.com/pytorch/pytorch/pull/18731
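For reference, a small sketch of the ops these tests exercise, with a multi-dim reduction (assuming the tuple-of-dims form covered by the tests above):
```python
import torch

x = torch.randn(3, 4, 5)
std, mean = torch.std_mean(x, dim=(0, 2))                   # reduce over dims 0 and 2
var, mean2 = torch.var_mean(x, dim=(0, 2), unbiased=False)
print(std.shape, var.shape)                                 # torch.Size([4]) torch.Size([4])
```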
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20650
Differential Revision: D15396101
Pulled By: ifedan
fbshipit-source-id: d15c3c2c7084a24d6cfea4018173552fcc9c03a9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21852
To enable change of q_scale and q_zero_point in `copy_`
Differential Revision: D15793427
fbshipit-source-id: a7040b5b956d161fd6af6176287f4a4aa877c9be
Summary:
The code in `python_sugared_value.cpp` to recursively compile methods
was not being tested, so this adds a test for it and fixes some errors
in it.
It was necessary to disable any hooks that were set since (at least in our tests) they would try to export
a half-finished graph, because they were being called on recursively
compiled methods.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21862
Differential Revision: D15860314
Pulled By: driazati
fbshipit-source-id: e8afe9d4c75c345b6e1471072d67c5e335b61337
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21914
https://github.com/pytorch/pytorch/pull/21591 added a needed feature to clean up grad accumulator post hooks when the DistributedDataParallel model object is cleaned up. There's a minor typo that causes it to loop infinitely over the first element.
Differential Revision: D15878884
fbshipit-source-id: b7fd0bbd51eb187579d639b1709c6f7b62b85e7a
Summary:
This PR adds support for `in` checks like `key in my_dict`
For now it leaves lists as a follow up due to the changes around `IValue` lists and it needing an `IValue` equality op.
For objects it uses the magic method `__contains__(self, key)`
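A small hedged sketch of what the new `in` support enables in TorchScript (the function is illustrative):
```python
from typing import Dict

import torch

@torch.jit.script
def has_key(d: Dict[str, int], key: str) -> bool:
    return key in d

print(has_key({'a': 1, 'b': 2}, 'b'))  # True
```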
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21527
Pulled By: driazati
Differential Revision: D15811203
fbshipit-source-id: 95745060394f8a9450efaaf8ab09d9af83bea01e
Summary:
This adds support for inferred attributes (everything except empty lists, dicts, and tuples) as well as using the PEP 526 style annotations on a class, so this eliminates the need for `torch.jit.Attribute`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21379
Differential Revision: D15718537
Pulled By: driazati
fbshipit-source-id: b7481ae3d7ee421613e931b7dc3427ef2a99757f
Summary:
This is a fix for https://github.com/pytorch/pytorch/issues/21469
Currently there is no way to tell whether a backward function has released its saved variables when those variables were added to a vector. This change sets a flag when a function has saved variables and they have been released, so we can raise an error if somebody calls the function again with already released variables.
Functions that do not have saved variables can still be called multiple times, for backward compatibility.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21533
Differential Revision: D15810481
Pulled By: ifedan
fbshipit-source-id: 5663e0c14f1b65727abc0d078aef348078d6a543
Summary:
This will need a conflict resolution once avg_pool2d() has been merged.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21732
Differential Revision: D15824923
Pulled By: ezyang
fbshipit-source-id: 83341e0209b660aecf788272079d8135d78b6ff1
Summary:
This was some code I added :^)
Time for me to remove it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21897
Differential Revision: D15873213
Pulled By: Chillee
fbshipit-source-id: 769c3bd71c542be4afddc02dc2f65aa5c751b10d
Summary:
What's the point of having warnings if we never fix them :^)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21898
Differential Revision: D15873280
Pulled By: Chillee
fbshipit-source-id: a8274bab2badd840d36a9d2e1354677a6114ae1d
Summary:
cosine_similarity has two non-tensor parameters, so it needs some special handling. This diff adds support for its export.
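A hedged export sketch for the newly supported op (the wrapper module and the in-memory buffer are illustrative):
```python
import io
import torch
import torch.nn.functional as F

class CosSim(torch.nn.Module):
    def forward(self, a, b):
        # dim and eps are the two non-tensor parameters mentioned above
        return F.cosine_similarity(a, b, dim=1, eps=1e-8)

buf = io.BytesIO()
torch.onnx.export(CosSim(), (torch.randn(2, 3), torch.randn(2, 3)), buf)
```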
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21884
Reviewed By: zrphercule
Differential Revision: D15866807
Pulled By: houseroad
fbshipit-source-id: a165fbc00c65c44b276df89ae705ca8960349d48
Summary:
```
This replaces the kernel helpers in Loops.h/cuh with the following:
cpu_kernel
cpu_kernel_vec
gpu_kernel
gpu_kernel_with_scalars
These work with functions with any number of input arguments, with the
exception of 'gpu_kernel_with_scalars', which is limited to binary
operations. Previously, we only supported functions of 0, 1, or 2 input
arguments. Adding support for 3 or 4 input argument functions required a
significant amount of additional code.
This makes a few other changes:
Remove 'ntensors' from the for_each/serial_for_each loop. Most loops
assume a fixed number of tensors, and the value is accessible from
TensorIterator::ntensors()
Only lift CPU scalars to parameters in 'gpu_kernel_with_scalars'.
Previously, we performed this recursively in gpu_unary_kernel and
gpu_binary_kernel, so something like `torch.add(3, 4, out=cuda_tensor)`
would specialize to a "nullary" kernel. Now, only the first
scalar input is lifted to a kernel parameter. Any additional scalar
inputs are copied to CUDA tensors. Note that operations like `x + 5`
and `5 + x` still work efficiently. This avoids generating an exponential
number of specializations in the number of input arguments.
```
**Performance measurements**
Timing numbers are unchanged for basic elementwise operations. Linked below is a script to measure torch.add perf on PR vs. master CPU+GPU (GCC 7.3):
[miniperf.py](https://gist.github.com/colesbury/4a61893a22809cb0931f08cd37127be4)
**Generated assembly**
cpu_kernel and cpu_kernel_vec still generate good vectorized code with
both GCC 7.3 and GCC 4.8.5. Below is the assembly for the "hot" inner loop of
torch.add as well as an auto-vectorized torch.mul implementation using cpu_kernel/
binary_kernel. (The real torch.mul uses cpu_kernel_vec but I wanted to check that
auto vectorization still works well):
[torch.add GCC 7.3](https://gist.github.com/colesbury/927ddbc71dc46899602589e85aef1331)
[torch.add GCC 4.8](https://gist.github.com/colesbury/f00e0aafd3d1c54e874e9718253dae16)
[torch.mul auto vectorized GCC 7.3](https://gist.github.com/colesbury/3077bfc65db9b4be4532c447bc0f8628)
[torch.mul auto vectorized GCC 4.8](https://gist.github.com/colesbury/1b38e158b3f0aaf8aad3a76963fcde86)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21475
Differential Revision: D15745116
Pulled By: colesbury
fbshipit-source-id: 914277d7930dc16e94f15bf87484a4ef82890f91
Summary:
PR https://github.com/pytorch/pytorch/issues/20685 incorrectly only enabled P2P access for non-contiguous copies.
This can make cudaMemcpy slow for inter-gpu copies, especially on ROCm
devices. I didn't notice a difference on CUDA 10, but ngimel says it's
important for CUDA too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21872
Differential Revision: D15863965
Pulled By: colesbury
fbshipit-source-id: 0a858f3c338fa2a5d05949d7f65fc05a70a9dfe1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21080
Add Huber loss as a new option for regression training (refer to TensorFlow implementation: https://fburl.com/9va71wwo)
```
# huber loss
def huber(true, pred, delta):
    error = abs(true - pred)
    loss = 0.5 * min(error, delta)^2 + delta * max(error - delta, 0)
    return mean(loss)
```
As a combination of MSE loss (`x < delta`) and MAE loss (`x >= delta`), the advantage of Huber loss is that it reduces the training dependence on outliers.
One thing worth noting is that Huber loss is not twice differentiable at `x = delta`. To further address this problem, one could consider adopting the `log(cosh(x))` loss.
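A runnable PyTorch equivalent of the pseudocode (illustrative only, not the Caffe2 operator added here; `huber_loss` is a made-up name):
```python
import torch

def huber_loss(pred, true, delta=1.0):
    error = (true - pred).abs()
    quadratic = torch.clamp(error, max=delta)  # min(error, delta)
    linear = error - quadratic                 # max(error - delta, 0)
    return (0.5 * quadratic ** 2 + delta * linear).mean()

print(huber_loss(torch.zeros(4), torch.tensor([0.1, 0.5, 2.0, -3.0])))
```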
Reviewed By: chintak
Differential Revision: D15524377
fbshipit-source-id: 73acbe2728ce160c075f9acc65a1c21e3eb64e84
Summary:
After fixing https://github.com/pytorch/pytorch/issues/20774 the TRT build was broken.
Because of missing annotations, pybind_state_gpu.so was missing symbols, but pybind_state.so was not. This caused a weird situation where trying to import pybind_state_gpu first left the system in a semi-initialized state and led to a segfault.
Minimal repro:
```
>>> import ctypes
>>> ctypes.CDLL('/var/lib/jenkins/.local/lib/python2.7/site-packages/caffe2/python/caffe2_pybind11_state_gpu.so')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/ctypes/__init__.py", line 362, in __init__
self._handle = _dlopen(self._name, mode)
OSError: /var/lib/jenkins/.local/lib/python2.7/site-packages/caffe2/python/caffe2_pybind11_state_gpu.so: undefined symbol: _ZN6caffe219TensorRTTransformer9TransformEPNS_9WorkspaceEPNS_6NetDefERKSt13unordered_mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_11TensorShapeESt4hashISB_ESt8equal_toISB_ESaISt4pairIKSB_SC_EEE
>>> ctypes.CDLL('/var/lib/jenkins/.local/lib/python2.7/site-packages/caffe2/python/caffe2_pybind11_state.so')
Segmentation fault (core dumped)
```
Too lazy to repro locally, let's see if CI passes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21775
Differential Revision: D15829605
Pulled By: dzhulgakov
fbshipit-source-id: 1adb2bde56b0cd68f84cfca67bc050adcf787cd9
Summary:
Following up b811b6d5c03596d789a33d7891b606842e01f7d2
* Use property instead of __setattr__ in CMake.
* Add a comment clarifying when built_ext.run is called.
---
cc ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21792
Differential Revision: D15860606
Pulled By: umanwizard
fbshipit-source-id: ba1fa07f58d4eac81ac27fa9dc7115d1cdd3dec0
Summary:
https://github.com/pytorch/pytorch/issues/11866 has corrected this issue in function `host_softmax` (aten/src/ATen/native/SoftMax.cpp). But I tried the example proposed in https://github.com/pytorch/pytorch/issues/11752. `log_softmax` is still not working for big logits.
I have looked into the source code and found that the example had called `vec_host_softmax_lastdim`, not `host_softmax`.
This code fixes the issue in `_vec_log_softmax_lastdim` and has a test for `log_softmax`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21672
Differential Revision: D15856327
Pulled By: VitalyFedyunin
fbshipit-source-id: 7a1fd3c0a03d366c99eb873e235361e4fcfa7567
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21735
ghimport-source-id: 4a4289693e372880e3d36e579c83d9e8745e70ed
Test Plan:
- I'm not sure how to test this other than making sure it compiles.
- [namedtensor ci]
gh-metadata: pytorch pytorch 21735 gh/zou3519/49/head
Imported from OSS
Differential Revision: D15833456
Pulled By: zou3519
fbshipit-source-id: ea2fa6d5c5f1eb2d7970d47189d6e4fcd947146d
Summary:
kuttas pointed out that the DDP Reducer only needs to remember `uintptr, Function` pairs, and hence does not need an unordered map as added by https://github.com/pytorch/pytorch/issues/21591. Using a vector should speed it up a bit.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21783
Differential Revision: D15854312
Pulled By: mrshenli
fbshipit-source-id: 153ba035b8d658c7878a613f16a42de977d89c43
Summary:
After https://github.com/pytorch/pytorch/pull/17072, we are allowed to pass Variables into ATen ops, thus there is no need to unwrap input variables in the c10 call path.
Note that since Caffe2 still expects inputs to be pure Tensors, we moved the unwrapping logic to the Caffe2 wrapper.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21620
Differential Revision: D15763560
Pulled By: yf225
fbshipit-source-id: 5375f0e51eb320f380ae599ebf98e6b259f0bff8
Summary:
This refactors pybind_utils so we can have all our type-inferring stuff in
1 place (e.g. for #21379)
There is some follow up work to make the error messages better, but I think that's fine to save for another PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21550
Pulled By: driazati
Differential Revision: D15727002
fbshipit-source-id: a6974f2e1e5879f0503a18efc138da31cda7afa2
Summary:
Resolves https://github.com/pytorch/lockdown/issues/18
This implements NamedTuple by taking advantage of the existing `names` field in `TupleType`.
TODO: This currently doesn't retain the NamedTuple-ness through serialization. Discussed with suo offline, we can probably make a way to define an anonymous NamedTuple in script (e.g. `NamedTuple('Foo', [('a', int), ('b', float), ('c', List[float])])` and serialize that
TODO: implement support for calling the constructor with kwargs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21428
Differential Revision: D15741564
Pulled By: jamesr66a
fbshipit-source-id: c077cbcea1880675ca6deb340a9ec78f824a136c
Summary:
When enabling this flag, there were a lot of warnings. This PR focuses on the warnings where the signed/unsigned comparison could affect array indices, which are the ones most prone to fail.
The good news is that I didn't find anything obviously concerning.
One degenerate case could be when the matrices we work with are too skinny; those could run into issues (dim1=1, dim2 needs to hold a big number).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18187
Differential Revision: D14527182
Pulled By: hyuen
fbshipit-source-id: b9f46b6f68ab912c55368961758a7a5af1805555
Summary:
We plan on generating Python bindings for the C++ ChunkDataset API using the current PyTorch DataLoader class, which must call get_batch() instead of get_batch(size).
This change doesn't break the current API; it just adds one more method that will make future extensions easier (WIP).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21797
Differential Revision: D15830522
Pulled By: soumith
fbshipit-source-id: 7208f305b48bf65d2783eaff43ff57a05e62c255
Summary:
Originally, the tests for the tensorboard writer were smoke tests only. This PR lets CI compare the output with expected results at a low level. The randomness of the tensors in the test is also removed.
ps. I found that how protobuf serializes data differs between Python environments. One way to solve this is to write the data and then read it back instantly (i.e., compare the data at a higher level).
For `add_custom_scalars`, the data to be written is a dictionary, and the serialized result might be different (not an `ordereddict`). So there is only a smoke test for that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20987
Reviewed By: NarineK, lanpa
Differential Revision: D15804871
Pulled By: orionr
fbshipit-source-id: 69324c11ff823b19960d50def73adff36eb4a2ac
Summary:
Try to fix a sporadic failure on some CIs.
I've run this test hundreds of times on my machine (GeForce 1060, MAGMA) but I cannot reproduce this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21638
Differential Revision: D15827779
Pulled By: ezyang
fbshipit-source-id: 3586075e48907b3b84a101c560a34cc733514a02
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21712
Warn when people use unordered_map or vector with IValues. These APIs are deprecated.
The unordered_map API is slow because it requires copying the whole map.
The vector API is slow for some types (e.g. std::string) because for them it also requires copying the whole vector.
Also, the vector API would get slow for all types if we decide to switch to SmallVector.
Differential Revision: D15792428
fbshipit-source-id: 1b72406b3a8d56521c862858c9f0ed01e56f2757
Summary:
When kwargs are specified in a test defined via common_method_invocations, it doesn't work if there isn't also a positional argument (`{'foo':'foo'}` without a positional arg generates a python call like: `self.method(, foo=foo)`, erroring on the `,`). I wanted to test something in a different PR and noticed I couldn't.
Also fixed some flake8 warnings I was seeing locally.
I replaced `lambda x: x` with `ident` since it seems a bit cleaner to me, but happy to revert that if others don't agree?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21499
Differential Revision: D15826974
Pulled By: nairbv
fbshipit-source-id: a3f37c80ba2303c7d9ae06241df06c7475b64e36
Summary:
So far, we only have py2 ci for onnx. I think py3 support is important. And we have the plan to add onnxruntime backend tests, which only supports py3.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21715
Reviewed By: bddppq
Differential Revision: D15796885
Pulled By: houseroad
fbshipit-source-id: 8554dbb75d13c57b67ca054446a13a016983326c
Summary:
Some data loader tests are flaky on py 2 with the following error
```
Jun 12 22:17:31 Traceback (most recent call last):
Jun 12 22:17:31 File "test_dataloader.py", line 798, in test_iterable_dataset
Jun 12 22:17:31 fetched = sorted([d.item() for d in dataloader_iter])
Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 697, in __next__
Jun 12 22:17:31 idx, data = self._get_data()
Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 664, in _get_data
Jun 12 22:17:31 success, data = self._try_get_data()
Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 617, in _try_get_data
Jun 12 22:17:31 data = self.data_queue.get(timeout=timeout)
Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/multiprocessing/queues.py", line 135, in get
Jun 12 22:17:31 res = self._recv()
Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/multiprocessing/queue.py", line 22, in recv
Jun 12 22:17:31 return pickle.loads(buf)
Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/pickle.py", line 1382, in loads
Jun 12 22:17:31 return Unpickler(file).load()
Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/pickle.py", line 858, in load
Jun 12 22:17:31 dispatch[key](self)
Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/pickle.py", line 1133, in load_reduce
Jun 12 22:17:31 value = func(*args)
Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/multiprocessing/reductions.py", line 274, in rebuild_storage_fd
Jun 12 22:17:31 fd = multiprocessing.reduction.rebuild_handle(df)
Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/multiprocessing/reduction.py", line 157, in rebuild_handle
Jun 12 22:17:31 new_handle = recv_handle(conn)
Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/multiprocessing/reduction.py", line 83, in recv_handle
Jun 12 22:17:31 return _multiprocessing.recvfd(conn.fileno())
Jun 12 22:17:31 OSError: [Errno 4] Interrupted system call
```
Apparently, Python 2.7's `recvfd` calls `recvmsg` without an EINTR retry: https://github.com/python/cpython/blob/2.7/Modules/_multiprocessing/multiprocessing.c#L174
So we should wrap the call in a retry loop that catches EINTR (see the sketch below).
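A hedged sketch of the retry idea (a generic helper, not the exact patch):
```python
import errno

def retry_on_eintr(fn, *args, **kwargs):
    # keep retrying a call that may be interrupted by a signal (EINTR)
    while True:
        try:
            return fn(*args, **kwargs)
        except (OSError, IOError) as e:
            if e.errno != errno.EINTR:
                raise
```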
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21723
Differential Revision: D15806247
Pulled By: ezyang
fbshipit-source-id: 16cb661cc0fb418fd37353a1fef7ceeb634f02b7
Summary:
Currently when building extensions, variables such as USE_CUDA, USE_CUDNN are used to determine what libraries should be linked. But we should use what CMake has detected, because:
1. If CMake found them unavailable but the variables say some libraries should be linked, the build would fail.
2. If the first build is made using a set of non-default build options, rebuild must have these option passed to setup.py again, otherwise the extension build process is inconsistent with CMake. For example,
```bash
# First build
USE_CUDA=0 python setup.py install
# Subsequent builds like this would fail, unless "build/" is deleted
python setup.py install
```
This commit addresses the above issues by using variables from CMakeCache.txt when building the extensions.
---
The changes in `setup.py` may look lengthy, but the biggest changed block is mostly moving them into a function `configure_extension_build` (along with some variable names changed to `cmake_cache_vars['variable name']` and other minor changes), because it must be called after CMake has been called (and thus the options used and system environment detected by CMake become available).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21653
Differential Revision: D15824506
Pulled By: ezyang
fbshipit-source-id: 1e1eb7eec7debba30738f65472ccad966ee74028
Summary:
This makes the error thrown in aten_to_numpy_dtype consistent with that in numpy_dtype_to_aten.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21608
Differential Revision: D15816035
Pulled By: gchanan
fbshipit-source-id: 392e8b9ea37003a859e7ed459911a1700fcbd695
Summary:
This PR is intended as a fix for https://github.com/pytorch/pytorch/issues/21644.
It allows the `with emit_nvtx` context manager to take an additional `record_shapes` argument. `record_shapes` is False by default, but if True, the nvtx ranges generated for each autograd op will append additional information about the sizes of Tensors received by that op.
The format of shape information is equivalent to what the CPU-side profiler spits out. For example,
```
M = torch.randn(2, 3)
mat1 = torch.randn(2, 3)
mat2 = torch.randn(3, 3)
with torch.cuda.profiler.profile():
with torch.autograd.profiler.emit_nvtx(record_shapes=True):
torch.addmm(M, mat1, mat2)
```
produces the following nvtx range label for addmm:

(cf the "Input Shapes" shown in 864cfbc216 (diff-115b6d48fa8c0ff33fa94b8fce8877b6))
I also took the opportunity to do some minor docstring cleanup.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21691
Differential Revision: D15816226
Pulled By: gchanan
fbshipit-source-id: b2b01ea10fea61a6409a32b41e85b6c8b4851bed
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20924
I found a Python 3 bug when deserializing caffe2 code. The exception thrown is a Unicode-related error instead of just a decode error, and we need to catch that as well.
Reviewed By: ipiszy
Differential Revision: D15293221
fbshipit-source-id: 29820800d1b4cbe5bf3f5a189fe2023e655d0508
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21763
Custom __getattr__ functions can only raise AttributeError. This code threw NotImplementedError, which caused trouble upstream when hasattr() was called.
Differential Revision: D15815176
fbshipit-source-id: 0982e2382de4578d3fc05c5d2a63f624d6b4765e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21446
This is used for easier tracing of the iteration id when looking at the trace diagram.
Reviewed By: ilia-cher
Differential Revision: D15628950
fbshipit-source-id: ee75b3bdb14a36abc18c7bddc49d8ec9789b724d
Summary:
```
The stride calculation using OffsetCalculator performs poorly with
MAX_DIMS=25. This reduces MAX_DIMS (after coalescing) to 16 on ROCm.
I think it's unlikely that anyone will exceed this limit. If they do,
we can add additional specializations for ROCm with more dimensions.
```
I'm not sure about the underlying cause. With MAX_DIM=25, the add kernel's params
is ~648 bytes vs. ~424 bytes with MAX_DIM=16. The kernel instruction footprint is
bigger too, but most of these instructions are never executed and most kernel parameters
are never loaded because the typical dimensionality is much smaller.
Mini benchmark here:
https://gist.github.com/colesbury/1e917ae6a0ca9d24712121b92fed4c8f
(broadcasting operations are much faster)
cc iotamudelta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21754
Reviewed By: bddppq
Differential Revision: D15811906
Pulled By: colesbury
fbshipit-source-id: 063f92c083d26e2ef2edc98df7ff0400f9432b9d
Summary:
Currently multihead attention for half type is broken
```
File "/home/ngimel/pytorch/torch/nn/functional.py", line 3279, in multi_head_attention_forward
attn_output = torch.bmm(attn_output_weights, v)
RuntimeError: Expected object of scalar type Float but got scalar type Half for argument #2 'mat2'
```
because softmax converts half inputs into fp32 inputs. This is unnecessary - all the computations in softmax will be done in fp32 anyway, and the results need to be converted into fp16 for the subsequent batch matrix multiply, so nothing is gained by writing them out in fp32. This PR gets rid of type casting in softmax, so that half works.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21658
Differential Revision: D15807487
Pulled By: zhangguanheng66
fbshipit-source-id: 4709ec71a36383d0d35a8f01021e12e22b94992d
Summary:
In this PR, we use `expect` to fill in the token for pytorchbot when doing `git push`, so that we don't need to save the token in the git remote URL.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20459
Differential Revision: D15811676
Pulled By: yf225
fbshipit-source-id: cd3b780da05d202305f76878e55c3435590f15a8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21742
Add error message to NotImplementedError so we know which function it is about.
Reviewed By: bddppq
Differential Revision: D15806379
fbshipit-source-id: 14eab9d03aa5b44ab95c5caeadc0e01d51f22188
Summary:
When converting pixel_shuffle to reshape + transpose + reshape, the first reshape should
be:
[N, C * r^2, H, W] => [N, C, r, r, H, W]
in order to match pytorch's implementation (see ATen PixelShuffle.cpp).
This previously wasn't caught by the test case, since it uses C = r = 4. Updated test case to
have C = 2, r = 4.
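A quick sanity-check sketch of the decomposition described above, using C = 2 and r = 4 as in the updated test:
```python
import torch
import torch.nn.functional as F

N, C, r, H, W = 1, 2, 4, 3, 3
x = torch.randn(N, C * r * r, H, W)
manual = (x.reshape(N, C, r, r, H, W)     # [N, C*r^2, H, W] => [N, C, r, r, H, W]
           .permute(0, 1, 4, 2, 5, 3)     # => [N, C, H, r, W, r]
           .reshape(N, C, H * r, W * r))  # => [N, C, H*r, W*r]
assert torch.equal(manual, F.pixel_shuffle(x, r))
```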
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21486
Reviewed By: houseroad
Differential Revision: D15700945
Pulled By: houseroad
fbshipit-source-id: 47019691fdc20e152e867c7f6fd57da104a12948
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21718
This adds a detection method for checking whether the package is built for AMD.
Reviewed By: bddppq
Differential Revision: D15795893
fbshipit-source-id: 91a21ee76b2273b1032507bdebe57e016717181d
Summary:
**Closes:** Confusing documentation with distributions.Categorical about logits https://github.com/pytorch/pytorch/issues/16291
**Solution**: Changes the documentation on the Categorical distribution from `log probabilities` to `event log-odds`. This should reduce the confusion raised by this issue, and is consistent with other distributions such as `torch.Binomial`.
More than happy to make any other changes if they fit :).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21707
Differential Revision: D15799181
Pulled By: soumith
fbshipit-source-id: f11acca7a5c130102a3ff6674640235ee5aa69bf
Summary:
- [x] Add tests after https://github.com/pytorch/pytorch/pull/20256 is merged
- Support exporting ScriptModule with inputs/outputs of arbitrarily constructed tuples.
- Moved the assigning of output shapes to after graph conversion to ONNX is completed. By then all tuples in the IR have already been lowered by the pass ```_jit_pass_lower_all_tuples```. If assigning output shapes is required to happen before that, we'll need to hand parse the tuple structures in the graph, and repeat the same logic in ```_jit_pass_lower_all_tuples```. Handling inputs is easier because all tuple information is encoded within the input tensor type.
- Swap the order of ```_jit_pass_lower_all_tuples``` and ```_jit_pass_erase_number_types```. Ops like ```prim::TupleIndex``` relies on index being a scalar. ```_jit_pass_erase_number_types``` will convert these kind of scalars to tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20784
Reviewed By: zrphercule
Differential Revision: D15484171
Pulled By: houseroad
fbshipit-source-id: 4767a84038244c929f5662758047af6cb92228d3
Summary:
This renames the CMake `caffe2` target to `torch`, as well as renaming `caffe2_gpu` to `torch_gpu` (and likewise for other gpu target variants). Many intermediate variables that don't manifest as artifacts of the build remain for now with the "caffe2" name; a complete purge of `caffe2` from CMake variable names is beyond the scope of this PR.
The shell `libtorch` library that had been introduced as a stopgap in https://github.com/pytorch/pytorch/issues/17783 is again flattened in this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20774
Differential Revision: D15769965
Pulled By: kostmo
fbshipit-source-id: b86e8c410099f90be0468e30176207d3ad40c821
Summary:
Class member annotations can be marked with `Final[T]` instead of adding them to `__constants__`. `Final` comes from the `typing_extensions` module (which will be used if it is present). If not, the polyfill from `_jit_internal` is exposed as `torch.jit.Final` for users that don't want to install `typing_extensions`.
This keeps around `__constants__` since a lot of code is still using it, but in documentation follow-ups we should change the examples to all use `Final`.
TODO: install typing_extensions on CI, move tests to a Python3 only file when #21489 lands
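A minimal sketch of the `Final`-style constant (assuming the `torch.jit.Final` export described above; the module itself is made up):
```python
import torch
from torch.jit import Final  # falls back to the polyfill if typing_extensions is absent

class Scaler(torch.nn.Module):
    scale: Final[float]  # treated as a constant; no __constants__ entry needed

    def __init__(self, scale):
        super().__init__()
        self.scale = scale

    def forward(self, x):
        return x * self.scale

m = torch.jit.script(Scaler(2.0))
print(m(torch.ones(3)))
```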
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21603
Pulled By: driazati
Differential Revision: D15746274
fbshipit-source-id: d2c9b5643b4abba069b130c26fd42714c906ffac
Summary:
This adds support for PEP 526 style annotations on assignments in place of
`torch.jit.annotate()`, so
```python
a = torch.jit.annotate(List[int], [])
```
turns into
```python
a : List[int] = []
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21390
Differential Revision: D15790937
Pulled By: driazati
fbshipit-source-id: 0cc204f7209a79839d330663cc6ba8320d3a4120
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21177
- Integrate c10::ListPtr into IValue and the c10 dispatcher.
- Streamline conversion to/from IValue. Before, we had IValue::to<> and kernel_functor.h had its own ivalue_to_arg_type and return_type_to_ivalue. They are now unified. Also, this means that nested types like Dicts of Lists of Optional of Dict of ... do work as expected now
Differential Revision: D15476433
fbshipit-source-id: bde9df80df20091aa8e6ae17ba7e90abd149b954
Summary:
Accidentally rebased the old PR and made it too messy. Find it here (https://github.com/pytorch/pytorch/pull/19274).
Created a PR for comments. The model is still WIP but I want to get some feedback before moving too far. The transformer model depends on several modules, like MultiheadAttention (landed).
Transformer is implemented based on the paper (https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf). Users have the flexibility to build a transformer with self-defined and/or built-in components (i.e., encoder, decoder, encoder_layer, decoder_layer). Users can use the Transformer class to build a standard transformer model and modify sub-layers as needed.
Add a few unit tests for the transformer module, as follows:
TestNN.test_Transformer_cell
TestNN.test_transformerencoderlayer
TestNN.test_transformerdecoderlayer
TestNN.test_transformer_args_check
TestScript.test_scriptmodule_transformer_cuda
There is another demonstration example for applying transformer module on the word language problem. https://github.com/pytorch/examples/pull/555
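A small usage sketch of the standard Transformer module described above (inputs follow the (sequence, batch, feature) convention):
```python
import torch
import torch.nn as nn

model = nn.Transformer(d_model=512, nhead=8, num_encoder_layers=6, num_decoder_layers=6)
src = torch.rand(10, 32, 512)  # (source length, batch, d_model)
tgt = torch.rand(20, 32, 512)  # (target length, batch, d_model)
out = model(src, tgt)          # (20, 32, 512)
```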
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20170
Differential Revision: D15417983
Pulled By: zhangguanheng66
fbshipit-source-id: 7ce771a7e27715acd9a23d60bf44917a90d1d572
Summary:
Currently we don't have any Linux libtorch binary build in the PR CI, which led to nightly build failure such as https://circleci.com/gh/pytorch/pytorch/1939687. This PR adds Linux libtorch CPU binary build to prevent such breakage from happening in the future.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21671
Differential Revision: D15785003
Pulled By: yf225
fbshipit-source-id: d1f2e4235e48296ddecb3367f8e5a0df16f4ea49
Summary:
Fix https://github.com/pytorch/pytorch/issues/20421
`ProcessGroupGloo` only requires input/output tensors to be contiguous. Contiguous tensors might not start from the beginning of the underlying storage, e.g., `chunk(..., dim=0)[1]`. The current implementation passes `tensor.storage().data()` ptr to gloo buffer. This leads to wrong results if the tensor has a non-zero storage offset.
The proposed solution is to use `tensor.data_ptr()` instead. Let's see if this breaks any tests.
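A short illustration of the problem: a contiguous tensor whose data does not start at the beginning of its storage:
```python
import torch

t = torch.arange(8.).chunk(2, dim=0)[1]  # second chunk of a larger tensor
print(t.is_contiguous())                 # True
print(t.storage_offset())                # 4, so storage().data() != data_ptr()
```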
cc qijianan777
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21490
Differential Revision: D15768907
Pulled By: mrshenli
fbshipit-source-id: 9d7d1e9baf0461b31187c7d21a4a53b1fbb07397
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21592
We now support groupwise convolutions for qconv2d
Reviewed By: zafartahirov
Differential Revision: D15739239
fbshipit-source-id: 80b9b4fef5b9ee3d22ebecbaf205b970ab3d4250
Summary:
Closes https://github.com/pytorch/pytorch/issues/21344
DDP assigns the original module to the first module replica instead of creating a new one. Then, it creates a new Reducer to add post hooks to sync gradients. However, because every reconstructed DDP instance wraps the same original module, all their reducers will add hooks to the same set of variables. This PR deletes DDP hooks from variables when destructing Reducer, trying to make DDP failure recoverable.
pietern kuttas and I discussed the following solutions:
#### Solution 1
Keep `add_post_hook` API intact, and do a `dynamic_cast` in `del_post_hook` to check hook type. If the type matches Reducer's hook, delete it. As pietern mentioned, this will not work if we create multiple DDP instances from the same original model.
#### Solution 2
Use a counter to generate a unique key for every hook in `Function`, and keep them in a map. Return the key to the caller of `add_post_hook`, and ask the caller to provide the key if it needs to delete the hook.
Con: this would add extra overhead to `add_post_hook` and every `Function` object.
#### Solution 3 [Current implementation]
kuttas suggests that, instead of generating a unique key, directly using the address of the pointer would be better. In order to avoid messing up dereferencing, let `add_post_hook` return a `uintptr_t`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21591
Differential Revision: D15745706
Pulled By: mrshenli
fbshipit-source-id: e56d2d48de0c65f6667790ab16337eac7f7d8b76
Summary:
This makes it so we can see the output of prim::Print in environments like iPython notebooks which override sys.stdout
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21625
Differential Revision: D15756793
Pulled By: jamesr66a
fbshipit-source-id: 7d9a14b2e229ed358e784318e9d862677db2c461
Summary:
Emit the loop condition as a separate block in loops, then inline it before conversion to SSA. This is needed for breaks & continues, where we will inline the condition block after the continue pass and before the break pass.
I also considered emitting a prim::For and a prim::While, but i think it's easier to just have one pathway.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21611
Differential Revision: D15775820
Pulled By: eellison
fbshipit-source-id: de17c5e65f6e4a0256a660948b1eb630e41b04fb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21606
StoreMatrixInMatrixMarketFormat could only dump quantized tensors, but sometimes we want to dump float tensors.
Reviewed By: csummersea
Differential Revision: D15741611
fbshipit-source-id: 95b03c2fdf1bd8407f7d925171d9dc9f25677464
Summary:
Stream is not respected on range/linspace/logspace functions, which contributes to https://github.com/pytorch/pytorch/issues/21589 (this is not a complete solution for that issue).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21619
Differential Revision: D15769666
Pulled By: ezyang
fbshipit-source-id: 7c036f7aecb3119430c4d432775cad98a5028fa8
Summary:
Resolves issue https://github.com/pytorch/pytorch/issues/19003
The author of this issue also asked that `cycle_momentum` default to `False` if the optimizer does not have a momentum parameter, but I'm not sure what the best way to do this would be. Silently changing the value based on the optimizer may confuse the user in some cases (say the user explicitly set `cycle_momentum=True` but doesn't know that the Adam optimizer doesn't use momentum).
Maybe printing a warning when switching this argument's value would suffice?
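For context, a hedged usage sketch showing the explicit opt-out with an optimizer that has no momentum parameter:
```python
import torch

params = [torch.zeros(1, requires_grad=True)]
opt = torch.optim.Adam(params, lr=0.1)
# Adam has no `momentum` parameter, so momentum cycling must be disabled explicitly
sched = torch.optim.lr_scheduler.CyclicLR(opt, base_lr=0.01, max_lr=0.1,
                                          cycle_momentum=False)
```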
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20401
Differential Revision: D15765463
Pulled By: ezyang
fbshipit-source-id: 88ddabd9e960c46f3471f37ea46013e6b4137eaf
Summary:
This adds support for PEP 526 style annotations on assignments in place of
`torch.jit.annotate()`, so
```python
a = torch.jit.annotate(List[int], [])
```
turns into
```python
a : List[int] = []
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21390
Pulled By: driazati
Differential Revision: D15706021
fbshipit-source-id: 8bf1459f229d5fd0e16e59953b9656e85a2207fb
Summary:
Ops on a ProcessGroup (pg) instance will hit an error when input/output tensors are created on a different process, because pg calls `recordStream` on the `CUDACachingAllocator`, which only knows about tensors created within the same process.
The proposed solution is to add a `suppressError` arg (suggestions for better names?) to `recordStream`. See comments in the code for arguments.
CC pichuang1984
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21449
Differential Revision: D15689736
Pulled By: mrshenli
fbshipit-source-id: e7fc81b167868f8666536067eaa7ae2c8584d88e
Summary:
1. reduce the overhead of mkldnn-bridge itself
2. remove redundant code and useless APIs
3. provide new operators, including int8 inner_product, ND permute/transpose, elem_add/mul, etc.
4. improve inner_product to support io format weights without implicit reorder
5. add SoftMax support
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20569
Reviewed By: houseroad
Differential Revision: D15558663
Pulled By: bddppq
fbshipit-source-id: 79a63aa139037924e9ffb1069f7e7f1d334efe3a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21207
This diff adds 80 PT pointwise unary ops to the benchmark suite. Most of the ops are added using the generate_pt_tests_from_list interface. The rest are handled separately.
Reviewed By: zheng-xq
Differential Revision: D15471597
fbshipit-source-id: 8ea36e292a38b1dc50f064a48c8cd07dbf78ae56
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21210
This diff introduces a new path to run op with JIT. There are two steps involved here:
1. Users need to script the op. This should happen in the `init` method.
2. The generated graph from step1 is passed to `jit_forward` which will be executed by the benchmark backend
Reviewed By: zheng-xq
Differential Revision: D15460831
fbshipit-source-id: 48441d9cd4be5d0acebab901f45544616e6ed2ee
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20723
These classes already existed but only as c10::Dict and c10::OperatorKernel.
Since they're now part of torch::RegisterOperators(), they should also live in the torch namespace.
Differential Revision: D15421575
fbshipit-source-id: d64ebd8664fadc264bbbae7eca1faa182529a32b
Summary:
yf225 helped me discover that our CI does not run multi-gpu tests in `test_c10d.py`. There are quite a few multi-gpu c10d tests. This PR tries to enable those tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21598
Differential Revision: D15744256
Pulled By: mrshenli
fbshipit-source-id: 0a1524a862946128321f66fc8b7f331eff10e52a
Summary:
Create an uninitialized ivalue. This will be needed for breaks & continues to match up if-block outputs for values that are guaranteed not to be used but need to escape the block scope. It is not exposed to users.
Was previously part of final returns but I was asked to make a separate PR for it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21387
Differential Revision: D15745124
Pulled By: eellison
fbshipit-source-id: ae6a6f766b4a70a71b9033987a630cfbf044e296
Summary:
For consistency, derivatives.yaml now uses the same schema specification as native_functions.yaml.
Note that there are some small downsides, e.g. changing the default values or return parameter names in native_functions.yaml now also requires updating derivatives.yaml. But this has a few nice properties:
1) Able to copy-paste definitions from native_functions to derivatives.
2) Makes it impossible to write derivatives for operators without schemas (e.g. old TH operators).
3) Moves us closer to the ideal situation of co-locating forward and backwards declarations.
Note that this doesn't change any generated code; in particular, this has the same behavior of mapping in-place and out-of-place definitions together.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20916
Differential Revision: D15497800
Pulled By: gchanan
fbshipit-source-id: baee5caf56b675ce78dda4aaf6ce6a34575a6432
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21599
We prevented this because c10 ops can't have a backwards yet and calling them with requires_grad=True would do the wrong thing
if the c10 op is not purely implemented by calling other autograd-able ops.
However, it is a valid use case to have c10 ops that just call other autograd-aware ops, and these ops should be callable with requires_grad=True.
This should fix https://github.com/pytorch/pytorch/issues/21584.
Differential Revision: D15744692
fbshipit-source-id: ba665365c850ef63fc9c51498fd69afe49e5d7ec
Summary:
An incorrect increment / decrement caused the samples to not be generated from a multinomial distribution
Changelog:
- Remove the incorrect increment / decrement operation
Fixes #21257, fixes #21508
cc: LeviViana neerajprad
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21324
Differential Revision: D15717575
Pulled By: ezyang
fbshipit-source-id: b1154e226d426c0d412d360c15f7c64aec95d101
Summary:
test that wasn't on the CI, but is tested internally.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21594
Differential Revision: D15742157
Pulled By: eellison
fbshipit-source-id: 11fc82d1fc0281ffedd674ed96100e0c783c0599
Summary:
This PR addresses some numerical issues of Sigmoid/StickBreakingTransform, where these transforms give +-inf when the unconstrained values move to +-20 areas.
For example, with
```
t = torch.distributions.SigmoidTransform()
x = torch.tensor(20.)
t.inv(t(x)), t.log_abs_det_jacobian(x, t(x))
```
the current behaviour is that the inverse returns `inf` and logdet returns `-inf`, while this PR makes them `15.9424` and `-15.9424`.
And for
```
t = torch.distributions.StickBreakingTransform()
x = torch.tensor([20., 20.])
t.inv(t(x)), t.log_abs_det_jacobian(x, t(x))
```
the current values are `(inf, nan)` and `-inf` for logdet, while this PR makes them `[16.6355, 71.3942]` and `-47.8272` for logdet.
Although these finite values are wrong and seem unavoidable, it is better than returning `inf` or `nan` in my opinion. This is useful in HMC where, even though the grad will be zero when the unconstrained parameter moves to an unstable area (due to clipping), the velocity variable will force the parameter to move to another area, which by chance can move it out of the unstable area. But inf/nan can be useful to stop doing inference early, so the changes in this PR might be inappropriate.
I also fix some small issues of `_Simplex` and `_RealVector` constraints where batch shape of the input is not respected when checking validation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20288
Differential Revision: D15742047
Pulled By: ezyang
fbshipit-source-id: b427ed1752c41327abb3957f98d4b289307a7d17
Summary:
This changes our compiler so it first emits Loads & Stores, and then transforms the graph to SSA in a follow up pass. When a variable is set, we emit a prim::Store, and when a variable is referenced, we emit a prim::Load.
```
a = 1
print(a)
```
becomes:
```
%a.1 : int = prim::Constant[value=1]()
prim::Store[name="a"](%a.1)
%a : int = prim::Load[name="a"]()
prim::Print(%a)
```
In the follow up pass, convertToSSA, the values are turned into SSA form with the Loads & Stores removed. This change will enable breaks and continues because you can transform the graph with the variable naming information still intact.
There are still some remaining jitter and edge-case issues that I have to look through, but I think it is still ready for review.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21101
Differential Revision: D15723353
Pulled By: eellison
fbshipit-source-id: 3269934d4bc24ddaf3a87fdd20620b0f954d83d0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21382
Concat tensor inference function was not handling correctly the case where axis argument points to the last dimension so input tensors don't need to have the same number of dimensions.
Split tensor inference function was not handling correctly the case where split information is provided as the second input tensor rather than as an argument.
Reviewed By: mdschatz
Differential Revision: D15633148
fbshipit-source-id: d566af44dc882457ee9efe83d2461b28408c2c5d
Summary:
Should be self-explanatory. This `int` variable is overflowing.
Reported in #21526
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21530
Differential Revision: D15719275
Pulled By: umanwizard
fbshipit-source-id: 24e917a00a5b78bc3af29ef3b8b72eea7e89d5d5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21556
Optimize the batch mm op when broadcasting the second input.
Reviewed By: houseroad
Differential Revision: D15728914
fbshipit-source-id: c60441d69d4997dd32a3566780496c7ccda5e67a
Summary:
This was looking at the number of elements in the memo table, not the total capacity, and was thus calling reserve() a lot more than it should have
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21542
Reviewed By: driazati
Differential Revision: D15723132
Pulled By: jamesr66a
fbshipit-source-id: 20e1f9099b6a51a33994ea9dbc3f22eb3bc0c8f9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21195
The motivation is that, while we shouldn't break USER code for using
deprecated declarations, we should keep our internal code base
deprecation clean.
Differential Revision: D15576968
fbshipit-source-id: fb73a8986a5b60bf49ee18260653100319bb1030
Summary:
namedtensor build + test should run on PRs only if the commit message
includes [namedtensor ci].
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21520
Differential Revision: D15718404
Pulled By: zou3519
fbshipit-source-id: ce8b5df2682e795e64958a9d49e2e3c091599b33
Summary:
This should further reduce noise by only clang-formatting the lines you actually touched in the precommit hook.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15657
Differential Revision: D15717337
Pulled By: suo
fbshipit-source-id: 57e65a679a8fdee5c3ff28e241c74ced9398eb0c
Summary:
The new implementation of tracing supports more modules, so much of the error-handling code can be removed by replacing the old one (LegacyTracedModule).
cc orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21339
Reviewed By: natalialunova
Differential Revision: D15695154
Pulled By: orionr
fbshipit-source-id: af7d35754e9f34bd1a0ad7b72a9ebe276ff8ab98
Summary:
Fixes #12259. Needs to make sure tests (see #13766) don't break due to numerical precision issues. Not sure what would need to be adjusted here...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13774
Differential Revision: D15715021
Pulled By: ezyang
fbshipit-source-id: 20ce2beee1b39ebe9f023c5f2b25be53acccb5f3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21492
If one async operator failed, async_scheduling net currently only marks all scheduled async operators as finished without cancelling the callbacks.
The new behavior is to cancel the callbacks first, then set event status to finished.
Reviewed By: ilia-cher
Differential Revision: D15702475
fbshipit-source-id: 55a1774d768b2e238bab859b83332f1877a001ca
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21502
In BenchResult, we keep name, avg_fwd, std_fwd, avg_bwd, and std_bwd. There is no information about the number of each iteration. In this diff, I am adding more info to BenchResult to include the number reported from each iteration.
Reviewed By: wanchaol
Differential Revision: D15706306
fbshipit-source-id: 3f14be4ba91f1f6da473995783bd7af1d067938d
Summary:
This moves `JitTestCase` to its own file so that we can have other jit
test files (ex. `test_jit_py3.py`)
There aren't any code changes, just a move and cleaning up the imports
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21491
Pulled By: driazati
Differential Revision: D15703060
fbshipit-source-id: 6082e8b482100bb7b0cd9ae69738f1273e626171
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21230
tsia; this diff adds empty-tensor support for the reshape operator.
Reviewed By: jerryzh168
Differential Revision: D15583356
fbshipit-source-id: 6d44c04e95ca3546509bfb12102e29c878f9a7c7
Summary:
Modify MKLDNN pooling operation to support ceil mode by adjusting the right/bottom padding accordingly. This is done similarly as in Caffe (see discussion https://github.com/pytorch/pytorch/pull/19205#discussion_r276903751).
To make this possible, I split the padding into left and right (top / bottom). This naming is confusing but actually follows mkldnn's own naming for pooling::compute(). We increase the right/bottom paddings so that the output matches the expected ceil-mode output size.
Strengthened the test case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21310
Reviewed By: bddppq
Differential Revision: D15611664
Pulled By: akyrola
fbshipit-source-id: 46b40015dafef69a8fd5e7b2c261d8dbf448cd20
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21393
Result of splitting the base diff. We moved a header from src/* to include/fbgemm/*
Reviewed By: jianyuh
Differential Revision: D15635188
fbshipit-source-id: ad7d0ddba964ff1cb8b2e33f5f98e457a4d2eac9
Summary:
Changed `UpsampleBilinearKernel` such that the throughput increased by 40~50%.
I tested locally with my own test code -- **not pytorch's provided test code** -- because I am having a build problem (which I made an issue about [here](https://github.com/pytorch/pytorch/issues/19184)). I tested with various tensor sizes and, across all of them, it showed a significant increase in throughput.
1. Added `__restrict__`.
2. Instead of launching as many threads as there are output elements, I launched only `output_height * output_width` threads and had each thread iterate through the channel and batch dimensions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19306
Differential Revision: D15701840
Pulled By: ezyang
fbshipit-source-id: 53c54d4f4e4a28b58ecc7d7ae6b864cbfc760e27
Summary:
Currently, when the input of MVN is a precision matrix, we take its inverse to convert it to a covariance matrix. This, however, can easily make the covariance matrix not positive definite, hence triggering a cholesky error.
For example,
```
import torch
torch.manual_seed(0)
x = torch.randn(10)
P = torch.exp(-(x - x.unsqueeze(-1)) ** 2)
torch.distributions.MultivariateNormal(loc=torch.ones(10), precision_matrix=P)
```
will trigger `RuntimeError: cholesky_cpu: U(8,8) is zero, singular U.`
This PR uses some math tricks ([ref](https://nbviewer.jupyter.org/gist/fehiepsi/5ef8e09e61604f10607380467eb82006#Precision-to-scale_tril)) to only take the inverse of a triangular matrix, hence increasing stability.
cc fritzo, neerajprad , SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21366
Differential Revision: D15696972
Pulled By: ezyang
fbshipit-source-id: cec13f7dfdbd06dee94b8bed8ff0b3e720c7a188
Summary:
This PR addresses the problem described in the comment: https://github.com/pytorch/pytorch/pull/20203#issuecomment-499231276
and the previously coded bad behaviour:
- a warning was raised every time lr scheduling was initialized
Now the code checks that:
- on the second call of `lr_scheduler.step`, `optimizer.step` has already been called, otherwise a warning is raised (as was done in #20203)
- if the optimizer's step is overridden -> raise another warning once to make the user aware of the new pattern `opt.step()` -> `lrs.step()`, as we can not check this (see the sketch after this entry)
Now tests check that:
- at initialization (`lrs = StepLR(...)`) there are no warnings
- if we replace `optimizer.step` by something else (similarly to the [code of nvidia/apex](https://github.com/NVIDIA/apex/blob/master/apex/amp/_process_optimizer.py#L287)) another warning is raised.
cc ezyang
PS. honestly I would say that there is a lot of overhead introduced for simple warnings. I hope all these checks will be removed in future `1.2.0` or other versions...
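A minimal sketch of the recommended call ordering the warning nudges users toward (the model, data, and scheduler choice here are illustrative):
```python
import torch

model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10)

for epoch in range(2):
    x, y = torch.randn(8, 4), torch.randn(8, 2)
    optimizer.zero_grad()
    torch.nn.functional.mse_loss(model(x), y).backward()
    optimizer.step()    # optimizer.step() first ...
    scheduler.step()    # ... then lr_scheduler.step()
```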
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21460
Differential Revision: D15701776
Pulled By: ezyang
fbshipit-source-id: eac5712b9146d9d3392a30f6339cd33d90c497c7
Summary:
Fixes#21026.
1. Improve build docs for Windows
2. Change `BUILD_SHARED_LIBS=ON` for Caffe2 local builds
3. Change to out-source builds for LibTorch and Caffe2 (transferred to #21452)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21190
Differential Revision: D15695223
Pulled By: ezyang
fbshipit-source-id: 0ad69d7553a40fe627582c8e0dcf655f6f63bfdf
Summary:
Another simple bit of syntax that NumPy supports and we don't.
Support int, float, and bool.
```python
>>> torch.randn((2,3), dtype=float)
tensor([[-0.1752, -0.3240, -0.6148],
[ 0.1861, 1.6472, 0.1687]], dtype=torch.float64)
```
A bit confusingly, Python's "float" actually means double, but nothing we can do about that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21215
Differential Revision: D15697012
Pulled By: umanwizard
fbshipit-source-id: 9a38d960a610b8e67023486b0c9265edd3c22246
Summary:
Adds support for recursively compiling `nn.Sequential` and
`nn.ModuleList`. When either is used, it is converted to a
`jit._ConstModuleList` or `jit._ConstSequential` as necessary. Due to
this, we don't need to add it to `__constants__` since it's made
constant on demand.
This PR also moves the recursive script tests out to their own class
`TestRecursiveScript` (the added test is called `test_iterable_modules`)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21306
Pulled By: driazati
Differential Revision: D15611738
fbshipit-source-id: fac52993990bd2dfad71d044c463a58a3759932a
Summary:
Enable bool tensors for these index methods:
- index_select
- index_copy
- put
- take
- index_fill
Tested via unit tests
TODO:
Enable index_add in a separate PR as it requires more "side" changes.
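A minimal sketch of what this enables (a bool source tensor used with these methods; exact coverage is per the unit tests):
```python
import torch

b = torch.tensor([True, False, True, True])
idx = torch.tensor([0, 2])

print(b.index_select(0, idx))                              # tensor([True, True])
print(b.take(idx))                                         # tensor([True, True])
print(b.index_copy(0, idx, torch.tensor([False, False])))  # copy bool values in
```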
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21435
Differential Revision: D15684964
Pulled By: izdeby
fbshipit-source-id: 48440e4d44873d70c4577e017dd0d8977e0fa15a
Summary:
`torch.tensor([True, False, True], dtype=torch.bool).sum()` should return **2** instead of **True** as it does now.
Tested via unit tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21421
Differential Revision: D15674203
Pulled By: izdeby
fbshipit-source-id: b00e3d0ca809c9b92b750adc05632522dad50c74
Summary:
Fixes #19540
CC nmerrill67
C++ data parallel was using Module.clone() to create module replicas on every destination device. However, clone() does not set up gradient edges to point from replicas to the original module. As a result, the gradient will not be aggregated into the original module. This commit fixes the problem by manually setting gradient edges from every parameter X in every replica to the same parameter X in the original module.
## Failed Attempt
Initially I tried implementing what we did in `replicate.py`, which
1. create module replicas
2. use Python `Broadcast` autograd function to broadcast every parameter in the original module to all destination devices.
3. assign the broadcast result params to module replicas' `_parameters` dict.
This works in Python because derived module member field params (e.g., `Linear.weight`) and base module `_parameters` (e.g., `Linear._parameters['weight']`) are referencing the same parameter instance. Assigning one of them will apply to both. However, in C++, even though I can modify Module's `parameters_ `values and gradient edges to point to the broadcast source, I cannot touch the weight and bias member fields in Linear, because replicate cannot (and should not) add special-case handlers to every different module. (See `Linear` [.h](https://github.com/pytorch/pytorch/blob/master/torch/csrc/api/include/torch/nn/modules/linear.h), [.cpp](https://github.com/pytorch/pytorch/blob/master/torch/csrc/api/src/nn/modules/linear.cpp)) Although they initially point to the same `TensorImpl` instance, after assigning to `Module.parameters_['weight']`, it will be different from `Linear.weight`.
## Solution Options
gchanan and I had several discussions on this issue and figured two solutions to this problem.
### Option One [implemented in this PR]
Replicate the module in two steps:
1. call `Module.clone()` to create a module replica on every destination device.
2. manually setting gradient edges from every parameter in every replica to the same parameter in the original module.
* Pro: Does not need to change any existing module, and relatively easier to implement
* Con: It is a little hackish.
### Option Two
Implement a `Replicatable` class (similar to `Cloneable`), and make it a friend class of `Module`. For more details see `Note [Replicating Modules]` in the code change.
* Pro: Maybe this aligns more with our existing approach implemented in `Cloneable`?
* Con: Require changes to every existing module.
I am inclined to go with option one, because `replicate` will only be used for data parallel. I feel it is overkill to change all existing module implementations because of a data parallel requirement.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20910
Differential Revision: D15556426
Pulled By: mrshenli
fbshipit-source-id: aa836290ec657b32742e2bea80bd0ac2404ef3b0
Summary:
Fixed an issue where models can not be loaded in a 32-bit environment like Raspbian.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20900
Differential Revision: D15696709
Pulled By: ezyang
fbshipit-source-id: 37a81f05f235d3b9fc6244e12d3320ced3d1465e
Summary:
Current versions of NVRTC incorrectly map error code 7 to the error string "NVRTC unknown error." This update maps error code 7 to the correct string explicitly in PyTorch. See the documentation at: https://docs.nvidia.com/cuda/nvrtc/index.html#group__error.
This may give us a better idea of the source of NVRTC errors that some community members, like Uber, have reported.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21174
Differential Revision: D15696593
Pulled By: ezyang
fbshipit-source-id: f5c7b5876c07b311ab5f2d7c8e375e93273912c6
Summary:
Fixed #21269 by removing the expected `ValueError` when converting a tensor to a NumPy `int8` array in the Numba interoperability test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21458
Differential Revision: D15696363
Pulled By: ezyang
fbshipit-source-id: f4ee9910173aab0b90a757e75c35925b026d1cc4
Summary:
I inserted default `weight` and `reduction` params in the `binary_cross_entropy_with_logits` function. These default params already exist in Python and in the `binary_cross_entropy` function in C++.
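For reference, a minimal sketch of the Python-side call that relies on those defaults (no explicit weight, reduction defaulting to 'mean'):
```python
import torch
import torch.nn.functional as F

logits = torch.randn(3, 4)
targets = torch.rand(3, 4)

# weight defaults to None and reduction defaults to 'mean'; the C++ API
# now mirrors these defaults instead of requiring them to be passed.
loss = F.binary_cross_entropy_with_logits(logits, targets)
print(loss.item())
```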
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21336
Differential Revision: D15628917
Pulled By: ezyang
fbshipit-source-id: 38e5f53851125238842df1bd71cb6149c8603be1
Summary:
This could serve as an alternative solution to export `torch.gather` before something similar goes into the ONNX spec. The exported model is verified to be correct against the onnxruntime backend. We weren't able to test against the Caffe2 backend because it doesn't seem to support OneHot in opset 9.
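A minimal sketch of the kind of export this enables (the output file name is illustrative; onnxruntime would then be used to check the result):
```python
import torch

class GatherModel(torch.nn.Module):
    def forward(self, x, idx):
        # gather along dim 1, exported via the OneHot-based workaround described above
        return torch.gather(x, 1, idx)

x = torch.randn(2, 5)
idx = torch.tensor([[0, 1], [3, 4]])
torch.onnx.export(GatherModel(), (x, idx), "gather.onnx", opset_version=9)
```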
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21235
Differential Revision: D15613039
Pulled By: houseroad
fbshipit-source-id: 7fc097f85235c071474730233ede7d83074c347f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21440
This diff modifies the output format when ai_pep_format is enabled.
Reviewed By: hl475
Differential Revision: D15681042
fbshipit-source-id: df5f2dbb38d1bd866ca7f74ef4e63459d480be6e
Summary:
We have encountered `std::bad_cast` error when running PyTorch binary built with cxx11 abi on CentOS7, stack trace:
```
#0 0x00007fec10160207 in raise () from /lib64/libc.so.6
#1 0x00007fec101618f8 in abort () from /lib64/libc.so.6
#2 0x00007fec015767d5 in __gnu_cxx::__verbose_terminate_handler() () from /lib64/libstdc++.so.6
#3 0x00007fec01574746 in ?? () from /lib64/libstdc++.so.6
#4 0x00007fec01574773 in std::terminate() () from /lib64/libstdc++.so.6
#5 0x00007fec01574993 in __cxa_throw () from /lib64/libstdc++.so.6
#6 0x00007fec015c94d2 in std::__throw_bad_cast() () from /lib64/libstdc++.so.6
#7 0x00007feb2ab3c2d7 in std::__cxx11::numpunct<char> const& std::use_facet<std::__cxx11::numpunct<char> >(std::locale const&) ()
from /root/.local/lib/python2.7/site-packages/torch/lib/libcaffe2.so
#8 0x00007feb28643d62 in torch::jit::script::strtod_c(char const*, char**) () from /root/.local/lib/python2.7/site-packages/torch/lib/libcaffe2.so
```
We are suspecting this line will get compiled to gcc abi dependent symbol:
```
char decimal_point = std::use_facet<std::numpunct<char>>(std::locale()).decimal_point();
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21293
Differential Revision: D15609910
Pulled By: bddppq
fbshipit-source-id: e247059729863868e4b36d6fec4fcbc36fbc4bb1
Summary:
Fixing an incorrect implementation of the CELU activation function. The existing implementation works by a chance combination of errors that seem to cancel each other out. This change makes the code more readable, aligns the parameter names correctly, and is consistent with the cuda implementation.
I came across this issue while working on version counters... I attempted to specify a gradient in derivatives.yaml for CELU due to a failed test, but the derivative couldn't be specified correctly without fixing the celu implementation.
https://github.com/pytorch/pytorch/pull/20612
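For reference, a hedged sketch of the standard CELU definition that the fixed kernel (and any derivative specified for it) should agree with:
```python
import torch
import torch.nn.functional as F

def celu_reference(x, alpha=1.0):
    # CELU(x) = max(0, x) + min(0, alpha * (exp(x / alpha) - 1))
    return torch.clamp(x, min=0) + torch.clamp(alpha * (torch.exp(x / alpha) - 1), max=0)

x = torch.randn(5)
print(torch.allclose(F.celu(x, alpha=1.0), celu_reference(x)))  # expect True
```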
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21213
Differential Revision: D15678823
Pulled By: nairbv
fbshipit-source-id: 29fa76b173a66c2c44ed2e0b7959e77f95d19c43
Summary:
This PR is a continuation of #15310, which itself is a continuation of #14845, #14941, & #15293. It should be synced up with the pytorch/master branch as of yesterday.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19465
Differential Revision: D15632268
Pulled By: ezyang
fbshipit-source-id: 8e337e8dc17ac31439935ccb530a7caf77f960e6
Summary:
We want to be able to call stft from a torchscript which requires that stft have a type annotation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21302
Differential Revision: D15607973
Pulled By: cpuhrsch
fbshipit-source-id: c4a5c09cdaafe7e81cf487a3ad216d1b03464a21
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21392
as discussed at https://github.com/pytorch/pytorch/pull/21244, we
found some values in log_beta are not properly initialized. This diff will 1)
initialize all log_beta to -inf; 2) fix a tricky compare condition; 3) zero out all
the gradient elements corresponding to padding.
Offline experiments show that this diff can fix previous seen NaN loss.
Differential Revision: D15637977
fbshipit-source-id: 477008a5e11aae946bd2aa401ab7e0c513421af0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21398
The Module::forward method calls the find_method() function, potentially from multiple threads.
Internally it calls the find_offset() method and reads the dict_ object.
If the corresponding name is not in the dictionary, the thread calls the insert() method and modifies the dict_ object.
At the same time, while the first thread modifies the dict_ object, another thread can enter the forward()->find_method()->find_offset() path
and access the dict_ object for reading while it is being modified -> crash.
Moved the mutex protection up to protect both the find_offset() and insert() calls.
Consider using a C++17 shared_mutex locking object instead of a recursive_mutex object.
Reviewed By: bddppq
Differential Revision: D15638942
fbshipit-source-id: ca6a453448302a0b3666c87724755fa4e9ce242f
Summary:
Something flaky is going on with `test_inplace_view_saved_output` on Windows.
With my PR #20598 applied, the test fails, even though there is no obvious reason it should be related, so the PR was reverted.
Based on commenting out various parts of my change and re-building, I think the problem is with the name -- renaming everything from `T` to `asdf` seems to make the test stop failing. I can't be sure that this is actually the case though, since I could just be seeing patterns in non-deterministic build output...
I spoke with colesbury offline and we agreed that it is okay to just disable this test on Windows for now and not block landing the main change. He will look into why it is failing.
**Test Plan:** I will wait to make sure the Windows CI suite passes before landing this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21175
Differential Revision: D15566970
Pulled By: umanwizard
fbshipit-source-id: edf223375d41faaab0a3a14dca50841f08030da3
Summary:
Currently tools/build_pytorch_libs.py looks quite convoluted. This commit reorganizes cmake-related functions into a separate file to make the code clearer.
---
This is hopefully helpful for further contribution for better integration with cmake.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21367
Differential Revision: D15636991
Pulled By: soumith
fbshipit-source-id: 44d76e4e77aec0ce33cb32962b6a79a7f82785da
Summary:
This default was incorrect and made printing in python not print file:line:col
This wasn't caught because FileCheck internally uses operator<< to print the graph, which has `true` hardcoded as the value. I've added more comprehensive tests to catch this
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21370
Differential Revision: D15631135
Pulled By: jamesr66a
fbshipit-source-id: c809e06fff4f0174eefeb89062024384b4944ef7
Summary:
I found this significantly speeds up incremental builds.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21334
Differential Revision: D15632994
Pulled By: suo
fbshipit-source-id: bb4af90f4400bffa90d168d82ff30fece5e3835c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21365
This diff adds new operators to benchmark_all_test so all the supported ops can be built as one binary
Reviewed By: hl475
Differential Revision: D15627328
fbshipit-source-id: b7ca550a279f485102a6a6bd47e4032c7beb9940
Summary:
The original PR (#16071) is not working anymore after `caffe2` and `torch` were unified. What's more, it is making the binary big since the optimization flag is disabled on a very big project (the `torch` library used to be small, but the flag now applies to the whole `caffe2` and `caffe2_gpu` libraries). We need to get it reverted.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21335
Differential Revision: D15622163
Pulled By: soumith
fbshipit-source-id: 900bd400106d27a1512eed1e9f2288114f5f41bb
Summary:
This adds a regression test for the bug fix in #21236. Operations
involving CUDA tensors and CPU scalars should not copy the CPU scalar to
the device (because that is slow). They should instead "lift" the scalar
to a kernel parameter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21253
Reviewed By: bddppq
Differential Revision: D15604080
Pulled By: colesbury
fbshipit-source-id: c14ded5d584499eaa5ea83337ffc50278205f3d6
Summary:
This solves the situation where, for example, someone instantiates LSTM with `dropout=0`, a Python integer. This works fine in Python, but the JIT throws a type error because it expected a float but got an int.
Resolves https://github.com/pytorch/lockdown/issues/65
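A minimal sketch of the implicit conversion this allows (an int literal passed where the op schema expects a float, mirroring the `dropout=0` case):
```python
import torch

@torch.jit.script
def drop(x):
    # 0 is a Python int, but the dropout probability argument is a float;
    # the JIT now converts it implicitly instead of raising a type error.
    return torch.dropout(x, 0, True)

print(drop(torch.ones(3)))
```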
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21304
Differential Revision: D15613153
Pulled By: jamesr66a
fbshipit-source-id: eabff76e3af3de0612583b37dbc5f7eab7e248a4
Summary:
This PR adds support for torch.rand export in the PyTorch ONNX exporter. There are other generator ops that need to be supported for export and they will added in subsequent PRs. This op is needed with priority for a model on our end.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20559
Differential Revision: D15379653
Pulled By: houseroad
fbshipit-source-id: d590db04a4cbb256c966f4010a9361ab8eb3ade3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20915
Clean up the unary processor code. Some questions are added in the comments to seek suggestions.
Reviewed By: pjh5
Differential Revision: D15448502
fbshipit-source-id: ef0c45718c1a06187e3fe2e4e59b7f20c641d9c5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21206
This diff changes the default test_name to be a globally unique value across tests. With that, users can list all the tests and choose to run a specific test.
Reviewed By: zheng-xq
Differential Revision: D15543508
fbshipit-source-id: 0814ef6a60d41637fed5245e30c282497cf21bb8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21149
The diff modifies the interface for PyTorch operators in the benchmark suite
Reviewed By: zheng-xq
Differential Revision: D15433897
fbshipit-source-id: e858183431eb37d90313356716c2de8709372b58
Summary:
This doesn't affect anything because we run constant pooling, and in the case of Closures and Forks it creates unnecessary closures over constants.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21229
Differential Revision: D15587764
Pulled By: eellison
fbshipit-source-id: d5609b0a5697071fab5050eb9e03876ab9ebb27a
Summary:
~~This is work in progress due to its dependency on multiple pending PRs.~~
- [x] ONNX: Relax constraint on subgraph input/output type & shape check. https://github.com/onnx/onnx/pull/2009
- [x] PyTorch: Add infra to test_pytorch_onnx_caffe2.py to test ScriptModule models. https://github.com/pytorch/pytorch/pull/20256
This PR should partially resolve https://github.com/pytorch/pytorch/issues/17531. However, ideally we shouldn't need to put cast(and reshape) node to help the conversion for loop condition.
- Added cast node for condition values before entering loop node. The ONNX spec only accepts Bool type, while in PyTorch if the condition value is an output from other node it could potentially have any integral type.
- Tidying up the exported ONNX loop subgraph input type & shape. According to the ONNX spec, input "M" is exported as a 0-d scalar tensor with type int64. Input "Cond" is exported as an incomplete tensor of type Bool without shape information. This is because throughout the iteration, the rank of the condition value is dynamic, either 0-d or 1-d, as long as it holds a single value.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20445
Differential Revision: D15534188
Pulled By: houseroad
fbshipit-source-id: d174e778529def05ee666afeee4b8fb27786e320
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21267
Replace AT_ASSERTM with TORCH_CHECK: AT_ASSERTM is deprecated.
Not sure when `AT_ASSERT` will be deprecated in favor of some new TORCH assert function.
Reviewed By: zafartahirov
Differential Revision: D15599242
fbshipit-source-id: 23f21a9a23dc3c147dc817e6d278066d0832e08d
Summary:
This PR improves performance of advanced indexing backward, partially solving #15245 (performance is still worse than gather, but not by such outrageous margins). Before, using benchmarking harness from #15245, cuda 10/V100:
```
Indexing is faster by at most -270.61607820767887 us on N: 16 D: 256 K: 1
Indexing is slower by at most 11127.466280784833 us on N: 16 D: 4096 K: 4096
```
after:
```
Indexing is faster by at most 23.524456737696028 us on N: 512 D: 4096 K: 4096
Indexing is slower by at most 186.24056029472553 us on N: 16 D: 1024 K: 4096
```
The strategy is to reuse the embedding backward kernel, adapting it to handle unindexed dimensions in the beginning by launching additional threadblocks, and also allowing it to handle slices bigger than `65K*128`, which is hardly ever a problem for embedding. Still, integer indexing is baked into the kernel and is important for performance, so for now tensors bigger than 2G elements are not supported.
The main savings come from not having to expand index to all unindexed dimensions, and not sorting expanded index with incoming gradient values, but rather only sorting unexpanded index.
There are ways to make sorting overhead smaller (thanks mcarilli for suggestions) but I'll get to it when it becomes a real problem, or rather, when cuda graphs will force us to get rid of thrust::sort calls.
I've also added tests for indexing backward; before, tests for index_put_ and indexing backward were non-existent.
This PR also fixes #20457 by casting indices to the `self` backend.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20557
Differential Revision: D15582434
Pulled By: ezyang
fbshipit-source-id: 91e8f2769580588ec7d18823d99a26f1c0da8e2a
Summary:
Stacked on https://github.com/pytorch/pytorch/pull/21217
This adds support for recording file and line information during tracing, by extracting the top Python interpreter frame
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21247
Reviewed By: suo, driazati
Differential Revision: D15594553
Pulled By: jamesr66a
fbshipit-source-id: 72e1b3a46f1dabe3e83a608ec1a7d083bd1720f9
Summary:
Remove Dropout from the opset 10 blacklist.
ONNX Dropout was modified in opset 10, but only the "mask" output was changed, and that output is not exported in pytorch's opset 9. So we can still fall back on the opset 9 op.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20710
Differential Revision: D15571248
Pulled By: houseroad
fbshipit-source-id: 15267eb63308a29a435261034b2f07324db1dea6
Summary:
We're not getting much from checking the export strings, and they are noisy and slow development. Didn't realize they existed until now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21238
Differential Revision: D15604256
Pulled By: eellison
fbshipit-source-id: 488e9401231228cffe132dab99d519563fa63afc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21100
Added multifile flag to write scalar data into separate files. This can slow down dashboard loading.
Reviewed By: orionr
Differential Revision: D15548913
fbshipit-source-id: dd39a7f76f93025d28f14babbf933e39860e6910
Summary:
Loops.h contains specializations for cases where all the inputs are
contiguous as well as cases where one input is a scalar and all other
inputs are contiguous.
Previously, there were separate checks for functions that take
zero, one, or two input arguments. This is getting unwieldy, especially
once we add support for functions that take three inputs (#21025).
This requires the use of recursive templates (which have their own
downsides), but this seems better than the alternative.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21106
Differential Revision: D15562430
Pulled By: colesbury
fbshipit-source-id: 5f19ab2212e16e29552887f4585c2b4a70309772
Summary:
Instead of attempting to hardcode calls to "ninja" or "make", we should always let cmake do it. This better integrates build configurations (DEBUG or REL_WITH_DEB_INFO) and better handles the case in which the native build tool is not in PATH (cmake has some capacity to find them and has options for users to specify their locations).
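A rough sketch of the idea (not the actual build script): invoke the native tool through cmake from Python, so cmake locates ninja or make itself. The directory below is a placeholder.
```python
import subprocess

build_dir = "build"  # placeholder path to an already-configured cmake build directory

# `cmake --build` dispatches to whichever generator the build was configured
# with (ninja, make, MSBuild, ...), honoring the chosen build configuration.
subprocess.check_call(
    ["cmake", "--build", ".", "--target", "install", "--config", "Release"],
    cwd=build_dir,
)
```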
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21105
Differential Revision: D15602883
Pulled By: soumith
fbshipit-source-id: 32ac46d438af00e791defde6ae5ac21c437d0bb0
Summary:
Retry #21197
The previous one failed because it uses some Python3 only syntax.
ezyang Do we still have multi-GPU py2 tests? I am curious why the CI tests did not catch this error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21262
Differential Revision: D15598941
Pulled By: mrshenli
fbshipit-source-id: 95f416589448c443685d6d236d205b011998a715
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20883
Add autograd for layer_norm on CPU. After this diff, both PyTorch and JIT models can automatically benefit from the performance improvement of nn.functional.layer_norm.
Reviewed By: zheng-xq
Differential Revision: D15483790
fbshipit-source-id: 94ed3b16ab6d83ca6c254dbcfb224ff7d88837f3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20665
Add the gelu activation forward on CPU in pytorch.
Compare to the current Python-implemented version of gelu in the BERT model, like:
```
def gelu(self, x):
    return x * 0.5 * (1.0 + torch.erf(x / self.sqrt_two))
```
The torch.nn.functional.gelu function can reduce the forward time from 333ms to 109ms (with MKL) / 112ms (without MKL) for input size = [64, 128, 56, 56] on a devvm.
Reviewed By: zheng-xq
Differential Revision: D15400974
fbshipit-source-id: f606b43d1dd64e3c42a12c4991411d47551a8121
Summary:
cc ezyang this is meant to fix the fuser failures on master
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21252
Differential Revision: D15594283
Pulled By: jamesr66a
fbshipit-source-id: 85f37e78b2de051c92ade3fe4c44c7530b4542e5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21233
It is possible that OnnxifiOp is created in a thread where weights have already been cleaned from the workspace, which is a legit use case since we can create the backend once and lower all the weights. So we need to extract the weight shape info the first time we create the backend and save it.
Reviewed By: bertmaher, rdzhabarov
Differential Revision: D15587237
fbshipit-source-id: 1f264dc32c0398c42b618e9c41c119eb13e1c9f1
Summary:
Fixes#21108
When grad is disabled, Python autograd function outputs are [wrapped as detached aliases](8cde4c4d22/torch/csrc/autograd/python_function.cpp (L395-L399)), which prevents calling `Tensor.set_()` on them after recent changes in Tensors and Variables. This will hit a problem when users would like to call `rnn.flatten_parameters()` in the forward pass, as the function [calls `set_()`](9d09f5df6c/aten/src/ATen/native/cudnn/RNN.cpp (L669)).
The proposed solution is to avoid using an autograd Broadcast if in no_grad mode.
apsdehal
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21197
Differential Revision: D15577342
Pulled By: mrshenli
fbshipit-source-id: 1a024c572171a3f2daca9454fd3ee6450d112f7c
Summary:
I think there was a typo in #20690 here https://github.com/pytorch/pytorch/pull/20690/files#diff-b47a50873394e38a005b4c1acd151957R130.
Original conditional was ` common_backend == Backend::CUDA && op.tensor.type().backend() == Backend::CPU)`, now it is `op.device.is_cuda() && op.tensor.device().is_cpu()`. It seems that `op.device` and `op.tensor.device()` should be the same, so this conditional is never true. This leads to spurious h2d copies for operations between cuda tensors and cpu scalars, because cpu scalars are now sent to gpu, instead of being passed to lambdas directly.
Unfortunately, I don't know how to test this change, because functionally everything was fine after #20690, it was just a performance regression.
cc colesbury
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21236
Differential Revision: D15592754
Pulled By: soumith
fbshipit-source-id: 105bfecc61c222cfdb7294a03c9ecae3cc7f5817
Summary:
`Tensor.is_cuda` and `is_leaf` are not predicate functions but `bool` attributes. This patch fixes the type hints in `torch/__init__.pyi` for those attributes.
```diff
- def is_cuda(self) -> bool: ...
+ is_cuda: bool
- def is_leaf(self) -> bool: ...
+ is_leaf: bool
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21192
Differential Revision: D15592766
Pulled By: soumith
fbshipit-source-id: 8c4ecd6939df8b8a8a19e1c9db6d40193bca7e4a
Summary:
This makes file-line reporting also work for things loaded using `torch.jit.load()` as well as the string frontend (via `CompilationUnit`)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21217
Differential Revision: D15590838
Pulled By: jamesr66a
fbshipit-source-id: 6b6a12574bf9eca0b83f24f0b50535fda5863243
Summary:
Studied why sparse tensor coalesce was slow: issue #10757.
Using nvprof and writing a simple benchmark, I determined that the bulk of the time was spent in `kernelTransformReduceInnermostDimIndex`, which is called when a sparse tensor is constructed with sparse_coo_tensor for the sanity check on the minimum and maximum indices. However, we do not need this sanity check, because after coalescing the tensor these min/max values won't change.
On my benchmark with 1 million non-zeros, the runtime of coalesce dropped from 0.52 s to 0.005 s.
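A minimal sketch of the call path that benefits (building a COO tensor with duplicate indices and coalescing it; the exact sizes are illustrative):
```python
import torch

nnz = 1000000
i = torch.randint(0, 1000, (2, nnz))          # duplicate indices are expected
v = torch.randn(nnz)
s = torch.sparse_coo_tensor(i, v, (1000, 1000), device="cuda")  # assumes a GPU

# coalesce() no longer re-runs the min/max index sanity check when it
# rebuilds the tensor, since coalescing cannot change those bounds.
sc = s.coalesce()
```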
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21214
Reviewed By: bddppq
Differential Revision: D15584338
Pulled By: akyrola
fbshipit-source-id: a08378baa018dbd0b45d7aba661fc9aefd3791e0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21163
These two backend transformations share some common traits. Therefore we want to reuse the data structures/code as much as possible.
Reviewed By: hlu1
Differential Revision: D15561177
fbshipit-source-id: 35f5d63b2b5b3657f4ba099634fd27c3af545f1b
Summary:
Some of the functions are only used in this file - mark them `static`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21140
Differential Revision: D15578076
Pulled By: Krovatkin
fbshipit-source-id: 71ae67baabebd40c38ecb9292b5b8202ad2b9fc1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21152
Migrate existing add benchmark to use the new op front-end
Reviewed By: zheng-xq
Differential Revision: D15325524
fbshipit-source-id: 34e969e1bd289913d881c476711bce9f8ac18a29
Summary:
I will do loops in a follow-up after some other changes I am working on have landed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20911
Differential Revision: D15497205
Pulled By: eellison
fbshipit-source-id: 8cac197c6a6045b27b552cbb39e6fc86ca747b18
Summary:
Following on #19747, this implements most of the `torch.jit.script()` changes laid out in #20939.
Still to do:
* Accessing a method from Python does not add it as a `ScriptMethod` (so only `export`ed methods and `forward` are compiled)
* Calling a method other than `forward` on a submodule doesn't work
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20708
Pulled By: driazati
Differential Revision: D15560490
fbshipit-source-id: cc7ef3a1c2772eff9beba5f3e66546d2b7d7198a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21085
Now that torch::jit::RegisterOperators() always passes through to torch::RegisterOperators() (see diffs stacked below this), we can remove the old custom op implementation.
Reviewed By: dzhulgakov
Differential Revision: D15542261
fbshipit-source-id: ef437e6c71950e58fdd237d6abd035826753c2e4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21084
- Now AliasAnalysisKind can be set using the torch::RegisterOperators() API
- This also allows us to remove the last place in torch::jit::RegisterOperators that didn't use c10 yet.
Reviewed By: dzhulgakov
Differential Revision: D15542097
fbshipit-source-id: ea127ecf051a5c1e567e035692deed44e04faa9e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21181
Implement c10::OperatorOptions as a class to store metadata about operators.
This is meant to replace torch::jit::OperatorOptions.
Reviewed By: dzhulgakov
Differential Revision: D15569897
fbshipit-source-id: 95bf0bf917c1ef2bdf32702405844e1a116d9a64
Summary:
This reduces DenseNet load time by about 25% (down to 5.3s on my laptop) and gets AliasAnalysis out of the profile top hits entirely.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21203
Differential Revision: D15578155
fbshipit-source-id: ddbb1ad25c9540b5214702830084aa51cc6fd3cb
Summary:
Adds persistent cuda kernels that speed up SoftMax applied over the fast dimension, i.e. torch.nn.Softmax(dim=-1) and torch.nn.LogSoftmax(dim=-1). When the size is <= 1024, this code is 2-10x faster than the current code, speedup is higher for smaller sizes. This code works for half, float and double tensors with 1024 or fewer elements in the fast dimension. Numerical accuracy is on par with the current code, i.e. relative error is ~1e-8 for float tensors and ~1e-17 for double tensors. Relative error was computed against the CPU code.
The attached image shows kernel time in us for torch.nn.Softmax(dim=-1) applied to a half precision tensor of shape [16384,n], n is plotted along the horizontal axis. Similar uplifts can be seen for the backward pass and for LogSoftmax.
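A minimal usage sketch of the case these kernels target (softmax over the last, contiguous dimension of a half-precision CUDA tensor with at most 1024 elements in that dimension):
```python
import torch

x = torch.randn(16384, 1024, device="cuda", dtype=torch.half, requires_grad=True)

y = torch.nn.Softmax(dim=-1)(x)       # fast-dimension size <= 1024 hits the new kernel
ly = torch.nn.LogSoftmax(dim=-1)(x)
ly.float().sum().backward()           # the backward pass is sped up as well
```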

Pull Request resolved: https://github.com/pytorch/pytorch/pull/20827
Differential Revision: D15582509
Pulled By: ezyang
fbshipit-source-id: 65805db37487cebbc4ceefb1a1bd486d24745f80
Summary:
This is a follow up on Jame's PR: https://github.com/pytorch/pytorch/pull/19041. The idea is to replace the legacy `sinh` / `cosh` ops that are being dispatched to TH with the operations defined in `Vec256` for better performance.
benchmark(from Jame's script):
```python
import torch, time
ops = ['sinh', 'cosh']
x = torch.rand(1024, 1024)
NITER = 10000
print('op', 'time per iter (ms)', 'gops/s', 'GB/s', sep='\t')
for op in ops:
    s = time.time()
    for i in range(NITER):
        getattr(x, op)()
    elapsed_sec = ((time.time() - s) / NITER)
    print(op, elapsed_sec * 1000, (1024*1024/elapsed_sec)/1e9, (1024*1024*4*2) / elapsed_sec / 1e9, sep='\t')
```
code on master:
```
op time per iter (ms) gops/s GB/s
sinh 3.37614369392395 0.3105839369002935 2.484671495202348
cosh 3.480502033233643 0.3012714803748572 2.4101718429988574
```
after change (on Macbook pro 2018):
```
op time per iter (ms) gops/s GB/s
sinh 0.8956503868103027 1.1707425301677301 9.365940241341841
cosh 0.9392147302627564 1.1164390487217428 8.931512389773943
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21115
Reviewed By: ljk53
Differential Revision: D15574580
Pulled By: xta0
fbshipit-source-id: 392546a0df11ed4f0945f2bc84bf5dea2750b60e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21196
we'll add `quantize(quantizer)` as a tensor method later when we expose `quantizer` in Python frontend
Python
```
torch.quantize_linear(t, ...)
```
C++
```
at::quantize_linear(t, ...)
```
Differential Revision: D15577123
fbshipit-source-id: d0abeea488418fa9ab212f84b0b97ee237124240
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21169
We should minimize dependency from perfkernels (we were including eigen header files only in cc files not compiled with avx or avx2 options but better to be very strict because it's easy to introduce illegal instruction errors in perfkernels)
Reviewed By: salexspb
Differential Revision: D15563839
fbshipit-source-id: d4b1bca22d7f2e6f20f23664d4b99498e5984586
Summary:
Most important fix: Correct "tensor.rst" to "tensors.rst"
Secondary fix: some minor English spelling/grammar fixes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21029
Differential Revision: D15523230
Pulled By: umanwizard
fbshipit-source-id: 6052d8609c86efa41a4289cd3a099b2f1037c810
Summary:
Dynamically creating a type at runtime was messing up the MRO and has been causing many other problems. I think it's best to delete it. This causes a regression since
```python
self.linear = nn.Linear(10, 10)
isinstance(self.linear, nn.Linear)
```
will now be `False` again, but this will be fixed once recursive script mode is the default (#20939)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21107
Pulled By: driazati
Differential Revision: D15560549
fbshipit-source-id: 7bd6b958acb4f353d427d66196bb4ee577ecb1a6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21148
The diff modifies the interface for Caffe2 operators in the benchmark suite
Reviewed By: zheng-xq
Differential Revision: D15433888
fbshipit-source-id: c264a95906422d7a26c10b1f9836ba8b35e36b53
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21147
This diff introduces a new interface to add PT/C2 operators to the benchmark suite.
The following steps are needed to add a new operator:
1. Specify the input shapes, args to an operator in configs
2. Create a PT/C2 benchmark class which includes ```init``` (create tensors), ```forward``` (specify the operator to be tested.), and ```backward```(gradient of an op.) methods
3. call generate_pt_test/generate_c2_test to create test cases based on configs
Reviewed By: zheng-xq
Differential Revision: D15250380
fbshipit-source-id: 1025a7cf60d2427baa0f3f716455946d3d3e6a27
Summary:
This should pass once https://github.com/pytorch/vision/pull/971 is merged.
To remove torchvision as a baseline, we just compare against the sum of all `param.sum()` values in the pretrained resnet18 model, which means we only need to manually update the number when those pretrained weights change, which is generally rare.
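A minimal sketch of the baseline computation described above (assumes torchvision is installed and can download the pretrained resnet18 weights):
```python
import torch
import torchvision

model = torchvision.models.resnet18(pretrained=True)
with torch.no_grad():
    baseline = sum(p.sum() for p in model.parameters())
print(float(baseline))  # recorded once and compared against in the test
```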
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21132
Differential Revision: D15563078
Pulled By: ailzhang
fbshipit-source-id: f28c6874149a1e6bd9894402f6847fd18f38b2b7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21164
Write a List type to be used in operator kernels. This abstracts away from the concrete list type used (e.g. std::vector vs SmallVector)
and allows us to change these implementation details without breaking the kernel API.
Also, this class allows for handling List<bool>, which would not work with ArrayRef because vector<bool> is a bitset and can't be converted to ArrayRef<bool>.
Reviewed By: ezyang
Differential Revision: D15476434
fbshipit-source-id: 5855ae36b45b70437f996c81580f34a4c91ed18c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21156
we'll add `quantize(quantizer)` as a tensor method later when we expose `quantizer` in Python frontend
Python
```
torch.quantize_linear(t, ...)
```
C++
```
at::quantize_linear(t, ...)
```
Differential Revision: D15558784
fbshipit-source-id: 0b194750c423f51ad1ad5e9387a12b4d58d969a9
Summary:
In the previous implementation of triu / tril, we passed the batch size in the 2nd dimension of a grid. This is limited to 65535, which means that performing triu / tril on a tensor with batch size > 65535 will throw an error. This PR removes the dependence on the 2nd dimension, and corresponding non-contiguity constraints.
Changelog:
- Compute offset, row and col in the kernel
- Use 1st dimension of grid alone
- Remove unnecessary contiguity checks on tensors as a result of this change.
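A minimal sketch of the case this unblocks (a batch dimension larger than 65535; assumes a CUDA device):
```python
import torch

x = torch.randn(70000, 8, 8, device="cuda")  # batch size > 65535

upper = torch.triu(x)                # previously hit the grid's 2nd-dimension limit
lower = torch.tril(x, diagonal=-1)
```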
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21067
Differential Revision: D15572501
Pulled By: ezyang
fbshipit-source-id: 93851cb661918ce794d43eeb12c8a38762e1358c
Summary:
Resolves https://github.com/pytorch/lockdown/issues/51
This adds support for converting simple f-string literals to calls to `string.format()`. It does not support conversion specifiers or format strings.
This also does not support the string parser frontend, since that implementation would be more involved and likely would require modifying our TorchScript AST
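A minimal sketch of what now compiles (a plain f-string with no conversion or format-spec syntax):
```python
import torch

@torch.jit.script
def describe(n: int) -> str:
    # lowered to the equivalent of "got {} items".format(n)
    return f"got {n} items"

print(describe(3))
```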
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21037
Reviewed By: zdevito
Differential Revision: D15541183
Pulled By: jamesr66a
fbshipit-source-id: ae9df85e73f646d7219c1349f5b7683becbcef20
Summary:
# Overall Improvements
1. Switched from using `unordered_set` to sparse bitset.
1. Prevent some excessive memory allocations (thanks to resistor )
1. Take advantage of the sparse bitset operations
1. Switch to `flat_hash_map` instead of `unordered_map` in some places.
# Benchmarks (somewhat approximate, best of a couple runs)
1. InceptionNet (load + one forward pass): 19.8->13.3
1. GoogleNet(load + one forward pass): 10.0 -> 7.24
1. DenseNet (only load): 7.3 -> 5.3
I use the `sparse bitset` taken from https://llvm.org/doxygen/SparseBitVector_8h_source.html. I had to make some modifications to use `__builtin_popcountl` and instructions like that instead of other transitive clang dependencies.
## Some notes on our graph topologies
In general, our graphs are very sparse, and most of the components aren't connected. For GoogleNet, we have 200k nodes, we do 2k `mayAlias` queries, and the sum of magnitudes of sets at each node is 500k (ie: every node, on average, reaches 2.5 leaves).
PS: Holy crap macbooks throttle an insane amount with the default fan settings.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20899
Differential Revision: D15564612
Pulled By: Chillee
fbshipit-source-id: 2a293a21a9be25f942ca888c8f225cab32bbfcd0
Summary:
Now you can run `python test/run_tests --jit` to run all jit tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21161
Differential Revision: D15563912
Pulled By: eellison
fbshipit-source-id: 4bb0285cda4168b72a3dc4bba471485566a59873
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21051
In net transforms, we perform an SSARewrite where we update the 'net_pos' for all the ops in the net. The transform function also takes a unordered set of net positions for blacklisting. It's possible that SSARewrite will change the indexes of the ops so the blacklist is applied to the wrong ops. We fix this issue by having SSARewrite only assign new net_pos if the op doesn't already have one.
Reviewed By: yinghai
Differential Revision: D15532795
fbshipit-source-id: e020492a7b5196a91cdc39d0eda761b1ca612cdb
Summary:
These do not work. We'll save time and cpu until someone has the time to fix these.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21153
Differential Revision: D15558601
Pulled By: pjh5
fbshipit-source-id: f9bfe580aa7962a88506f9af0032647f553637a4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21027
Previously, we were only able to adjust batch size when the output shape has batch size conditioned at its first dim. Although not common, there are cases where we want to slice back an output whose batch size is conditioned on a non-first dim, or whose output shape doesn't really have batch size in it but rather is an expression of it. Examples are shapes at the output of `Transpose` or `Tile`. This diff redesigns how we handle the output size. The key is that when we run OnnxifiOp, the input shapes are given, and we can actually do a shape inference to derive the real output shapes, no matter how they got transformed. And then we compare the real output shape with the max-batch-sized output shape, dim by dim, and use a `Slice` op to cut the max output back to the real output shape.
Notice that the general `Slice` op is slow and in most of the cases we still prefer adjusting batch size by shrinking its first dim, which is just an operation on meta info without data allocation/manipulation. Therefore, we add a flag `fast_path` to detect this situation and operate accordingly.
Reviewed By: tracelogfb
Differential Revision: D15515189
fbshipit-source-id: 9c1fff161f82d0bc20eeac07ca4a2756e964e9fd
Summary:
Resolves https://github.com/pytorch/lockdown/issues/29
Examples:
```
import torch

@torch.jit.script
def foobar(x):
    return torch.blargh(xyz)
==
RuntimeError:
object has no attribute blargh:
at compile.py:5:12
@torch.jit.script
def foo(x):
    return torch.blargh(x)
           ~~~~~~~~~~~~ <--- HERE
```
It also gets the correct column number in the case where the original source file has common leading whitespace in front of the callable:
```
import torch

with torch.no_grad():
        @torch.jit.script
        def foo(x):
                return torch.blargh(x)
==
RuntimeError:
object has no attribute blargh:
at compile_leading.py:6:24
        @torch.jit.script
        def foo(x):
                return torch.blargh(x)
                       ~~~~~~~~~~~~ <--- HERE
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20898
Differential Revision: D15552424
Pulled By: jamesr66a
fbshipit-source-id: 78d0f0de03f7ccbf3e7ea193a1b4eced57ea5d69
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20874
A criterion for what should be a Tensor method is whether numpy has it; for this one it does not,
so we are removing it as a Tensor method. We can still call it as a function.
Python
```
torch.quantize_linear(t, ...), torch.dequantize(t)
```
C++
```
at::quantize_linear(t, ...), at::dequantize(t)
```
Reviewed By: dzhulgakov
Differential Revision: D15477933
fbshipit-source-id: c8aa81f681e02f038d72e44f0c700632f1af8437
Summary:
Following on #19747, this implements most of the `torch.jit.script()` changes laid out in #20939.
Still to do:
* Accessing a method from Python does not add it as a `ScriptMethod` (so only `export`ed methods and `forward` are compiled)
* Calling a method other than `forward` on a submodule doesn't work
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20708
Pulled By: driazati
Differential Revision: D15546045
fbshipit-source-id: c2c8fe179088ffbdad47198e799a456560655b86
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20869
Adding support for the functions listed in the title, by implementing the copy kernel.
Differential Revision: D15474060
fbshipit-source-id: 9264df6e442cca1cc5d952e3e5dcc9f4a426f317
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20876
Tell the compiler that assertions are likely to succeed.
This allows the compiler to generate better code and optimize for the success case.
Differential Revision: D15480066
fbshipit-source-id: 4485154d66b2ee0ef8a401718712dbd61d811aee
Summary:
Thanks Jonas1312 for validating this workaround.
Fixes #20635.
However, I don't know exactly why this one is needed.
The following are my guesses:
1. It is a CUDA bug. Static linking against `cudart` is the default now, so they didn't run enough tests for dynamic ones.
2. It is related to the UCRT. But (1) according to MSDN, shared DLLs should share the same CRT. (2) The CUDA-related objects like `CUDevice` passed to `cudart` are stored on the stack, not the heap. (3) If this were the case, it should always fail, not just sometimes. https://docs.microsoft.com/en-us/cpp/c-runtime-library/potential-errors-passing-crt-objects-across-dll-boundaries?view=vs-2019
3. It is a bug of our side. However, I was unable to find it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21062
Differential Revision: D15543557
Pulled By: ezyang
fbshipit-source-id: c23af45ebf582fad93ce5f029af6e1f06cf1d49d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20887
Switch AT_xxx assertion macros to the TORCH_ variants and make sure the separation between TORCH_CHECK and TORCH_INTERNAL_ASSERT makes sense.
Differential Revision: D15484658
fbshipit-source-id: 490ae64cc36946756c30971f1b685048bc5f77da
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20940
- `torch.nn._intrinsic` will contain normal (unquantized) fused modules like Conv2DRelu, Conv2DBnRelu, FakeQuantize ops, etc.
- `torch.nn._intrinsic.quantized` will contain fused and quantized modules like Quantized Conv2DRelu, Quantized LinearRelu, etc.
Right now I only added the FakeQuantize op in the `torch.nn._intrinsic` namespace; we'll have more later.
Differential Revision: D15505228
fbshipit-source-id: d380929e38af7a5bcfbea27474d5b80f95d43b03
Summary:
A bunch of modules were missing entries for `__constants__`, which was making their `__repr__`s not work. Others had `__constants__` entries that were not necessary since they were provided by a parent class.
Fixes #20978
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21071
Pulled By: driazati
Differential Revision: D15539518
fbshipit-source-id: 24bdd1ef41ef636eefd5d2bad4ab2d79646ed4f0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17946
Some of these are probably implementable for exported operators,
but aren't implemented yet and for now it's better to assert than to just return wrong results.
Reviewed By: ezyang
Differential Revision: D14430749
fbshipit-source-id: 2b0037a9ed227a22aa7376a90e6d3d09d3e04707
Summary:
Fixes #18440
I calculate a derived index from `start, stop, step` as `start + step*index`. When `start=0` and `step=1` (the defaults, i.e. `range(n)`), this is the same behavior as before.
Unfortunately, it seems that we do not optimize out operations like `x*1` or `x+0`, which means we do lots of redundant operations when we don't need to. EDIT: More specifically, it seems like we only do this optimization for (tensor, scalar): https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/passes/peephole.cpp#L128
The most annoying part of this code is calculating the number of iterations given `start, stop, step`. I ended up going with the formula `(abs(stop-start) + abs(step)-1)//abs(step)`. Other intuitively appealing formulas like `(stop-start + step - 1)//step` don't work for negative numbers.
I tried using `SymbolicVariable` for the calculations, but it seems that `SymbolicVariable` only emits ops for tensors, not the integers we have here.
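A minimal sketch of the trip-count formula above (plain Python, just to check it against `range`):
```python
def trip_count(start, stop, step):
    # the formula from this PR; abs() makes it work for negative steps too,
    # assuming step points from start toward stop
    return (abs(stop - start) + abs(step) - 1) // abs(step)

def derived_index(start, step, i):
    return start + step * i

for start, stop, step in [(0, 10, 1), (2, 10, 3), (10, 0, -2)]:
    expected = list(range(start, stop, step))
    got = [derived_index(start, step, i) for i in range(trip_count(start, stop, step))]
    assert got == expected
```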
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20795
Differential Revision: D15446869
Pulled By: Chillee
fbshipit-source-id: 6085545ace04e25985c6ac870226f7a651f670d5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21035
Fix the dtype error in `dequantize_linear`; it should accept the same dtype argument as `quantize_linear`.
Differential Revision: D15521931
fbshipit-source-id: 0114c046a3f1046e42fca49c74c85e487fee8616
Summary:
This PR adds a check that prints a warning if a type annotation prefix isn't what mypy expects.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20884
Differential Revision: D15511043
Pulled By: Krovatkin
fbshipit-source-id: 9038e074807832931faaa5f4e69628f94f51fd72
Summary:
I accidentally added a TF dependency in #20413 by using the `from tensorboard.plugins.mesh.summary import _get_json_config` import.
I'm removing it at the cost of some code duplication.
orionr, Please review.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21066
Reviewed By: natalialunova
Differential Revision: D15538746
Pulled By: orionr
fbshipit-source-id: 8a822719a4a9f5d67f1badb474e3a73cefce507f
Summary:
In larger system environments, there's usually a need to store some information about how the model was created (e.g. from which process or workflow, by which user, etc.). It's almost like the JPEG metadata written by a camera.
This PR adds a low-level c++ hook to allow population of additional files in zip container based on environment. The reason to have it a low-level hook instead of top-level API wrapper (e.g. `m.save_with_metadata`) is to capture all usages of the saving API transparently for user.
Let me know if there are concerns.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20863
Differential Revision: D15487941
Pulled By: dzhulgakov
fbshipit-source-id: 120c5a4c9758aa82846bb51a1207f923e3da1333
Summary:
This doesn't have `strace` yet, but it still has `faulthandler` to print stack traces on hangs. This is also part of an attempt to isolate changes from #19228.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20166
Differential Revision: D15536504
Pulled By: ezyang
fbshipit-source-id: fe6e6e2e9899f30d8167436d7bc62b42883a3356
Summary:
Previously, this didn't work when 2d target tensors had extra columns at the end. Now we just ignore those.
Also fix the confusion in the doc example regarding the number of classes.
Thank you, ypw-rich, for the report with a reproducing example.
Fixes: #20522
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20971
Differential Revision: D15535481
Pulled By: ezyang
fbshipit-source-id: 397e44e20165fc4fa2547bee9390d4c0b688df93
Summary:
https://github.com/pytorch/pytorch/pull/17783 made ninja and makefile builds print out build commands unconditionally, which has made the build log very verbose (e.g. the ROCm CI build log becomes >13 MB). Large build logs make searching for the real error hard.
https://github.com/pytorch/pytorch/pull/20508 has reverted the ninja change, and this one reverts the makefile change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21053
Differential Revision: D15533412
Pulled By: bddppq
fbshipit-source-id: ad89b617d06acc670d75d4cf25111a4081e9c95e
Summary:
I've reported the inconsistency between `checkpoint_sequential` and `nn.Sequential` at https://github.com/pytorch/pytorch/issues/19260. Both should provide the same input signature but they don't. I think the consistency is important, and I agree with apaszke that `nn.Sequential`'s semantics should be kept instead of `checkpoint_sequential`'s.
I hope `checkpoint_sequential` will raise a `TypeError` on variadic arguments starting with PyTorch 1.2.0. For now, it's okay just to warn with a `DeprecationWarning`. I've talked about this approach with soumith.
Please review this pull request. Any comment will be my pleasure.
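A minimal sketch of the `nn.Sequential`-style call this keeps (a single input tensor rather than variadic arguments):
```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

model = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 10))
x = torch.randn(2, 10, requires_grad=True)
out = checkpoint_sequential(model, 2, x)  # like nn.Sequential: one input, split into 2 checkpointed segments
out.sum().backward()
```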
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21006
Differential Revision: D15530801
Pulled By: soumith
fbshipit-source-id: 0ceb2cc6a17dcc547d0d00ebaf9df8603be53183
Summary:
gradcheck currently includes a determinism check (although it only tries twice and sees whether the results match).
This can lead to flaky tests, e.g. in #20971, but also #13818.
This adds a `nondet_tol` argument to both gradcheck and gradgradcheck. It does not change / re-enable any tests yet.
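A minimal sketch of the new argument (assuming a build that includes this change):
```python
import torch
from torch.autograd import gradcheck

def f(x):
    return (x * x).sum()

x = torch.randn(4, dtype=torch.double, requires_grad=True)
# nondet_tol allows a small mismatch between the two gradient evaluations
# that gradcheck performs for its determinism check
assert gradcheck(f, (x,), nondet_tol=1e-5)
```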
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20980
Differential Revision: D15530129
Pulled By: soumith
fbshipit-source-id: 04d7f85b5b59cd62867820c74b064ba14f4fa7f8
Summary:
Fixes a typo in the CyclicLR docs by adding the `lr_scheduler` module path and putting in the other required arguments.
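For reference, a small example of the corrected usage (the scheduler lives under `torch.optim.lr_scheduler` and needs `base_lr` and `max_lr`; the exact values here are illustrative):
```python
import torch
from torch import optim

model = torch.nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)  # momentum needed for the default cycle_momentum=True
scheduler = optim.lr_scheduler.CyclicLR(optimizer, base_lr=0.001, max_lr=0.1)
for _ in range(5):
    optimizer.step()
    scheduler.step()
```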
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21021
Differential Revision: D15530109
Pulled By: soumith
fbshipit-source-id: 98781bdab8d82465257229e50fa3bd0015da1286
Summary:
Just an annoying warning that's been popping up a lot.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20964
Differential Revision: D15531064
Pulled By: Chillee
fbshipit-source-id: 9580115676c5e246481054bbfc749a551a3cca5e
Summary:
This PR covers two important points with respect to the QR decomposition:
- batching of input matrices (#7500)
- adding `some` as an option in `torch.qr` akin to NumPy's `mode` option (#10538)
Changelog:
- Enable batching for inputs to `torch.qr`
- Move QR decomposition implementation to ATen (CPU and CUDA)
- Remove existing implementations in TH/THC
- Add a `some` option to `torch.qr` that will enable users to switch between complete and reduced decomposition
- Modify doc strings
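A minimal sketch of the resulting behavior (batched input plus the `some` flag):
```python
import torch

A = torch.randn(3, 5, 4, dtype=torch.double)  # a batch of three 5x4 matrices
Q, R = torch.qr(A, some=True)                 # reduced: Q is (3, 5, 4), R is (3, 4, 4)
print(torch.allclose(Q @ R, A))               # True: each batch element is reconstructed
Q_full, R_full = torch.qr(A, some=False)      # complete: Q_full is (3, 5, 5), R_full is (3, 5, 4)
```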
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20689
Differential Revision: D15529230
Pulled By: soumith
fbshipit-source-id: 16af82b1d2db8a3a758fa8a5f798d83f5f950efb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20603
When we use intra_op_parallel operators, Caffe2 tracing was generating a trace only for the master task, giving a false impression that a lot of threads are underutilized.
This diff also traces the child tasks.
Reviewed By: ilia-cher
Differential Revision: D14820008
fbshipit-source-id: ff4ed203804d86d9231c21c99d869f1ddf1d1ef9
Summary:
Add an option to setup.py to stop the build process once cmake terminates. This gives users a chance to fine-tune build options. Also update the README accordingly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21034
Differential Revision: D15530096
Pulled By: soumith
fbshipit-source-id: 71ac6ff8483c3ee77c38d88f0d059db53a7d3901
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20647
The initial assumption was that `qint8` would be unsigned. After the introduction of `quint8` and `qint8`, some tests broke.
Reviewed By: jerryzh168
Differential Revision: D15332106
fbshipit-source-id: 6ed18da428915aea918a363c5f38754a3c75d06b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20493
This helps distinguish if the op was a quantized op or not.
Reviewed By: salexspb
Differential Revision: D15337854
fbshipit-source-id: 43c7aef143085cfaeb4ec2102a7f36cc454e0e94
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20173
Enabled op profiling even when net type is not dag or prof dag. Also added
engine type info to summary.
Reviewed By: salexspb, ilia-cher
Differential Revision: D15177813
fbshipit-source-id: 5be0efeaabc9a961cf1d73b0703749c08bb1adbb
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#19587 [jit] Make ScriptModule.training an attribute instead of a parameter**
Remove the hack we had previously where `training` was a buffer
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19587
Differential Revision: D15502768
Pulled By: driazati
fbshipit-source-id: 3022f2d57ec6849868f9225d9bc2bfb7828cb318
Summary:
Before we look into supporting `deepcopy`, we could at least improve the error message.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20885
Differential Revision: D15511023
Pulled By: Krovatkin
fbshipit-source-id: 93b8730a2cc663eee0147f14d3341d0606748eaf
Summary:
This is #20919 without the changes to aten/src/THC/THCIntegerDivider.cuh
that broke the ROCm build.
cc bddppq
Original summary:
This fixes advanced indexing in cases where there's more than 2^31-1
bytes in the output. The `gpu_index_kernel` was missing the
`can_use_32bit_indexing`/`with_32bit_indexing` check.
This also adds a number of TORCH_INTERNAL_ASSERTs in Loops.cuh,
OffsetCalculator, and IntDivider to check that sizes fit in a signed 32-bit
integer.
More comprehensive tests that require a 32 GB GPU are here:
https://gist.github.com/colesbury/e29387f5851521256dff562be07b981e
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21019
Differential Revision: D15518477
Pulled By: colesbury
fbshipit-source-id: 4db5626fda76eb58250793e8aa7d4f2832db3a34
Summary:
Fixes #20495.
Now for
```python
class A(torch.jit.ScriptModule):
    def __init__(self):
        super(A, self).__init__()

    @torch.jit.script_method
    def forward(self, x):
        return x + self.whatisgoingon

class B(A):
    def __init__(self):
        super(B, self).__init__()

    @torch.jit.script_method
    def bar(self, x):
        return x * x

A()
```
it does
```
RuntimeError:
attribute 'whatisgoingon' does not exist:
@torch.jit.script_method
def forward(self, x):
    return x + self.whatisgoingon
               ~~~~~~~~~~~~~~~~~~ <--- HERE
```
I added a test in `test_jit.py` as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20779
Differential Revision: D15441138
Pulled By: Chillee
fbshipit-source-id: 88f458c36b5e32a1ffc467b27bbc28a3c5c07321
Summary:
As a part of https://github.com/pytorch/pytorch/pull/20580 I noticed that we had some unusual variable naming in `summary.py`. This cleans it up and also removes some variables that weren't being used.
I'll wait until we have an `add_custom_scalars` test to land this.
cc lanpa natalialunova
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20861
Differential Revision: D15503420
Pulled By: orionr
fbshipit-source-id: 86d105a346198a1ca543d1c5d297804402ab5a0c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20880
This clarifies how the momentum parameters should be used.
Reviewed By: soumith
Differential Revision: D15482450
fbshipit-source-id: e3649a38876c5912cb101d8e404abca7c3431766
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20610
- Change InferLengthsRangeFill
- Add InferGatherRanges
- Add tests from ClipRangesGatherSigridHash all the way to SparseLengthsWeightedSum
- Add tests from SigridTransforms all the way to SparseLengthsWeightedSum
An e2e test will be added in the following diff.
Reviewed By: ipiszy
Differential Revision: D15382730
fbshipit-source-id: a611cd129007a273dfc43955cd99af1c4ed04efd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20938
`dequantize_linear` need not be exposed to front-end users.
It will only be used by the JIT passes for q-dq insertion and op substitution.
Differential Revision: D15446097
fbshipit-source-id: a5fbcf2bb72115122c9653e5089d014e2a2e891d
Summary:
Remove the internal functions in multi_head_attention_forward. Those internal functions cause a 10-15% performance regression, and there is possibly a JIT issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20653
Differential Revision: D15398888
Pulled By: cpuhrsch
fbshipit-source-id: 0a3f053a4ade5009e73d3974fa6733c2bff9d929
Summary:
Changes:
- protobuf has been moved to protocolbuffers/protobuf a while ago.
- cpuinfo has been moved to pytorch/cpuinfo and updated in FBGEMM recently.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20973
Differential Revision: D15511926
Pulled By: soumith
fbshipit-source-id: 2c50373c9b245524f839bd1059870dd2b84e3b81
Summary:
Sometimes users forget to use the "--recursive" option when they update submodules. This added check should help expose this issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20937
Differential Revision: D15502846
Pulled By: mrshenli
fbshipit-source-id: 34c28a2c71ee6442d16b8b741ea44a18733b1536
Summary:
This changes the progress bars in `_download_url_to_file` from saying things like `49773343.40it/s` to `47.5MB/s`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20908
Differential Revision: D15511223
Pulled By: soumith
fbshipit-source-id: 2422eb5fb486f9ef4bd69c556c4ed1775b8b2860
Summary:
I believe the `True` and `False` in the doc are reversed :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20961
Differential Revision: D15510806
Pulled By: soumith
fbshipit-source-id: 62566bb595e187506b23dedc24892e48f35b1147
Summary:
Fixes #20630
Haven't tested it yet. Let's see if it passes all CI tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20882
Reviewed By: pietern
Differential Revision: D15483561
Pulled By: mrshenli
fbshipit-source-id: 5f0730a04d92906af077b2fe2170b674ca371e6c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20868
When `input_boxes_include_bg_cls` is false (which means `input_scores_fg_cls_starting_id` is 0), it doesn't map the class index of the score correctly when sorting and limiting the detections over all classes after NMS.
Reviewed By: newstzpz
Differential Revision: D15472706
fbshipit-source-id: dc1e808b63ad09fb4bd95acf866771bb3fa92d69
Summary:
This fixes advanced indexing in cases where there's more than 2^31-1
bytes in the output. The `gpu_index_kernel` was missing the
`can_use_32bit_indexing`/`with_32bit_indexing` check.
This also adds a number of TORCH_INTERNAL_ASSERTs in Loops.cuh,
OffsetCalculator, and IntDivider to check that sizes fit in a signed 32-bit
integer.
More comprehensive tests that require a 32 GB GPU are here:
https://gist.github.com/colesbury/e29387f5851521256dff562be07b981e
Fixes #20888
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20919
Differential Revision: D15501945
Pulled By: colesbury
fbshipit-source-id: e876e678e866d2efda8ee92c47a1d2d1310671f0
Summary:
Previously, this used `crepr` after the decref of `repr`. This is not
allowed because `repr` owns the cached copy of `crepr`.
Let's see if this fixes the contbuild.
See #20926
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20931
Differential Revision: D15501929
Pulled By: colesbury
fbshipit-source-id: 24141ba62df8758d2a3998cf7c2054be09088b6a
Summary:
Bug reported internally at FB:
```python
>>> import numpy as np
>>> import torch
>>> t = torch.from_numpy(np.empty((0, 4)))
>>> t[:, 1::2] *= 1
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: Trying to resize storage that is not resizable at ../aten/src/TH/THStorageFunctions.cpp:76
```
This happens because the storage offset of `t[:, 1::2]` is 1, and it has 0 elements. We can fix this by avoiding resizing the storage for no-element arrays.
(We could *also* have avoided it by not modifying the storage index in this case, but I felt this way was more semantically correct -- in general, we should not be assuming it's okay to do anything to the storage when it has zero elements).
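With the fix, the repro above runs through (a minimal check):
```python
import numpy as np
import torch

t = torch.from_numpy(np.empty((0, 4)))
t[:, 1::2] *= 1     # the sliced view has a nonzero storage offset but zero elements;
                    # the storage is no longer resized, so this succeeds
print(t.shape)      # torch.Size([0, 4])
```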
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20914
Differential Revision: D15497860
Pulled By: umanwizard
fbshipit-source-id: 6af61d73a05edfc5c07ce8be9e530f15bf72e6a9
Summary:
I started adding support for the new **[mesh/point cloud](https://github.com/tensorflow/graphics/blob/master/tensorflow_graphics/g3doc/tensorboard.md)** data type introduced to TensorBoard recently.
I created the functions to add the data, created the appropriate summaries.
This new data type however requires a **Merged** summary containing the data for the vertices, colors and faces.
I got stuck at this stage. Maybe someone can help. lanpa?
I converted the example code by Google to PyTorch:
```python
import numpy as np
import trimesh
import torch
from torch.utils.tensorboard import SummaryWriter
sample_mesh = 'https://storage.googleapis.com/tensorflow-graphics/tensorboard/test_data/ShortDance07_a175_00001.ply'
log_dir = 'runs/torch'
batch_size = 1
# Camera and scene configuration.
config_dict = {
    'camera': {'cls': 'PerspectiveCamera', 'fov': 75},
    'lights': [
        {
            'cls': 'AmbientLight',
            'color': '#ffffff',
            'intensity': 0.75,
        }, {
            'cls': 'DirectionalLight',
            'color': '#ffffff',
            'intensity': 0.75,
            'position': [0, -1, 2],
        }],
    'material': {
        'cls': 'MeshStandardMaterial',
        'roughness': 1,
        'metalness': 0
    }
}
# Read all sample PLY files.
mesh = trimesh.load_remote(sample_mesh)
vertices = np.array(mesh.vertices)
# Currently only supports RGB colors.
colors = np.array(mesh.visual.vertex_colors[:, :3])
faces = np.array(mesh.faces)
# Add batch dimension, so our data will be of shape BxNxC.
vertices = np.expand_dims(vertices, 0)
colors = np.expand_dims(colors, 0)
faces = np.expand_dims(faces, 0)
# Create data placeholders of the same shape as data itself.
vertices_tensor = torch.as_tensor(vertices)
faces_tensor = torch.as_tensor(faces)
colors_tensor = torch.as_tensor(colors)
writer = SummaryWriter(log_dir)
writer.add_mesh('mesh_color_tensor', vertices=vertices_tensor, faces=faces_tensor,
                colors=colors_tensor, config_dict=config_dict)
writer.close()
```
I tried adding only the vertex summary, hence the others are supposed to be optional.
I got the following error from TensorBoard and it also didn't display the points:
```
Traceback (most recent call last):
File "/home/dawars/workspace/pytorch/venv/lib/python3.6/site-packages/werkzeug/serving.py", line 302, in run_wsgi
execute(self.server.app)
File "/home/dawars/workspace/pytorch/venv/lib/python3.6/site-packages/werkzeug/serving.py", line 290, in execute
application_iter = app(environ, start_response)
File "/home/dawars/workspace/pytorch/venv/lib/python3.6/site-packages/tensorboard/backend/application.py", line 309, in __call__
return self.data_applications[clean_path](environ, start_response)
File "/home/dawars/workspace/pytorch/venv/lib/python3.6/site-packages/werkzeug/wrappers/base_request.py", line 235, in application
resp = f(*args[:-2] + (request,))
File "/home/dawars/workspace/pytorch/venv/lib/python3.6/site-packages/tensorboard/plugins/mesh/mesh_plugin.py", line 252, in _serve_mesh_metadata
tensor_events = self._collect_tensor_events(request)
File "/home/dawars/workspace/pytorch/venv/lib/python3.6/site-packages/tensorboard/plugins/mesh/mesh_plugin.py", line 188, in _collect_tensor_events
tensors = self._multiplexer.Tensors(run, instance_tag)
File "/home/dawars/workspace/pytorch/venv/lib/python3.6/site-packages/tensorboard/backend/event_processing/plugin_event_multiplexer.py", line 400, in Tensors
return accumulator.Tensors(tag)
File "/home/dawars/workspace/pytorch/venv/lib/python3.6/site-packages/tensorboard/backend/event_processing/plugin_event_accumulator.py", line 437, in Tensors
return self.tensors_by_tag[tag].Items(_TENSOR_RESERVOIR_KEY)
KeyError: 'mesh_color_tensor_COLOR'
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20413
Differential Revision: D15500737
Pulled By: orionr
fbshipit-source-id: 426e8b966037d08c065bce5198fd485fd80a2b67
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20821
Change registration API. Instead of
    static auto registry = torch::RegisterOperators()
        .op("my::op", torch::RegisterOperators::options()
            .kernel<Kernel>()
            .dispatchKey(CPUTensorId()));
it is now
    static auto registry = torch::RegisterOperators()
        .op("my::op", torch::RegisterOperators::options()
            .kernel<Kernel>(CPUTensorId()));
This binds kernel and dispatch key together, allowing them to be separate from other future configuration options like alias analysis or autograd wrappers.
The semantic problem behind this is that the dispatch key is a *kernel config parameter* and not an *operator config parameter* while things like autograd wrappers, alias info, and actually the kernel itself are *operator config parameters*. And while previously, the different kind of config parameters have been mixed, this diff now separates them.
Before this change, it wouldn't have been well defined if you specified a dispatchKey together with an autogradWrapper or aliasInfo for example.
    // what is this supposed to do?
    static auto registry = torch::RegisterOperators()
        .op("my::op", torch::RegisterOperators::options()
            .aliasInfo(DEFAULT)
            .dispatchKey(CPUTensorId()));
If we get more kernel config parameters in the future, we could introduce something like this
    static auto registry = torch::RegisterOperators()
        .op("my::op", torch::RegisterOperators::options()
            .kernel<Kernel>(torch::RegisterOperators::kernelOptions()
                .dispatchKey(CPUTensorId())
                .otherConfig()));
but that's overkill as long as dispatch keys are the only kernel config parameter, and we can introduce that later without breaking backwards compatibility.
A nice side effect of this is that people can register multiple kernels to the same operator in the same `.op()` call:
    static auto registry = torch::RegisterOperators()
        .op("my::op", torch::RegisterOperators::options()
            .kernel<Kernel1>(CPUTensorId())
            .kernel<Kernel2>(CUDATensorId()));
Reviewed By: dzhulgakov
Differential Revision: D15455790
fbshipit-source-id: 1c46bfe676dcacf74cf36bd3f5df3d2c32b8fb11
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17818
Some of these are probably implementable for exported operators,
but aren't implemented yet and for now it's better to assert than to just return wrong results.
Reviewed By: ezyang
Differential Revision: D14392459
fbshipit-source-id: bf86e6cb0a7cfefd112a65dc85cc243e57a5ad52
Summary:
This PR also moves Device::validate into the header file, which makes
statements like `Device d = kCPU` effectively free.
Device includes the device's index, so TensorIterator::compute_types
now implicitly checks that all CUDA inputs are on the same GPU.
Previously, this was done ad-hoc in places like TensorIterator::binary_op.
Note that zero-dim Tensors (scalars) are NOT required to be on the
same device as the other inputs because they behave almost like Python numbers.
TensorIterator handles copying zero-dim Tensors to the common device.
Prior to this PR, TensorIterator would copy zero-dim Tensors between CPU
and GPU, but not between different GPUs (because Backend didn't encode
the GPU index). This removes that restriction.
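A minimal sketch of the resulting behavior, assuming a machine with at least two GPUs:
```python
import torch

a = torch.randn(3, device="cuda:0")
b = torch.randn(3, device="cuda:1")
# a + b              # raises: non-scalar inputs must be on the same GPU
s = torch.tensor(2.0, device="cuda:1")  # zero-dim tensor, behaves like a Python number
print(a * s)         # allowed: TensorIterator copies the zero-dim tensor to a's device
```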
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20690
Differential Revision: D15414826
Pulled By: colesbury
fbshipit-source-id: 1d0ad1f7d663252af36dd4590bcda418c2f7a09f
Summary:
This PR eliminates unneeded grad_sum_to_size calls and in particular speeds up the LSTM backward by allowing better fusion.
It consists of two parts:
- In AutoDiff, record broadcasting sizes only if the broadcast output size is different from the input size; otherwise record None.
- The specialization of Optional arguments (#18407) then allows us to eliminate `_grad_sum_to_size(t, None)` in the peephole optimization step.
Thus, in the LSTM case, no SumToSize remains in the crucial fusion group. The trick here is that we can specialize on the runtime information from the forward pass.
I'm testing that different broadcasting situations lead to different graphs.
I didn't move all symbolic_script _grad_sum_to_size calls to the new logic, but it might be better to do this incrementally anyway.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18697
Differential Revision: D15482076
Pulled By: wanchaol
fbshipit-source-id: 7f89367e35b8729910077c95c02bccefc8678afb
2. Persists CircleCI scripts (everything in `.circleci`) into a workspace. Why?
We don't always do a Git checkout on all subjobs, but we usually
still want to be able to call scripts one way or another in a subjob.
Persisting files this way lets us have access to them without doing a
checkout. This workspace is conventionally mounted on `~/workspace`
(this is distinguished from `~/project`, which is the conventional
working directory that CircleCI will default to starting your jobs
in.)
3. Write out the commit message to `.circleci/COMMIT_MSG`. This is so
we can determine in subjobs if we should actually run the jobs or
not, even if there isn't a Git checkout.
CircleCI configuration generator
================================
Future direction
----------------
See comment [here](https://github.com/pytorch/pytorch/pull/17323#pullrequestreview-206945747):
In contrast with a full recursive tree traversal of configuration dimensions,
> in the future future I think we actually want to decrease our matrix somewhat and have only a few mostly-orthogonal builds that taste as many different features as possible on PRs, plus a more complete suite on every PR and maybe an almost full suite nightly/weekly (we don't have this yet). Specifying PR jobs in the future might be easier to read with an explicit list when we come to this.
----------------
----------------
# How do the binaries / nightlies / releases work?
### What is a binary?
A binary or package (used interchangeably) is a pre-built collection of c++ libraries, header files, python bits, and other files. We build these and distribute them so that users do not need to install from source.
A **binary configuration** is a collection of
* release or nightly
* releases are stable, nightlies are beta and built every night
* python version
* linux: 2.7m, 2.7mu, 3.5m, 3.6m, 3.7m (mu is wide unicode or something like that. It usually doesn't matter but you should know that it exists)
* macos: 2.7, 3.5, 3.6, 3.7
* windows: 3.5, 3.6, 3.7
* cpu version
* cpu, cuda 9.0, cuda 10.0
* The supported cuda versions occasionally change
* operating system
* Linux - these are all built on CentOS. There haven't been any problems in the past building on CentOS and using on Ubuntu
* MacOS
* Windows - these are built on Azure pipelines
* devtoolset version (gcc compiler version)
* This only matters on Linux cause only Linux uses gcc. tldr is gcc made a backwards incompatible change from gcc 4.8 to gcc 5, because it had to change how it implemented std::vector and std::string
### Where are the binaries?
The binaries are built in CircleCI. There are nightly binaries built every night at 9pm PST (midnight EST) and release binaries corresponding to Pytorch releases, usually every few months.
We have 3 types of binary packages
* pip packages - nightlies are stored on s3 (pip install -f <as3url>). releases are stored in a pip repo (pip install torch) (ask Soumith about this)
* conda packages - nightlies and releases are both stored in a conda repo. Nightly packages have a '_nightly' suffix
* libtorch packages - these are zips of all the c++ libraries, header files, and sometimes dependencies. These are c++ only
* shared with dependencies (the only supported option for Windows)
* static with dependencies
* shared without dependencies
* static without dependencies
All binaries are built in CircleCI workflows except Windows. There are checked-in workflows (committed into the .circleci/config.yml) to build the nightlies every night. Releases are built by manually pushing a PR that builds the suite of release binaries (overwrite the config.yml to build the release)
# CircleCI structure of the binaries
Some quick vocab:
* A **workflow** is a CircleCI concept; it is a DAG of '**jobs**'. ctrl-f 'workflows' on https://github.com/pytorch/pytorch/blob/master/.circleci/config.yml to see the workflows.
* **jobs** are a sequence of '**steps**'
* **steps** are usually just a bash script or a builtin CircleCI command. *All steps run in new environments; environment variables declared in one script DO NOT persist to following steps.*
* CircleCI has a **workspace**, which is essentially a cache between steps of the *same job* in which you can store artifacts between steps.
## How are the workflows structured?
The nightly binaries have 3 workflows. We have one job (actually 3 jobs: build, test, and upload) per binary configuration
3. For each binary configuration, e.g. linux_conda_3.7_cpu there is a
1. smoke_linux_conda_3.7_cpu
1. Downloads the package from the cloud, e.g. using the official pip or conda instructions
2. Runs the smoke tests
## How are the jobs structured?
The jobs are in https://github.com/pytorch/pytorch/tree/master/.circleci/verbatim-sources . Jobs are made of multiple steps. There are some shared steps used by all the binaries/smokes. Steps of these jobs are all delegated to scripts in https://github.com/pytorch/pytorch/tree/master/.circleci/scripts .
* Linux jobs: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/linux-binary-build-defaults.yml
* Common shared code (shared across linux and macos): https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/nightly-binary-build-defaults.yml
* binary_checkout.sh - checks out pytorch/builder repo. Right now this also checks out pytorch/pytorch, but it shouldn't. pytorch/pytorch should just be shared through the workspace. This can handle being run before binary_populate_env.sh
* binary_populate_env.sh - parses BUILD_ENVIRONMENT into the separate env variables that make up a binary configuration. Also sets lots of default values, the date, the version strings, the location of folders in s3, all sorts of things. This generally has to be run before other steps.
* binary_install_miniconda.sh - Installs miniconda, cross platform. Also hacks this for the update_binary_sizes job that doesn't have the right env variables
* binary_run_in_docker.sh - Takes a bash script file (the actual test code) from a hardcoded location, spins up a docker image, and runs the script inside the docker image
### **Why do the steps all refer to scripts?**
CircleCI creates a final yaml file by inlining every <<* segment, so if we were to keep all the code in the config.yml itself then the config size would go over 4 MB and cause infra problems.
### **What is binary_run_in_docker for?**
So, CircleCI has several executor types: macos, machine, and docker are the ones we use. The 'machine' executor gives you two cores on some linux vm. The 'docker' executor gives you considerably more cores (nproc was 32 instead of 2 back when I tried in February). Since the dockers are faster, we try to run everything that we can in dockers. Thus
* linux build jobs use the docker executor. Running them on the docker executor was at least 2x faster than running them on the machine executor
* linux test jobs use the machine executor and spin up their own docker. Why this nonsense? It's because we run nvidia-docker for our GPU tests; any code that calls into the CUDA runtime needs to be run on nvidia-docker. To run nvidia-docker you need to install some nvidia packages on the host machine and then call docker with the '--runtime nvidia' argument. CircleCI doesn't support this, so we have to do it ourselves.
* This is not just a mere inconvenience. **This blocks all of our linux tests from using more than 2 cores.** But there is nothing we can do about it except wait for a fix on CircleCI's side. Right now we only run some smoke tests (some simple imports) on the binaries, but this also affects non-binary test jobs.
* linux upload jobs use the machine executor. The upload jobs are so short that it doesn't really matter what they use
* linux smoke test jobs use the machine executor for the same reason as the linux test jobs
binary_run_in_docker.sh is a way to share the docker start-up code between the binary test jobs and the binary smoke test jobs
### **Why does binary_checkout also checkout pytorch? Why shouldn't it?**
We want all the nightly binary jobs to run on the exact same git commit, so we wrote our own checkout logic to ensure that the same commit was always picked. Later circleci changed that to use a single pytorch checkout and persist it through the workspace (they did this because our config file was too big, so they wanted to take a lot of the setup code into scripts, but the scripts needed the code repo to exist to be called, so they added a prereq step called 'setup' to checkout the code and persist the needed scripts to the workspace). The changes to the binary jobs were not properly tested, so they all broke from missing pytorch code no longer existing. We hotfixed the problem by adding the pytorch checkout back to binary_checkout, so now there's two checkouts of pytorch on the binary jobs. This problem still needs to be fixed, but it takes careful tracing of which code is being called where.
# Azure Pipelines structure of the binaries
TODO: fill in stuff
## How are the workflows structured?
TODO: fill in stuff
## How are the jobs structured?
TODO: fill in stuff
# Code structure of the binaries (circleci agnostic)
## Overview
The code that runs the binaries lives in two places, in the normal [github.com/pytorch/pytorch](http://github.com/pytorch/pytorch), but also in [github.com/pytorch/builder](http://github.com/pytorch/builder) , which is a repo that defines how all the binaries are built. The relevant code is
```
# All code needed to set-up environments for build code to run in,
# but only code that is specific to the current CI system
pytorch/pytorch
- .circleci/ # Folder that holds all circleci related stuff
  - config.yml # GENERATED file that actually controls all circleci behavior
  - verbatim-sources # Used to generate job/workflow sections in ^
  - scripts/ # Code needed to prepare circleci environments for binary build scripts
- setup.py # Builds pytorch. This is wrapped in pytorch/builder
- cmake files # used in normal building of pytorch
# All code needed to prepare a binary build, given an environment
# with all the right variables/packages/paths.
pytorch/builder
# Given an installed binary and a proper python env, runs some checks
# to make sure the binary was built the proper way. Checks things like
# the library dependencies, symbols present, etc.
- check_binary.sh
# Given an installed binary, runs python tests to make sure everything
# is in order. These should be de-duped. Right now they both run smoke
# tests, but are called from different places. Usually just call some
# import statements, but also has overlap with check_binary.sh above
- run_tests.sh
- smoke_test.sh
# Folders that govern how packages are built. See paragraphs below
- conda/
  - build_pytorch.sh # Entrypoint. Delegates to proper conda build folder
  - switch_cuda_version.sh # Switches activate CUDA installation in Docker
  - pytorch-nightly/ # Build-folder
- manywheel/
  - build_cpu.sh # Entrypoint for cpu builds
  - build.sh # Entrypoint for CUDA builds
  - build_common.sh # Actual build script that ^^ call into
- wheel/
  - build_wheel.sh # Entrypoint for wheel builds
- windows/
  - build_pytorch.bat # Entrypoint for wheel builds on Windows
```
Every type of package has an entrypoint build script that handles all the important logic.
## Conda
Linux, MacOS and Windows use the same code flow for the conda builds.
Conda packages are built with conda-build, see https://conda.io/projects/conda-build/en/latest/resources/commands/conda-build.html
Basically, you pass `conda build` a build folder (pytorch-nightly/ above) that contains a build script and a meta.yaml. The meta.yaml specifies what python environment to build the package in and what dependencies the resulting package should have, and the build script gets called in that env to build the thing.
tldr; on conda-build is
1. Creates a brand new conda environment, based off of deps in the meta.yaml
1. Note that environment variables do not get passed into this build env unless they are specified in the meta.yaml
2. If the build fails this environment will stick around. You can activate it for much easier debugging. The “General Python” section below explains what exactly a python “environment” is.
2. Calls build.sh in the environment
3. Copies the finished package to a new conda env, also specified by the meta.yaml
4. Runs some simple import tests (if specified in the meta.yaml)
5. Saves the finished package as a tarball
The build.sh we use is essentially a wrapper around `python setup.py build`, but it also manually copies in some of our dependent libraries into the resulting tarball and messes with some rpaths.
The entrypoint file `builder/conda/build_conda.sh` is complicated because
* It works for Linux, MacOS and Windows
* The mac builds used to create their own environments, since they all used to be on the same machine. There’s now a lot of extra logic to handle conda envs. This extra machinery could be removed
* It used to handle testing too, which adds more logic messing with python environments too. This extra machinery could be removed.
## Manywheels (linux pip and libtorch packages)
Manywheels are pip packages for linux distros. Note that these manywheels are not actually manylinux compliant.
`builder/manywheel/build_cpu.sh` and `builder/manywheel/build.sh` (for CUDA builds) just set different env vars and then call into `builder/manywheel/build_common.sh`
The entrypoint file `builder/manywheel/build_common.sh` is really really complicated because
* This used to handle building for several different python versions at the same time. The loops have been removed, but there's still unnecessary folders and movements here and there.
* The script is never used this way anymore. This extra machinery could be removed.
* This used to handle testing the pip packages too. This is why there’s testing code at the end that messes with python installations and stuff
* The script is never used this way anymore. This extra machinery could be removed.
* This also builds libtorch packages
* This should really be separate. libtorch packages are c++ only and have no python. They should not share infra with all the python specific stuff in this file.
* There is a lot of messing with rpaths. This is necessary, but could be made much much simpler if the above issues were fixed.
## Wheels (MacOS pip and libtorch packages)
The entrypoint file `builder/wheel/build_wheel.sh` is complicated because
* The mac builds used to all run on one machine (we didn’t have autoscaling mac machines till circleci). So this script handled siloing itself by setting-up and tearing-down its build env and siloing itself into its own build directory.
* The script is never used this way anymore. This extra machinery could be removed.
* This also builds libtorch packages
* Ditto the comment above. This should definitely be separated out.
Note that the MacOS Python wheels are still built in conda environments. Some of the dependencies present during build also come from conda.
## Windows Wheels (Windows pip and libtorch packages)
The entrypoint file `builder/windows/build_pytorch.bat` is complicated because
* This used to handle building for several different python versions at the same time. This is why there are loops everywhere
* The script is never used this way anymore. This extra machinery could be removed.
* This used to handle testing the pip packages too. This is why there’s testing code at the end that messes with python installations and stuff
* The script is never used this way anymore. This extra machinery could be removed.
* This also builds libtorch packages
* This should really be separate. libtorch packages are c++ only and have no python. They should not share infra with all the python specific stuff in this file.
Note that the Windows Python wheels are still built in conda environments. Some of the dependencies present during build also come from conda.
## General notes
### Note on run_tests.sh, smoke_test.sh, and check_binary.sh
* These should all be consolidated
* These must run on all OS types: MacOS, Linux, and Windows
* These all run smoke tests at the moment. They inspect the packages some, maybe run a few import statements. They DO NOT run the python tests nor the cpp tests. The idea is that python tests on master and PR merges will catch all breakages. All these tests have to do is make sure the special binary machinery didn’t mess anything up.
* There are separate run_tests.sh and smoke_test.sh because one used to be called by the smoke jobs and one used to be called by the binary test jobs (see circleci structure section above). This is still true actually, but these could be united into a single script that runs these checks, given an installed pytorch package.
### Note on libtorch
Libtorch packages are built in the wheel build scripts: manywheel/build_*.sh for linux and build_wheel.sh for mac. There are several things wrong with this
* It’s confusing. Most of those scripts deal with python specifics.
* The extra conditionals everywhere severely complicate the wheel build scripts
* The process for building libtorch is different from the official instructions (a plain call to cmake, or a call to a script)
### Note on docker images / Dockerfiles
All linux builds occur in docker images. The docker images are
* soumith/conda-cuda
* Has ALL CUDA versions installed. The script pytorch/builder/conda/switch_cuda_version.sh sets /usr/local/cuda to a symlink to e.g. /usr/local/cuda-10.0 to enable different CUDA builds
* Also used for cpu builds
* soumith/manylinux-cuda90
* soumith/manylinux-cuda92
* soumith/manylinux-cuda100
* Also used for cpu builds
The Dockerfiles are available in pytorch/builder, but there is no circleci job or script to build these docker images, and they cannot be run locally (unless you have the correct local packages/paths). Only Soumith can build them right now.
### General Python
* This is still a good explanation of python installations https://caffe2.ai/docs/faq.html#why-do-i-get-import-errors-in-python-when-i-try-to-use-caffe2
# How to manually rebuild the binaries
tldr; make a PR that looks like https://github.com/pytorch/pytorch/pull/21159
Sometimes we want to push a change to master and then rebuild all of today's binaries after that change. As of May 30, 2019 there isn't a way to manually run a workflow in the UI. You can manually re-run a workflow, but it will use the exact same git commits as the first run and will not include any changes. So we have to make a PR and then force circleci to run the binary workflow instead of the normal tests. The above PR is an example of how to do this; essentially you copy-paste the binarybuilds workflow steps into the default workflow steps. If you need to point the builder repo to a different commit then you'd need to change https://github.com/pytorch/pytorch/blob/master/.circleci/scripts/binary_checkout.sh#L42-L45 to checkout what you want.
## How to test changes to the binaries via .circleci
Writing PRs that test the binaries is annoying, since the default circleci jobs that run on PRs are not the jobs that you want to run. Likely, changes to the binaries will touch something under .circleci/ and require that .circleci/config.yml be regenerated (.circleci/config.yml controls all .circleci behavior, and is generated using ```.circleci/regenerate.sh``` in python 3.7). But you also need to manually hardcode the binary jobs that you want to test into the .circleci/config.yml workflow, so you should actually make at least two commits, one for your changes and one to temporarily hardcode jobs. See https://github.com/pytorch/pytorch/pull/22928 as an example of how to do this.
```
# Update the PR, need to force since the commits are different now
git push origin my_branch --force
```
The advantage of this flow is that you can make new changes to the base commit and regenerate the .circleci without having to re-write which binary jobs you want to test on. The downside is that all updates will be force pushes.
## How to build a binary locally
### Linux
You can build Linux binaries locally easily using docker.
```
# Run the docker
# Use the correct docker image, soumith/conda-cuda used here as an example
#
# -v path/to/foo:path/to/bar makes path/to/foo on your local machine (the
# machine that you're running the command on) accessible to the docker
# container at path/to/bar. So if you then run `touch path/to/bar/baz`
# in the docker container then you will see path/to/foo/baz on your local
# machine. You could also clone the pytorch and builder repos in the docker.
#
# If you're building a CUDA binary then use `nvidia-docker run` instead, see below.
#
# If you know how, add ccache as a volume too and speed up everything
# (example invocation; adjust the -v host paths for your machine)
docker run \
    -v your/pytorch/repo:/pytorch \
    -v your/builder/repo:/builder \
    -v where/you/want/packages/to/appear:/final_pkgs \
    -it soumith/conda-cuda /bin/bash
# Export whatever variables are important to you. All variables that you'd
# possibly need are in .circleci/scripts/binary_populate_env.sh
# You should probably always export at least these 3 variables
export PACKAGE_TYPE=conda
export DESIRED_PYTHON=3.6
export DESIRED_CUDA=cpu
# Call the entrypoint
# `|& tee foo.log` just copies all stdout and stderr output to foo.log
# The builds generate lots of output so you probably need this when
# building locally.
/builder/conda/build_pytorch.sh |& tee build_output.log
```
**Building CUDA binaries on docker**
To build a CUDA binary you need to use `nvidia-docker run` instead of just `docker run` (or you can manually pass `--runtime=nvidia`). This adds some needed libraries and things to build CUDA stuff.
You can build CUDA binaries on CPU only machines, but you can only run CUDA binaries on CUDA machines. This means that you can build a CUDA binary on a docker on your laptop if you so choose (though it’s gonna take a loong time).
For Facebook employees, ask about beefy machines that have docker support and use those instead of your laptop; it will be 5x as fast.
### MacOS
There’s no easy way to generate reproducible hermetic MacOS environments. If you have a Mac laptop then you can try emulating the .circleci environments as much as possible, but you probably have packages in /usr/local/, possibly installed by brew, that will probably interfere with the build. If you’re trying to repro an error on a Mac build in .circleci and you can’t seem to repro locally, then my best advice is actually to iterate on .circleci :/
But if you want to try, then I’d recommend
```
# Create a new terminal
# Clear your LD_LIBRARY_PATH and trim as much out of your PATH as you
# know how to do
# Install a new miniconda
# First remove any other python or conda installation from your PATH
# Always install miniconda 3, even if building for Python <3
# All MacOS builds use conda to manage the python env and dependencies
# that are built with, even the pip packages
conda create -yn binary python=2.7
conda activate binary
# Export whatever variables are important to you. All variables that you'd
# possibly need are in .circleci/scripts/binary_populate_env.sh
# You should probably always export at least these 3 variables
export PACKAGE_TYPE=conda
export DESIRED_PYTHON=3.6
export DESIRED_CUDA=cpu
# Call the entrypoint you want
path/to/builder/wheel/build_wheel.sh
```
N.B. installing a brand new miniconda is important. This has to do with how conda installations work. See the “General Python” section above, but tldr; is that
1. You make the ‘conda’ command accessible by prepending `path/to/conda_root/bin` to your PATH.
2. You make a new env and activate it, which then also gets prepended to your PATH. Now you have `path/to/conda_root/envs/new_env/bin:path/to/conda_root/bin:$PATH`
3. Now say you (or some code that you ran) call python executable `foo`
1. if you installed `foo` in `new_env`, then `path/to/conda_root/envs/new_env/bin/foo` will get called, as expected.
2. But if you forgot to install `foo` in `new_env` but happened to previously install it in your root conda env (called ‘base’), then unix/linux will still find `path/to/conda_root/bin/foo` . This is dangerous, since `foo` can be a different version than you want; `foo` can even be for an incompatible python version!
Newer conda versions and proper python hygiene can prevent this, but just install a new miniconda to be safe.
echo NOTE: To run `import torch`, please make sure to activate the conda environment by running `call %CONDA_PARENT_DIR%\Miniconda3\Scripts\activate.bat %CONDA_PARENT_DIR%\Miniconda3` in Command Prompt before running Git Bash.
USE_MKLDNN"Use MKLDNN. Only available on x86 and x86_64."ON
"CPU_INTEL"OFF)
set(MKLDNN_ENABLE_CONCURRENT_EXEC${USE_MKLDNN})
cmake_dependent_option(
USE_MKLDNN_CBLAS"Use CBLAS in MKLDNN"OFF
"USE_MKLDNN"OFF)
option(USE_DISTRIBUTED"Use distributed"ON)
cmake_dependent_option(
USE_MPI"Use MPI for Caffe2. Only available if USE_DISTRIBUTED is on."ON
@ -134,43 +185,94 @@ cmake_dependent_option(
cmake_dependent_option(
USE_GLOO"Use Gloo. Only available if USE_DISTRIBUTED is on."ON
"USE_DISTRIBUTED"OFF)
cmake_dependent_option(
USE_GLOO_IBVERBS"Use Gloo IB verbs for distributed. Only available if USE_GLOO is on."OFF
"USE_GLOO"OFF)
option(USE_TBB"Use TBB"OFF)
# Used when building Caffe2 through setup.py
option(BUILDING_WITH_TORCH_LIBS"Tell cmake if Caffe2 is being built alongside torch libs"OFF)
option(BUILDING_WITH_TORCH_LIBS"Tell cmake if Caffe2 is being built alongside torch libs"ON)
# /Z7 override option
# When generating debug symbols, CMake default to use the flag /Zi.
# However, it is not compatible with sccache. So we rewrite it off.
# But some users don't use sccache; this override is for them.
option(MSVC_Z7_OVERRIDE"Work around sccache bug by replacing /Zi and /ZI with /Z7 when using MSVC (if you are not using sccache, you can turn this OFF)"ON)
cmake_dependent_option(
MSVC_Z7_OVERRIDE"Work around sccache bug by replacing /Zi and /ZI with /Z7 when using MSVC (if you are not using sccache, you can turn this OFF)"ON
If you want to compile with CUDA support, install
- [NVIDIA CUDA](https://developer.nvidia.com/cuda-downloads) 9 or above
- [NVIDIA cuDNN](https://developer.nvidia.com/cudnn) v7 or above
If you want to disable CUDA support, export environment variable `NO_CUDA=1`.
If you want to disable CUDA support, export environment variable `USE_CUDA=0`.
Other potentially useful environment variables may be found in `setup.py`.
If you are building for NVIDIA's Jetson platforms (Jetson Nano, TX1, TX2, AGX Xavier), instructions [are available here](https://devtalk.nvidia.com/default/topic/1049071/jetson-nano/pytorch-for-jetson-nano/)
If the version of Visual Studio 2017 is higher than 15.4.5, installing of "VC++
NVTX is a part of the CUDA distribution, where it is called "Nsight Compute". To install it onto already installed CUDA, run the CUDA installation once again and check the corresponding checkbox.
Be sure that CUDA with Nsight Compute is installed after Visual Studio 2017.
Currently VS 2017, VS 2019 and Ninja are supported as the generator of CMake. If `ninja.exe` is detected in `PATH`, then Ninja will be used as the default generator, otherwise it will use VS 2017.
<br/> If Ninja is selected as the generator, the latest MSVC newer than VS 2015 (14.0) will get selected as the underlying toolchain if you have Python > 3.5; otherwise VS 2015 will be selected, so you'll have to activate the environment. If you use CMake <= 3.14.2 and have VS 2019 installed, then even if you specify VS 2017 as the generator, VS 2019 will get selected as the generator.
CUDA and MSVC have strong version dependencies, so even if you use VS 2017 / 2019, you will get build errors like `nvcc fatal : Host compiler targets unsupported OS`. For this kind of problem, please install the corresponding VS toolchain in the table below; then you can either specify the toolset during activation (recommended) or set `CUDAHOSTCXX` to override the CUDA host compiler (not recommended if there are big version differences).