Summary:
Ignore mixed upper-case/lower-case style for now
Fix 'space between function and its arguments' violations
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35574
Test Plan: CI
Differential Revision: D20712969
Pulled By: malfet
fbshipit-source-id: 0012d430aed916b4518599a0b535e82d15721f78
Summary:
As a followup to https://github.com/pytorch/pytorch/pull/35042 this removes python2 from setup.py and adds Python 3.8 to the list of supported versions. We're already testing this in CircleCI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35539
Differential Revision: D20709060
Pulled By: orionr
fbshipit-source-id: 5d40bc14cb885374fec370fc7c5d3cde8769039a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35073
We want to do constant propagation for quantize_per_tensor/quantize_per_channel, which produce results consumed by these ops. Since we need to make sure a node's output has no writers before constant-propagating through it, the consumers need to be pure as well.
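For context, a minimal sketch of how this interacts with the existing constant-propagation pass; the header paths and helper name are assumptions and not part of this change:

```cpp
#include <torch/csrc/jit/ir/ir.h>
#include <torch/csrc/jit/passes/constant_propagation.h>

// Constant propagation only folds a node whose output has no writers and whose
// consumers are pure; once these ops are marked pure, quantize calls on
// constant inputs become foldable.
void foldConstantQuantization(std::shared_ptr<torch::jit::Graph>& graph) {
  torch::jit::ConstantPropagation(graph);
}
```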
Test Plan:
see next PR
Imported from OSS
Differential Revision: D20655310
fbshipit-source-id: 3e33662224c21b889c8121b823f8ce0b7da75eed
Summary:
So that packages are correctly marked when looking through the html
pages.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35309
Differential Revision: D20626737
Pulled By: seemethere
fbshipit-source-id: 0fad3d99f0b0086898939fde94ddbbc9861d257e
Summary:
Let's see if it makes both test branches a bit more balanced
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35540
Test Plan: CI
Differential Revision: D20704642
Pulled By: malfet
fbshipit-source-id: 4e2ab5a80adfe78620206d4eaea30207194379cc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34248
This argument will no longer exist in positional form once MemoryFormat is moved into TensorOptions by codegen, so we must stop using it when we make calls from C++. This diff eliminates all direct positional calls, passing the memory format via TensorOptions instead.
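As an illustration (the helper name and the specific op are hypothetical; the diff touches many call sites), the call-site pattern changes roughly like this:

```cpp
#include <ATen/ATen.h>

at::Tensor make_channels_last_buffer(const at::Tensor& x) {
  // Before: memory format passed as a trailing positional argument, e.g.
  //   at::empty_like(x, x.options(), at::MemoryFormat::ChannelsLast);
  // After: the memory format is carried inside TensorOptions.
  return at::empty_like(x, x.options().memory_format(at::MemoryFormat::ChannelsLast));
}
```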
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20683398
Pulled By: bhosmer
fbshipit-source-id: 6928cfca67abb22fbc667ecc2af8453d93489bd6
Summary:
Since we've done the branch cut for 1.5.0 we should bump nightlies to 1.6.0
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35495
Differential Revision: D20697043
Pulled By: seemethere
fbshipit-source-id: 3646187a5e729994138bf2c68625f25f11430b3a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35519
Fix include of THHalf.h to be TH/THHalf.h. Makes the include consistent with the rest of caffe2.
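Concretely, the include changes as follows:

```cpp
// Before:
//   #include <THHalf.h>
// After (consistent with the rest of caffe2):
#include <TH/THHalf.h>
```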
Test Plan: CI
Differential Revision: D20685997
fbshipit-source-id: 893b6e96e4f1a1e7306ba2e40e4e8ee738f0344f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35545
Looks like we have never printed a quantized Tensor in cpp before
(Note: this ignores all push blocking failures!)
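A minimal sketch of the code path this exercises (the values are illustrative):

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
  torch::Tensor x = torch::rand({2, 2});
  torch::Tensor q =
      torch::quantize_per_tensor(x, /*scale=*/0.1, /*zero_point=*/0, torch::kQUInt8);
  std::cout << q << std::endl;  // prints a quantized tensor from C++
}
```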
Test Plan:
.
Imported from OSS
Differential Revision: D20699748
fbshipit-source-id: 9d029815c6e75f626afabf92194154efc83f5545
Summary:
Skip tests that normally finish in under a second but take 20+ min under ASAN
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35533
Test Plan: CI
Differential Revision: D20700245
Pulled By: malfet
fbshipit-source-id: 7620b12d3aba1bafb2baa9073fa27c4a0b3dd9eb
Summary:
Fixes incorrect usages of symbol annotations including:
1. Exporting or importing a function/class in an anonymous namespace.
2. Exporting or importing a function/class implementation in a header file. By removing the symbol annotations, these become local symbols; if any of them need to remain global, I can move the implementations to the source file. (See the sketch below.)
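A sketch of the two patterns being fixed; `TORCH_API` stands in for whichever export/import macro was misused, and the names are hypothetical:

```cpp
// 1. A symbol in an anonymous namespace has internal linkage, so an
//    export/import annotation on it is meaningless and is removed.
namespace {
void helper();  // was: TORCH_API void helper();
}  // namespace

// 2. A function implemented directly in a header loses its annotation and
//    becomes a local symbol, as described above.
inline int header_only_helper() { return 0; }  // was: TORCH_API inline int header_only_helper() { ... }
```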
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35364
Differential Revision: D20670031
Pulled By: ezyang
fbshipit-source-id: cd8018dee703e2424482c27fe9608e040d8105b8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34555
This is sometimes necessary, such as when T=int and the step size is of
type double.
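The summary does not show the change itself; the following is only a generic illustration of the T=int / double-step mismatch it refers to:

```cpp
#include <cstdint>

// With T = int and a double step, the arithmetic happens in double and has to
// be narrowed back to T explicitly (illustrative helper, not the actual patch).
template <typename T>
T value_at(T start, double step, int64_t index) {
  return static_cast<T>(start + step * static_cast<double>(index));
}
```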
Test Plan: Imported from OSS
Differential Revision: D20687063
Pulled By: ezyang
fbshipit-source-id: 33086d4252d06e7539733a9b1b3d6774e177b6da
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35244
Add the roi_align_rotated op to the lite interpreter for the detectron2go model
(Note: this ignores all push blocking failures!)
Test Plan: try to run model in https://home.fburl.com/~stzpz/text_det/fbnet_300_20/
Reviewed By: iseeyuan
Differential Revision: D20560485
fbshipit-source-id: a81f3a590b9cc5a02d4da676b3cfa52b0e0a68c3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35247
Add a leading "_" when registering quantized ops for the lite interpreter. They are needed by the d2go model
(Note: this ignores all push blocking failures!)
Test Plan:
(whole stack)
buck build -c user.ndk_cxxflags='-g1' -c caffe2.expose_op_to_c10=1 //xplat/caffe2/fb/pytorch_predictor:maskrcnnAndroid#android-armv7
Reviewed By: iseeyuan
Differential Revision: D20528760
fbshipit-source-id: 5b26d075456641b02d82f15a2d19f2266001f23b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34674
Two changes to make sure the op_names dumped in export_opnames() are consistent with what is actually used in the bytecode:
* Inline graph before dumping the operator names.
* Use code of the graph (which is used in bytecode) instead of the nodes of graph.
Test Plan: Imported from OSS
Differential Revision: D20610715
Pulled By: iseeyuan
fbshipit-source-id: 53fa9c3b36f4f242b7f2b99b421f4adf20d4b1f6
Summary:
My PR https://github.com/pytorch/pytorch/pull/33020 made subgraph_utils non-deterministic by using a set instead of a vector for closed-over values. This broke a downstream glow test. We're in the process of working with glow to not rely on the subgraph input order, but in the interim make it ordered again to fix the test.
An alternative is to use a `set` instead of a vector, but I don't particularly like committing to fixed ordering for the subgraph, especially for things like if nodes and while loops where an order doesn't really have any meaning.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35508
Differential Revision: D20683959
Pulled By: eellison
fbshipit-source-id: bb39b29fef2904e52b9dc42be194bb57cbea59c4
Summary:
## Motivation
This PR upgrades MKL-DNN from v0.20 to DNNL v1.2 and resolves https://github.com/pytorch/pytorch/issues/30300.
DNNL (Deep Neural Network Library) is the new brand of MKL-DNN, which improves performance, quality, and usability over the old version.
This PR focuses on the migration of all existing functionality, including minor fixes, performance improvements, and code cleanup. It serves as the cornerstone of our future efforts to accommodate new features like OpenCL support, BF16 training, and INT8 inference, and to let the PyTorch community derive more benefit from the Intel architecture.
## What's included?
Even though DNNL has many breaking API changes, we managed to absorb most of them in ideep. This PR contains minimal changes to the integration code in PyTorch. Below is a summary of the changes:
**General:**
1. Replace op-level allocator with global-registered allocator
```
// before
ideep::sum::compute<AllocForMKLDNN>(scales, {x, y}, z);
// after
ideep::sum::compute(scales, {x, y}, z);
```
The allocator is now registered at `aten/src/ATen/native/mkldnn/IDeepRegistration.cpp`. Thereafter all tensors derived from the `cpu_engine` (by default) will use the c10 allocator.
```
RegisterEngineAllocator cpu_alloc(
ideep::engine::cpu_engine(),
[](size_t size) {
return c10::GetAllocator(c10::DeviceType::CPU)->raw_allocate(size);
},
[](void* p) {
c10::GetAllocator(c10::DeviceType::CPU)->raw_deallocate(p);
}
);
```
------
2. Simplify group convolution
In convolution we had a scenario where the ideep tensor shape did not match the aten tensor shape: when `groups > 1`, DNNL expects weight tensors to be 5-d with an extra group dimension, e.g. `goihw` instead of `oihw` in the 2d conv case.
As shown below, a lot of extra checks came with this difference in shape before. Now we've completely hidden this difference in ideep, and all tensors align with PyTorch's definition, so we can safely remove these checks from both the aten and c2 integration code.
```
// aten/src/ATen/native/mkldnn/Conv.cpp
if (w.ndims() == x.ndims() + 1) {
AT_ASSERTM(
groups > 1,
"Only group _mkldnn_conv2d weights could have been reordered to 5d");
kernel_size[0] = w.get_dim(0) * w.get_dim(1);
std::copy_n(
w.get_dims().cbegin() + 2, x.ndims() - 1, kernel_size.begin() + 1);
} else {
std::copy_n(w.get_dims().cbegin(), x.ndims(), kernel_size.begin());
}
```
------
3. Enable DNNL built-in cache
Previously, we stored DNNL jitted kernels along with intermediate buffers inside ideep using an LRU cache. Now we are switching to the newly added DNNL built-in cache, and **no longer** caching buffers in order to reduce memory footprint.
This change will mainly be reflected as lower memory usage in memory profiling results. On the code side, we removed a couple of lines involving `op_key_` that depended on the ideep cache.
------
4. Use 64-bit integer to denote dimensions
We changed the type of `ideep::dims` from `vector<int32_t>` to `vector<int64_t>`. This renders ideep dims no longer compatible with the 32-bit dims used by caffe2, so we use something like `{stride_.begin(), stride_.end()}` to cast the parameter `stride_` into an int64 vector.
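For illustration, the cast pattern looks roughly like this (`stride_` stands in for the caffe2-side parameter):

```cpp
#include <cstdint>
#include <vector>

std::vector<int64_t> to_ideep_dims(const std::vector<int>& stride_) {
  // Widened element-by-element into the 64-bit dims that ideep now expects,
  // mirroring the {stride_.begin(), stride_.end()} pattern mentioned above.
  return {stride_.begin(), stride_.end()};
}
```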
**Misc changes in each commit:**
**Commit:** change build options
Some build options were slightly changed, mainly to avoid name collisions with other projects that include DNNL as a subproject. In addition, DNNL built-in cache is enabled by option `DNNL_ENABLE_PRIMITIVE_CACHE`.
Old | New
-- | --
WITH_EXAMPLE | MKLDNN_BUILD_EXAMPLES
WITH_TEST | MKLDNN_BUILD_TESTS
MKLDNN_THREADING | MKLDNN_CPU_RUNTIME
MKLDNN_USE_MKL | N/A (not use MKL anymore)
------
**Commit:** aten reintegration
- aten/src/ATen/native/mkldnn/BinaryOps.cpp
Implement binary ops using new operation `binary` provided by DNNL
- aten/src/ATen/native/mkldnn/Conv.cpp
Clean up group convolution checks
Simplify conv backward integration
- aten/src/ATen/native/mkldnn/MKLDNNConversions.cpp
Simplify prepacking convolution weights
- test/test_mkldnn.py
Fixed an issue in the conv2d unit test: it didn't compare conv results between the mkldnn and aten implementations before; instead it compared mkldnn with mkldnn, because the default CPU path also goes through mkldnn. Now we use `torch.backends.mkldnn.flags` to fix this issue
- torch/utils/mkldnn.py
Prepack the weight tensor in the module's `__init__` to achieve significantly better performance
------
**Commit:** caffe2 reintegration
- caffe2/ideep/ideep_utils.h
Clean up unused type definitions
- caffe2/ideep/operators/adam_op.cc & caffe2/ideep/operators/momentum_sgd_op.cc
Unify tensor initialization with `ideep::tensor::init`. Obsolete `ideep::tensor::reinit`
- caffe2/ideep/operators/conv_op.cc & caffe2/ideep/operators/quantization/int8_conv_op.cc
Clean up group convolution checks
Revamp convolution API
- caffe2/ideep/operators/conv_transpose_op.cc
Clean up group convolution checks
Clean up deconv workaround code
------
**Commit:** custom allocator
- Register c10 allocator as mentioned above
## Performance
We tested inference on some common models based on user scenarios, and most performance numbers are either better than or on par with DNNL 0.20.
ratio: new / old | Latency (batch=1 4T) | Throughput (batch=64 56T)
-- | -- | --
pytorch resnet18 | 121.4% | 99.7%
pytorch resnet50 | 123.1% | 106.9%
pytorch resnext101_32x8d | 116.3% | 100.1%
pytorch resnext50_32x4d | 141.9% | 104.4%
pytorch mobilenet_v2 | 163.0% | 105.8%
caffe2 alexnet | 303.0% | 99.2%
caffe2 googlenet-v3 | 101.1% | 99.2%
caffe2 inception-v1 | 102.2% | 101.7%
caffe2 mobilenet-v1 | 356.1% | 253.7%
caffe2 resnet101 | 100.4% | 99.8%
caffe2 resnet152 | 99.8% | 99.8%
caffe2 shufflenet | 141.1% | 69.0% †
caffe2 squeezenet | 98.5% | 99.2%
caffe2 vgg16 | 136.8% | 100.6%
caffe2 googlenet-v3 int8 | 100.0% | 100.7%
caffe2 mobilenet-v1 int8 | 779.2% | 943.0%
caffe2 resnet50 int8 | 99.5% | 95.5%
_Configuration: Platform: Skylake 8180; Latency test: 4 threads, warmup 30, 500 iterations, batch size 1; Throughput test: 56 threads, warmup 30, 200 iterations, batch size 64._
† Shufflenet is one of the few models that require temp buffers during inference. The performance degradation is expected since we no longer cache any buffers in ideep. As a solution, we suggest users opt for a caching allocator like **jemalloc** as a drop-in replacement for the system allocator in such heavy workloads.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32422
Test Plan:
Perf results: https://our.intern.facebook.com/intern/fblearner/details/177790608?tab=Experiment%20Results
10% improvement for ResNext with avx512, neutral on avx2
More results: https://fb.quip.com/ob10AL0bCDXW#NNNACAUoHJP
Reviewed By: yinghai
Differential Revision: D20381325
Pulled By: dzhulgakov
fbshipit-source-id: 803b906fd89ed8b723c5fcab55039efe3e4bcb77
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35042
Removing python2 tests and some compat code in torch.jit. Check if dependent projects and external tests have any issues after these changes.
Test Plan: waitforsandcastle
Reviewed By: suo, seemethere
Differential Revision: D18942633
fbshipit-source-id: d76cc41ff20bee147dd8d44d70563c10d8a95a35
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35393
This was being created inside the lock scope, but we don't need to hold the lock for it.
ghstack-source-id: 100953426
Test Plan: CI
Differential Revision: D20632225
fbshipit-source-id: dbf6746f638b7df5fefd9bbfceaa6b1a542580e2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35491
The goal of this diff is to avoid having to set the AutoNonVariableTypeMode guard in client code that uses a custom mobile build. The guard was necessary because a custom mobile build might not include variable kernels, in which case the AutoNonVariableTypeMode guard usually has to be set. It's hard to enforce this rule at all call sites, so we make this change to simplify things.
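For reference, this is roughly what client code had to do before this change when using a custom mobile build without variable kernels (the module and input names are placeholders):

```cpp
#include <torch/script.h>

torch::jit::IValue run_inference(torch::jit::script::Module& module,
                                 std::vector<torch::jit::IValue> inputs) {
  // Previously required so dispatch skipped the (absent) variable kernels;
  // after this diff the caller no longer needs to set the guard.
  at::AutoNonVariableTypeMode non_var_guard(true);
  return module.forward(std::move(inputs));
}
```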
Another goal of the diff is to not break FL where real variable kernels are
registered.
ghstack-source-id: 100944553
Test Plan:
- With stacked diff, tested lite-trainer with MnistModel:
```
buck run xplat/caffe2/fb/lite_trainer:lite_trainer \
-c pt.disable_gen_tracing=1 \
-- --model=/home/liujiakai/ptmodels/MnistModel.bc
```
- Will test with the papaya sample app.
Differential Revision: D20643627
fbshipit-source-id: 37ea937919259c183809c2b7acab0741eff84d33
Summary:
1. Removed LossClosureOptimizer, and merged Optimizer into OptimizerBase (and renamed the merged class to Optimizer); see the usage sketch after this list
2. Merged the LBFGS-specific serialize test function and the generic test_serialize_optimizer function.
3. BC-compatibility serialization test for LBFGS
4. Removed mentions of parameters_ in optimizer.cpp and de-virtualized all functions
5. Made defaults_ an optional argument in all optimizers except SGD
**TODO**: add BC-breaking notes for this PR
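A minimal sketch of the resulting C++ optimizer usage (the model, shapes, and option values are made up; this is illustrative, not taken from the tests):

```cpp
#include <torch/torch.h>

int main() {
  torch::nn::Linear model(4, 2);
  torch::optim::LBFGS optimizer(model->parameters(), torch::optim::LBFGSOptions(0.1));
  auto input = torch::randn({8, 4});

  // After the merge, LBFGS shares the common Optimizer interface and its step()
  // takes a loss closure.
  optimizer.step([&]() -> torch::Tensor {
    optimizer.zero_grad();
    auto loss = model(input).sum();
    loss.backward();
    return loss;
  });
}
```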
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34957
Test Plan: Imported from GitHub, without a `Test Plan:` line.
Differential Revision: D20678162
Pulled By: yf225
fbshipit-source-id: 74e062e42d86dc118f0fbaddd794e438b2eaf35a
Summary:
Desugar prim::shape to aten::size so that passes don't need to reason about both ops. Serialized models still resolve to `prim::shape` so this doesn't break BC.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34286
Differential Revision: D20316818
Pulled By: eellison
fbshipit-source-id: d1585687212843f51e9396e07c108f5c08017818
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35433
Make RRef TorchScript API the same as RRef Python API.
Differential Revision: D7923050
fbshipit-source-id: 62589a429bcaa834b55db6ae8cfb10c0a2ee01ff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35430
This fixes and adds tests for several commonly used operators.
There are some formatting differences due to running clang-format on one of the files.
Test Plan: buck test //caffe2/caffe2/fb/operators:hypothesis_test //caffe2/caffe2/python/operator_test:utility_ops_test //caffe2/caffe2/python/operator_test:concat_split_op_test
Reviewed By: yyetim
Differential Revision: D20657405
fbshipit-source-id: 51d86d0834003b8ac8d6acb5149ae13d7bbfc6ab
Summary:
Looks like there is a bug in the CUDA device linker: kernels that use `thrust::sort_by_key` cannot be linked with other kernels.
Solve the problem by splitting 5 thrust-heavy .cu files into a `__torch_cuda_sp` library, which is statically linked into `torch_cuda`.
For the default compilation workflow this should not make any difference.
Test Plan: Compile with `-DCUDA_SEPARABLE_COMPILATION=YES` and observe the library size difference: 310MB before, 173MB after, when compiled for sm_75
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34863
Differential Revision: D20683972
Pulled By: malfet
fbshipit-source-id: bc1492aa9d1d2d21c48e8764a8a7b403feaec5da