Summary:
By default, TorchScript execution is single threaded and uses the caller's thread pool. For distributed inference we want a way to customize this behavior so that the TorchScript interpreter's work can be executed elsewhere. This diff allows passing an explicit taskLauncher to the TorchScript interpreter.
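A task launcher here is just a callable that receives the interpreter's pending work and decides where to run it. As a rough conceptual sketch in Python (not the actual C++ TaskLauncher API; names are made up), a custom launcher might hand work to a dedicated pool instead of the caller's thread:
```
from concurrent.futures import ThreadPoolExecutor

# Hypothetical illustration only: a pool chosen by the embedding application,
# e.g. the thread pool of a distributed-inference server.
pool = ThreadPoolExecutor(max_workers=4)

def custom_task_launcher(work):
    # Instead of executing `work` inline on the caller's thread, submit it to
    # the pool so the interpreter's continuations run where we choose.
    pool.submit(work)
```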
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46865
Test Plan:
Unit tests pass.
fbshipit-source-id: 1d7b003926c0d1f8facc53206efb960cff8897ac
Reviewed By: houseroad
Differential Revision: D24616102
Pulled By: garroud
fbshipit-source-id: 79202b62f92d0b0baf72e4bf7aa3f05e0da91d59
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45789
Making sure that more tests invoke a run with a Fusion Group.
Test Plan: Imported from OSS
Reviewed By: Krovatkin
Differential Revision: D24169535
Pulled By: eellison
fbshipit-source-id: 54d7af434772ba52144b12d15d32ae30460c0c3c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44588
1) SOURCE_DUMP crashes when invoked on a backward graph since
`prim::GradOf` nodes can't be printed as sources (they don't have
schema).
2) Dumping the graph each time we execute an optimized plan produces a lot of
output in tests where we run the graph multiple times (e.g.
benchmarks). Emitting all of that at the lowest verbosity level seems
like overkill.
3) A duplicated log statement is removed.
Differential Revision: D23666812
Test Plan: Imported from OSS
Reviewed By: bertmaher
Pulled By: ZolotukhinM
fbshipit-source-id: b9a30e34fd39c85f3e13c3f1e3594e157e1c130f
Summary:
When backward ops execute via the autograd engine's evaluate_function(), fn.release_variables() is called to release the SavedVariables. For eager mode ops, this releases the saved inputs that were required for the backward grad function. With TorchScript, however, we get a DifferentiableGraph, and DifferentiableGraphBackward() doesn't implement release_variables(), so the SavedVariables stay alive longer than necessary. This change implements release_variables() for DifferentiableGraphBackward to release these SavedVariables early.
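A minimal Python sketch (hypothetical shapes and names) of where these SavedVariables come from: the multiply below has to save its inputs for the backward pass, and under TorchScript they are held by the DifferentiableGraph's backward node until backward finishes.
```
import torch

def f(x, w):
    # x * w must save x and w for the backward computation
    return (x * w).sum()

scripted_f = torch.jit.script(f)

x = torch.randn(1024, 1024, requires_grad=True)
w = torch.randn(1024, 1024, requires_grad=True)

loss = scripted_f(x, w)
# Without release_variables() on DifferentiableGraphBackward, x and w stay
# alive as SavedVariables longer than in eager mode; with it implemented they
# can be dropped early during backward, matching eager behavior.
loss.backward()
```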
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42994
Reviewed By: izdeby
Differential Revision: D23503172
Pulled By: albanD
fbshipit-source-id: d87127498cfa72883ae6bb31d0e6c7056c4c36d4
Summary:
This PR adds an API to package unoptimized/fallback blocks as function calls. It's mainly meant to be used by the TensorExpressionsFuser and SpecializeAutogradZero passes: both specialize the original graph but would also like to provide a fallback path in case the assumptions under which the graph was specialized do not hold for some inputs.
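As a rough conceptual sketch (not the actual pass API; names and the guard are made up), the resulting shape is a guarded fast path that calls back into the original, unspecialized code packaged as a function:
```
# Conceptual illustration only.
def fallback(x):
    # stands in for the unoptimized block packaged as a function call
    return x + x.sum()

def specialized(x):
    if x.dim() == 2:            # assumption the graph was specialized under
        return x + x.sum()      # optimized path (e.g. a fused kernel)
    return fallback(x)          # bail out via the fallback function call
```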
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43274
Reviewed By: malfet
Differential Revision: D23406961
Pulled By: Krovatkin
fbshipit-source-id: ef21fc9ad886953461b09418d02c75c58375490c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43630
No functional changes here - just refactoring specialize autograd zero into a class and standardizing its API to take a shared_ptr<Graph>.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D23358805
Pulled By: eellison
fbshipit-source-id: 42e19ef2e14df66b44592252497a47d03cb07a7f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42688
Both the profiling executor and the legacy executor have the debug
logging now.
Ideally, if we had a pass manager, this could be done as a part of it,
but since we have none, I had to insert the debug statements manually.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D22981675
Pulled By: ZolotukhinM
fbshipit-source-id: 22b8789e860aa90d5802fc72a4113b22c6fc4da5
Summary:
This should be in its own file...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41137
Reviewed By: jamesr66a
Differential Revision: D22437922
Pulled By: eellison
fbshipit-source-id: 1b62dde1a4ebac673b5c60aea4f398f734d62501
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37034
c10 takes a Stack* in boxed functions while JIT took Stack&.
c10 doesn't return anything while JIT returns an int which is always zero.
This changes JIT to follow the c10 behavior.
ghstack-source-id: 106834069
Test Plan: unit tests
Differential Revision: D20567950
fbshipit-source-id: 1a7aea291023afc52ae706957e9a5ca576fbb53b
Summary:
* Disable the mode where PE can still run the old fuser.
* Clean up
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38591
Differential Revision: D21643664
Pulled By: Krovatkin
fbshipit-source-id: 6753ed6bdc544698a1340e59a624608ff3abf7f9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38734
As far as I can tell, this pass only exists to canonicalize ops that are generated in the graph fuser, so its name is kind of a misnomer.
Test Plan: Imported from OSS
Differential Revision: D21673109
Pulled By: eellison
fbshipit-source-id: b7bedf34ccaf1fcd442bfb2bbb990e64915f51d4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37970
This change makes the pass friendlier for users who try to invoke it
directly.
Test Plan: Imported from OSS
Reviewed By: suo
Differential Revision: D21444832
Pulled By: ZolotukhinM
fbshipit-source-id: 8be4b5028b3bd84082874e16f38a70b245af5d19
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35913
The pass itself is still disabled by default, but with this change we
don't need to register it as a custom pass anymore. It allows us to
control its behavior with env variables more easily.
Test Plan: Imported from OSS
Reviewed By: suo
Differential Revision: D20827189
Pulled By: ZolotukhinM
fbshipit-source-id: e74d90b5e46422e7ab7bc40974a805220da50fbc
Summary:
**Summary:** This PR contains the infrastructure of a new CUDA fuser. This CUDA fuser is based on many of the same principles as TensorExpressions and Halide, but the implementation is ground up. The fusion pass itself is similar to the default CUDA fuser; however, it has undergone some refactoring and uses the new code generation infrastructure. For those interested in how the code generation in this PR works, I would recommend reviewing _test/cpp/jit/test_gpu_fusion.cpp_ as well as the long comment section at the beginning of _torch/csrc/jit/codegen/cuda/transform_replay.h_. One of the largest differences between our approach and that of TVM/Halide is the concept of "TensorView". A TensorView should, at a high level, be thought of similarly to how we think of working with Tensors in PyTorch: it is an N-D object which can undergo transformations that change its dimensionality. Dimensionality changes are done through the operations split/merge/reorder/computeAt. These transformations are similar to split/fuse/reorder/compute_at in TVM; they modify how a tensor is iterated over to generate GPU code. Interestingly, in our scheme these transformations are applied to tensors and only impact how that tensor is generated.
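The transformations above reshape how a tensor's iteration domain is traversed. A toy Python illustration (conceptual only, not the TensorView API; N and the factor 4 are arbitrary) of a "split": a single loop over N elements becomes a nested loop whose inner extent could later be mapped onto CUDA threads:
```
N = 16
a = list(range(N))
out = [0.0] * N

# Before the split this was: for i in range(N): out[i] = 2.0 * a[i]
for i_outer in range(N // 4):
    for i_inner in range(4):
        i = i_outer * 4 + i_inner
        out[i] = 2.0 * a[i]
```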
**Warning:** This PR is purposefully not feature complete with the current fuser. We wanted to separate out the infrastructure from the fusion capabilities. Once in, smaller incremental PRs will be submitted to expand capabilities of the fuser.
**Short term goals:**
Parity with current CUDA fuser (including performance):
- Dynamic shapes (no recompilation)
- Implicit handling of broadcast (broadcasted tensors are treated as tensors of the broadcasted size in the generated code)
- Dropout
**Mid-term goals:**
- Transposes fused with pointwise operations where transpose involves only 2 axes (across the fused operation).
- 1-D reductions fused with pointwise operations
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34785
Reviewed By: ZolotukhinM
Differential Revision: D20650977
Pulled By: soumith
fbshipit-source-id: ee39c95a880e1b9822e874ed4cc180971572bf63
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34109
This change adds glue to GraphExecutor to give the RPC server
access to the future-based Interpreter::runAsync() api.
Previously, if a server encountered a TorchScript continuation-based block
with fork/wait, it would simply block in the server thread until the handler
completed, since it used the synchronous Interpreter::run() API.
With the ivalue::Future returned by the Interpreter, we can run the
TorchScript code asynchronously from c++ simply by connecting its
callback to the server callback.
We add test cases to cover the new logic, both rpc_async and remote.
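A short Python sketch (hypothetical worker name; assumes rpc.init_rpc() has already been called) of the pattern this makes non-blocking on the server side: a TorchScript function using fork/wait, invoked remotely with rpc_async.
```
import torch
import torch.distributed.rpc as rpc

@torch.jit.script
def forked_add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # fork/wait compiles to a continuation-based block in the interpreter
    fut = torch.jit.fork(torch.add, x, y)
    return torch.jit.wait(fut)

def caller():
    # With runAsync() wired into the RPC server, the server no longer parks a
    # thread while the forked work completes.
    ret_fut = rpc.rpc_async("worker1", forked_add,
                            args=(torch.ones(2), torch.ones(2)))
    return ret_fut.wait()
```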
ghstack-source-id: 101245438
Test Plan: buck test mode/dev-nosan caffe2/test/distributed/rpc/...
Differential Revision: D20194321
fbshipit-source-id: 16785ec5d9ed0b16cb1ffab0a9771a77de30fcb0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35115
This commit runs the newly added tools/clang_format.py on the JIT
codebase and includes all of the formatting changes thus produced.
Testing:
Ran the script, CI.
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D20568523
Pulled By: SplitInfinity
fbshipit-source-id: e09bdb982ccf090eecfb7c7b461b8d0681eef82b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34588
I constructed the patch by deleting OperatorOptions and then rerouting
all queries for AliasAnalysisKind to FunctionSchema. Some of the
behavior is kind of bogus: we really shouldn't be mutating FunctionSchema
after the fact, but that won't get fixed until we actually switch to
true schema merging.
Reland of https://github.com/pytorch/pytorch/pull/34160
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20387079
Pulled By: ezyang
fbshipit-source-id: d189f7a6ad8cd186b88b6fbfa3f189994eea14e8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34160
I constructed the patch by deleting OperatorOptions and then rerouting
all queries for AliasAnalysisKind to FunctionSchema. Some of the
behavior is kind of bogus: we really shouldn't be mutating FunctionSchema
after the fact, but that won't get fixed until we actually switch to
true schema merging.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20282846
Pulled By: ezyang
fbshipit-source-id: ba7bca6e8adc3365789639b88e54c4e881b1692e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33834
This changes how we report Tracebacks to make them more clear when
there are both serialized and non-serialized ranges. It now looks like:
```
Traceback (most recent call last):
File "foo.py", line 25, in <module>
s2(a, b)
File "/scratch/zdevito/pytorch/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/__torch__.py", line 7, in forward
x: Tensor,
y: Tensor) -> Tensor:
return (self).bar(x, y, )
~~~~~~~~~ <--- HERE
def bar(self: __torch__.Moo,
x: Tensor,
File "code/__torch__.py", line 11, in bar
x: Tensor,
y: Tensor) -> Tensor:
_0 = (self).baz(x, y, )
~~~~~~~~~ <--- HERE
_1 = torch.ones([3], dtype=None, layout=None, device=None, pin_memory=None)
return torch.add(_0, _1, alpha=1)
File "code/__torch__.py", line 17, in baz
x: Tensor,
y: Tensor) -> Tensor:
return torch.add(x, y, alpha=1)
~~~~~~~~~ <--- HERE
Traceback of TorchScript, original code (most recent call last):
File "foo.py", line 11, in forward
def forward(self, x, y):
return self.bar(x, y)
~~~~~~~~ <--- HERE
File "foo.py", line 9, in bar
def bar(self, x, y):
return self.baz(x, y) + torch.ones(3)
~~~~~~~~ <--- HERE
File "foo.py", line 7, in baz
def baz(self, x, y):
return x + y
~~~~~ <--- HERE
RuntimeError: The size of tensor a (4) must match the size of tensor b (5) at non-singleton dimension 1
```
It follows the Python convention of putting the most important information last,
so the trace reads from the bottom up.
Changes:
* Moved the error message to the end, to copy Python
* Report original traceback separate from serialized traceback
* Make sure root functions have names in the interpreter trace.
Test Plan: Imported from OSS
Differential Revision: D20126136
Pulled By: zdevito
fbshipit-source-id: fd01f9985e5d74e04c4d064c02e8bc320f4fac13
Summary:
This patch enables folding GetAttr nodes with their corresponding
values. The _jit_pass_freeze_module API returns a new TorchScript module
where all function calls and get attributes are inlined.
Usage:
frozen_model = torch._C._freeze_module(scripted_model._c)
frozen_model.forward(...)
This API currently optimizes the forward method. We will follow up to
preserve and optimize methods and attributes that are annotated as
torch.jit.interface. Several future improvements to JIT optimizations are
required to further clean up/de-sugar the graph and eliminate redundancies.
Ideally, we want to produce a graph that can easily be lowered to
GLOW and other low-level backends.
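Expanding the usage above into a self-contained sketch (module name and shapes are made up):
```
import torch

class Moo(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(4, 4))

    def forward(self, x):
        return x @ self.weight

scripted_model = torch.jit.script(Moo())
scripted_model.eval()

# Freezing inlines method calls and folds GetAttr nodes (e.g. self.weight)
# into the graph, so forward no longer reads module attributes at runtime.
frozen_model = torch._C._freeze_module(scripted_model._c)
frozen_model.forward(torch.randn(2, 4))
```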
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32178
Differential Revision: D19419640
Pulled By: bzinodev
fbshipit-source-id: 52baffaba9bca2cd60a8e747baa68d57711ad42b