Commit Graph

32 Commits

Author SHA1 Message Date
735f8cc6c2 [DI] Allow explicit taskLauncher for torchscript interpreter (#46865)
Summary:
By default, TorchScript execution is single-threaded and uses the caller's thread pool. For the distributed inference use case, we want a way to customize this behavior so that the TorchScript interpreter can execute elsewhere. This diff allows an explicit taskLauncher for the TorchScript interpreter.
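
Conceptually, the change makes the launcher a pluggable parameter. A minimal C++ sketch of the pattern (illustrative names only, not PyTorch's actual interfaces):

```
#include <chrono>
#include <functional>
#include <iostream>
#include <thread>

using Task = std::function<void()>;
using TaskLauncher = std::function<void(Task)>;

// Default: run interpreter work inline on the caller's thread.
void launchInline(Task task) { task(); }

// Alternative: hand work to another thread, standing in for a
// distributed-inference server's own executor or thread pool.
void launchOnThread(Task task) { std::thread(std::move(task)).detach(); }

struct Interpreter {
  explicit Interpreter(TaskLauncher launcher = launchInline)
      : launcher_(std::move(launcher)) {}
  void run(Task body) { launcher_(std::move(body)); }
  TaskLauncher launcher_;
};

int main() {
  Interpreter default_interp;                 // single-threaded, as before
  default_interp.run([] { std::cout << "inline\n"; });

  Interpreter custom_interp{launchOnThread};  // explicit taskLauncher
  custom_interp.run([] { std::cout << "off-thread\n"; });
  std::this_thread::sleep_for(std::chrono::milliseconds(100));
}
```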

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46865

Test Plan:
Unit tests pass.

fbshipit-source-id: 1d7b003926c0d1f8facc53206efb960cff8897ac

Fixes #{issue number}

Reviewed By: houseroad

Differential Revision: D24616102

Pulled By: garroud

fbshipit-source-id: 79202b62f92d0b0baf72e4bf7aa3f05e0da91d59
2020-11-04 17:07:55 -08:00
564296f051 [2/3] [JIT] Make sure fusion occurs in test_tensorexpr (#45789)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45789

Making sure that more tests invoke a run with a Fusion Group.

Test Plan: Imported from OSS

Reviewed By: Krovatkin

Differential Revision: D24169535

Pulled By: eellison

fbshipit-source-id: 54d7af434772ba52144b12d15d32ae30460c0c3c
2020-10-08 12:06:16 -07:00
bcf97b8986 [JIT] Cleanup some places where we log graphs in executors. (#44588)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44588

1) SOURCE_DUMP crashes when invoked on a backward graph since
   `prim::GradOf` nodes can't be printed as sources (they don't have
   schema).
2) Dumping the graph each time we execute an optimized plan produces lots
   of output in tests that run the graph many times (e.g. benchmarks).
   Emitting that at the lowest verbosity level is overkill (see the sketch
   after this list).
3) A duplicated log statement is removed.
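
As a sketch of point 2, per-execution dumps can be gated behind a higher verbosity level than one-time dumps. The level names and environment variable below are hypothetical, not PyTorch's real jit_log knobs:

```
#include <cstdlib>
#include <iostream>
#include <string>

enum class Verbosity { Off = 0, Dump = 1, Debug = 2 };

Verbosity currentVerbosity() {
  const char* env = std::getenv("GRAPH_LOG_LEVEL");  // hypothetical knob
  return env ? static_cast<Verbosity>(std::atoi(env)) : Verbosity::Off;
}

void dumpGraph(Verbosity required, const std::string& header,
               const std::string& graph) {
  if (static_cast<int>(currentVerbosity()) >= static_cast<int>(required)) {
    std::cout << header << "\n" << graph << "\n";
  }
}

int main() {
  const std::string graph = "graph(%x) { ... }";
  // One-time dump at compilation: cheap, fine at the lowest level.
  dumpGraph(Verbosity::Dump, "Optimized graph:", graph);
  // Emitted on every run of the plan: noisy, so require a higher level.
  dumpGraph(Verbosity::Debug, "Executing optimized plan:", graph);
}
```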

Differential Revision: D23666812

Test Plan: Imported from OSS

Reviewed By: bertmaher

Pulled By: ZolotukhinM

fbshipit-source-id: b9a30e34fd39c85f3e13c3f1e3594e157e1c130f
2020-09-13 11:31:02 -07:00
54931ebb7b Release saved variable from DifferentiableGraphBackward (#42994)
Summary:
When backward ops execute via the autograd engine's evaluate_function(), fn.release_variables() is called to release the SavedVariables. For eager-mode ops, this releases the saved inputs that were required by the backward grad function. With TorchScript, however, we get a DifferentiableGraph, and DifferentiableGraphBackward() doesn't implement release_variables(), so the SavedVariables stay alive longer than necessary. Implement release_variables() for DifferentiableGraphBackward to release these SavedVariables early.
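
A minimal sketch of the mechanism, with stand-in types rather than the real autograd classes:

```
#include <iostream>
#include <vector>

// Stand-in for a saved tensor.
struct SavedVariable {
  std::vector<float> data;
};

struct DifferentiableGraphBackwardSketch {
  std::vector<SavedVariable> saved;

  void apply() {
    // ... compute input gradients from `saved` ...
  }

  // Called by the engine right after apply(); without this hook, the
  // saved tensors live until the whole backward graph is destroyed.
  void release_variables() { saved.clear(); }
};

int main() {
  DifferentiableGraphBackwardSketch node;
  node.saved.push_back({std::vector<float>(1 << 20, 1.0f)});  // ~4 MB saved
  node.apply();
  node.release_variables();  // reclaim the saved inputs early
  std::cout << "saved variables left: " << node.saved.size() << "\n";
}
```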

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42994

Reviewed By: izdeby

Differential Revision: D23503172

Pulled By: albanD

fbshipit-source-id: d87127498cfa72883ae6bb31d0e6c7056c4c36d4
2020-09-08 14:36:52 -07:00
f91bdbeabd Enable function calls in TEFuser and SpecializeAutogradZero (#43866)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43866

Reviewed By: ezyang

Differential Revision: D23452798

Pulled By: Krovatkin

fbshipit-source-id: 2cff4c905bf1b5d9de56e7869458ffa6fce1f1b5
2020-09-03 14:42:52 -07:00
cd58114c6c Adjust level of verbosity of debug dumps in graph executor T74227880 (#43682)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43682

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D23397980

Pulled By: Lilyjjo

fbshipit-source-id: b0114efbd63b2a29eb14086b0a8963880023c2a8
2020-09-02 08:45:16 -07:00
000739c31a Function calls for fallback paths (#43274)
Summary:
This PR adds an API to package unoptimized/fallback blocks as function calls. It's mainly meant to be used by the TensorExpressions fuser and SpecializeAutogradZero passes: both specialize the original graph but also need to provide a fallback path in case the assumptions under which the graph was specialized do not hold for some inputs.
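
A rough sketch of the pattern under simplified assumptions (plain functions standing in for graph blocks):

```
#include <cstddef>
#include <iostream>
#include <vector>

using Tensor = std::vector<float>;

// The original, unoptimized block, packaged as a plain function call.
Tensor fallback(const Tensor& x) {
  Tensor out(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) out[i] = x[i] * 2.0f;
  return out;
}

// The specialized graph: valid only under an assumption that is
// checked at runtime (here, the size the "fuser" compiled for).
Tensor specialized(const Tensor& x) {
  if (x.size() != 4) return fallback(x);  // guard failed: call the fallback
  Tensor out(4);
  for (std::size_t i = 0; i < 4; ++i) out[i] = x[i] * 2.0f;  // fused body
  return out;
}

int main() {
  std::cout << specialized({1, 2, 3, 4})[0] << "\n";  // fast path
  std::cout << specialized({1, 2, 3})[0] << "\n";     // fallback path
}
```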

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43274

Reviewed By: malfet

Differential Revision: D23406961

Pulled By: Krovatkin

fbshipit-source-id: ef21fc9ad886953461b09418d02c75c58375490c
2020-08-28 23:31:02 -07:00
e189ef5577 Refactor pass to class (#43630)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43630

No functional changes here - just refactoring specialize autograd zero to a class, and standardizing its API to take a shared_ptr<Graph>.
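
The shape of the refactor, sketched with illustrative names:

```
#include <memory>
#include <string>

struct Graph {
  std::string ir;
};

class SpecializeAutogradZeroPass {
 public:
  explicit SpecializeAutogradZeroPass(std::shared_ptr<Graph> graph)
      : graph_(std::move(graph)) {}

  void run() {
    // ... mutate graph_ in place; intermediate state lives in members
    // instead of being threaded through free-function parameters ...
  }

 private:
  std::shared_ptr<Graph> graph_;
};

// The public entry point keeps its old free-function shape.
void specializeAutogradZero(std::shared_ptr<Graph> graph) {
  SpecializeAutogradZeroPass(std::move(graph)).run();
}

int main() {
  specializeAutogradZero(std::make_shared<Graph>());
}
```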

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D23358805

Pulled By: eellison

fbshipit-source-id: 42e19ef2e14df66b44592252497a47d03cb07a7f
2020-08-27 14:35:30 -07:00
cc596ac3a8 [JIT] Add debug dumps in between passes in graph executor. (#42688)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42688

Both the profiling executor and the legacy executor have the debug
logging now.

Ideally, if we had a pass manager, this could be done as a part of it,
but since we have none, I had to insert the debug statements manually.
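
For illustration, a toy version of what such a pass manager could do automatically (names hypothetical):

```
#include <functional>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

struct Graph {
  std::string ir = "graph(%x) { ... }";
};

using Pass = std::function<void(Graph&)>;

// Run each pass, dumping the graph after it -- the bookkeeping a pass
// manager would provide once, instead of hand-inserted per pass.
void runWithDumps(Graph& g,
                  const std::vector<std::pair<std::string, Pass>>& passes) {
  for (const auto& entry : passes) {
    entry.second(g);
    std::cout << "After " << entry.first << ":\n" << g.ir << "\n";
  }
}

int main() {
  Graph g;
  runWithDumps(g, {{"Inline", [](Graph&) {}},
                   {"ConstantPropagation", [](Graph&) {}}});
}
```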

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D22981675

Pulled By: ZolotukhinM

fbshipit-source-id: 22b8789e860aa90d5802fc72a4113b22c6fc4da5
2020-08-06 15:16:35 -07:00
86f72953dd [Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D22452776

fbshipit-source-id: a103da6a5b1db7f1c91ca25490358da268fdfe96
2020-07-09 08:49:32 -07:00
3f32332ee6 [JIT][Easy]move remove mutation to own file (#41137)
Summary:
This should be in its own file...

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41137

Reviewed By: jamesr66a

Differential Revision: D22437922

Pulled By: eellison

fbshipit-source-id: 1b62dde1a4ebac673b5c60aea4f398f734d62501
2020-07-08 17:00:35 -07:00
53af9df557 Unify boxed function signature between jit and c10 (#37034)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37034

c10 takes a Stack* in boxed functions while JIT took a Stack&.
c10 doesn't return anything while JIT returns an int that is always zero.

This changes JIT to follow the c10 behavior.
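
A sketch of the two conventions with stand-in types:

```
#include <vector>

struct IValue {};
using Stack = std::vector<IValue>;

// Old JIT convention: reference in, int out.
int oldBoxedOp(Stack& stack) {
  // ... pop inputs off the stack, push outputs ...
  return 0;  // always zero, so the return carries no information
}

// Unified c10 convention: pointer in, nothing out.
void newBoxedOp(Stack* stack) {
  // ... pop inputs off the stack, push outputs ...
}

// Porting an old-style op is mechanical: drop the return, take a pointer.
void ported(Stack* stack) {
  (void)oldBoxedOp(*stack);
}

int main() {
  Stack stack;
  newBoxedOp(&stack);
  ported(&stack);
}
```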
ghstack-source-id: 106834069

Test Plan: unit tests

Differential Revision: D20567950

fbshipit-source-id: 1a7aea291023afc52ae706957e9a5ca576fbb53b
2020-06-29 19:24:26 -07:00
4fcd1c3123 run te only for profiling executor (#38591)
Summary:
* Disable the mode where PE can still run the old fuser.
* Clean up
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38591

Differential Revision: D21643664

Pulled By: Krovatkin

fbshipit-source-id: 6753ed6bdc544698a1340e59a624608ff3abf7f9
2020-05-26 18:35:25 -07:00
5183e3aa16 [JIT] Rename canonicalize ops (#38734)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38734

As far as I can tell, this pass only exists to canonicalize ops that are generated in the graph fuser, so its current name is kind of a misnomer.

Test Plan: Imported from OSS

Differential Revision: D21673109

Pulled By: eellison

fbshipit-source-id: b7bedf34ccaf1fcd442bfb2bbb990e64915f51d4
2020-05-21 21:45:15 -07:00
0ed7fc581c [quant][graphmode][refactor] Split quantization.cpp (#37975)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/37975

Test Plan:
.

Imported from OSS

Differential Revision: D21468497

fbshipit-source-id: 35cbf98a344ca6e4094d616a4040eacf017fd2de
2020-05-08 12:24:50 -07:00
067f08c148 [TensorExpr] Move controlling knob out of the TE fuser pass. (#37970)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37970

This change makes the pass friendlier for users who try to invoke it
directly.

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D21444832

Pulled By: ZolotukhinM

fbshipit-source-id: 8be4b5028b3bd84082874e16f38a70b245af5d19
2020-05-07 12:18:31 -07:00
4cdaa5956c capitalize fuseTensorExpr (#37780)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/37780

Differential Revision: D21386092

Pulled By: Krovatkin

fbshipit-source-id: c190f891fe25b3cee9a34b5173756c39efd49c66
2020-05-04 12:40:49 -07:00
c516f84525 [JIT] Add Lower Tuples Call & Run remove mutation after list unrolling (#36829)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36829

This changes the IR complexity from the previous PR for the following tests:
```
('Name', 'Ifs/Loops', 'non-tensor ops')
Before:  ('max_unpool1d', 0, 3)
After:  ('max_unpool1d', 0, 0)
Before:  ('max_unpool2d', 0, 3)
After:  ('max_unpool2d', 0, 0)
Before:  ('max_unpool3d', 0, 4)
After:  ('max_unpool3d', 0, 0)
Before:  ('adaptive_max_pool2d', 0, 3)
After:  ('adaptive_max_pool2d', 0, 0)
Before:  ('adaptive_max_pool3d', 0, 4)
After:  ('adaptive_max_pool3d', 0, 0)
Before:  ('adaptive_avg_pool2d', 0, 3)
After:  ('adaptive_avg_pool2d', 0, 0)
Before:  ('adaptive_avg_pool3d', 0, 4)
After:  ('adaptive_avg_pool3d', 0, 0)
Before:  ('upsample', 13, 68)
After:  ('upsample', 4, 28)
Before:  ('upsample', 13, 68)
After:  ('upsample', 0, 5)
Before:  ('interpolate', 14, 68)
After:  ('interpolate', 0, 4)
Before:  ('interpolate', 13, 67)
After:  ('interpolate', 4, 27)
Before:  ('interpolate', 14, 68)
After:  ('interpolate', 0, 4)
Before:  ('interpolate', 14, 68)
After:  ('interpolate', 0, 4)
Before:  ('interpolate', 13, 67)
After:  ('interpolate', 4, 27)
Before:  ('interpolate', 14, 68)
After:  ('interpolate', 0, 4)
Before:  ('interpolate', 14, 68)
After:  ('interpolate', 0, 4)
Before:  ('interpolate', 13, 67)
After:  ('interpolate', 4, 27)
Before:  ('interpolate', 14, 68)
After:  ('interpolate', 0, 4)
Before:  ('interpolate', 14, 68)
After:  ('interpolate', 0, 4)
Before:  ('interpolate', 13, 67)
After:  ('interpolate', 4, 27)
Before:  ('interpolate', 14, 68)
After:  ('interpolate', 0, 4)
Before:  ('interpolate', 14, 59)
After:  ('interpolate', 0, 3)
Before:  ('interpolate', 13, 57)
After:  ('interpolate', 4, 21)
Before:  ('interpolate', 14, 59)
After:  ('interpolate', 0, 3)
Before:  ('interpolate', 14, 59)
After:  ('interpolate', 0, 3)
Before:  ('interpolate', 13, 57)
After:  ('interpolate', 4, 21)
Before:  ('interpolate', 14, 59)
After:  ('interpolate', 0, 3)
Before:  ('interpolate', 14, 59)
After:  ('interpolate', 0, 3)
Before:  ('interpolate', 13, 57)
After:  ('interpolate', 4, 21)
Before:  ('interpolate', 14, 59)
After:  ('interpolate', 0, 3)
Before:  ('interpolate', 13, 77)
After:  ('interpolate', 4, 33)
Before:  ('interpolate', 14, 77)
After:  ('interpolate', 0, 5)
Before:  ('interpolate', 14, 77)
After:  ('interpolate', 0, 5)
Before:  ('interpolate', 13, 77)
After:  ('interpolate', 4, 33)
Before:  ('interpolate', 14, 77)
After:  ('interpolate', 0, 5)
Before:  ('interpolate', 14, 77)
After:  ('interpolate', 0, 5)
Before:  ('interpolate', 13, 77)
After:  ('interpolate', 4, 33)
Before:  ('interpolate', 14, 77)
After:  ('interpolate', 0, 5)
Before:  ('interpolate', 14, 68)
After:  ('interpolate', 0, 4)
Before:  ('interpolate', 14, 68)
After:  ('interpolate', 0, 4)
Before:  ('interpolate', 15, 103)
After:  ('interpolate', 1, 23)
Before:  ('interpolate', 14, 70)
After:  ('interpolate', 0, 6)
Before:  ('interpolate', 15, 103)
After:  ('interpolate', 1, 21)
Before:  ('interpolate', 14, 70)
After:  ('interpolate', 0, 6)
Before:  ('interpolate', 15, 91)
After:  ('interpolate', 1, 13)
Before:  ('interpolate', 14, 59)
After:  ('interpolate', 0, 3)
Before:  ('interpolate', 15, 93)
After:  ('interpolate', 1, 16)
Before:  ('interpolate', 14, 61)
After:  ('interpolate', 0, 5)
Before:  ('interpolate', 15, 111)
After:  ('interpolate', 1, 28)
Before:  ('interpolate', 14, 77)
After:  ('interpolate', 0, 5)
Before:  ('interpolate', 15, 113)
After:  ('interpolate', 1, 27)
Before:  ('interpolate', 14, 79)
After:  ('interpolate', 0, 7)
Before:  ('test_nn_AdaptiveMaxPool2d_single', 0, 3)
After:  ('test_nn_AdaptiveMaxPool2d_single', 0, 0)
Before:  ('test_nn_AdaptiveMaxPool2d_tuple', 0, 3)
After:  ('test_nn_AdaptiveMaxPool2d_tuple', 0, 0)
Before:  ('test_nn_AdaptiveMaxPool3d_single', 0, 4)
After:  ('test_nn_AdaptiveMaxPool3d_single', 0, 0)
Before:  ('test_nn_AdaptiveMaxPool3d_tuple', 0, 4)
After:  ('test_nn_AdaptiveMaxPool3d_tuple', 0, 0)
Before:  ('test_nn_AdaptiveMaxPool3d_single_nonatomic', 0, 4)
After:  ('test_nn_AdaptiveMaxPool3d_single_nonatomic', 0, 0)
Before:  ('test_nn_AdaptiveMaxPool3d_tuple_nonatomic', 0, 4)
After:  ('test_nn_AdaptiveMaxPool3d_tuple_nonatomic', 0, 0)
Before:  ('test_nn_AdaptiveAvgPool2d_single', 0, 3)
After:  ('test_nn_AdaptiveAvgPool2d_single', 0, 0)
Before:  ('test_nn_AdaptiveAvgPool2d_single_1x1output', 0, 3)
After:  ('test_nn_AdaptiveAvgPool2d_single_1x1output', 0, 0)
Before:  ('test_nn_AdaptiveAvgPool2d_tuple', 0, 3)
After:  ('test_nn_AdaptiveAvgPool2d_tuple', 0, 0)
Before:  ('test_nn_AdaptiveAvgPool3d_single', 0, 4)
After:  ('test_nn_AdaptiveAvgPool3d_single', 0, 0)
Before:  ('test_nn_AdaptiveAvgPool3d_tuple', 0, 4)
After:  ('test_nn_AdaptiveAvgPool3d_tuple', 0, 0)
```

Test Plan: Imported from OSS

Differential Revision: D21160758

Pulled By: eellison

fbshipit-source-id: 68ccbf3af74398e8dbad7e6bedb639635dafdb2e
2020-04-28 23:28:02 -07:00
cdc0880632 add post unroll optimizations (#36828)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36828

This changes IR complexity for the following:

```
("Name", "Ifs/Loops", "non-tensor ops")
Before:  ('max_unpool1d', 0, 12)
After:  ('max_unpool1d', 0, 3)
Before:  ('max_unpool2d', 0, 22)
After:  ('max_unpool2d', 0, 3)
Before:  ('max_unpool3d', 0, 33)
After:  ('max_unpool3d', 0, 4)
Before:  ('adaptive_max_pool2d', 0, 6)
After:  ('adaptive_max_pool2d', 0, 3)
Before:  ('adaptive_max_pool3d', 0, 9)
After:  ('adaptive_max_pool3d', 0, 4)
Before:  ('adaptive_avg_pool2d', 0, 6)
After:  ('adaptive_avg_pool2d', 0, 3)
Before:  ('adaptive_avg_pool3d', 0, 9)
After:  ('adaptive_avg_pool3d', 0, 4)
Before:  ('instance_norm', 1, 6)
After:  ('instance_norm', 0, 0)
Before:  ('group_norm', 1, 6)
After:  ('group_norm', 0, 0)
Before:  ('upsample', 13, 71)
After:  ('upsample', 13, 68)
Before:  ('upsample', 13, 71)
After:  ('upsample', 13, 68)
Before:  ('interpolate', 14, 71)
After:  ('interpolate', 14, 68)
Before:  ('interpolate', 13, 70)
After:  ('interpolate', 13, 67)
Before:  ('interpolate', 14, 71)
After:  ('interpolate', 14, 68)
Before:  ('interpolate', 14, 71)
After:  ('interpolate', 14, 68)
Before:  ('interpolate', 13, 70)
After:  ('interpolate', 13, 67)
Before:  ('interpolate', 14, 71)
After:  ('interpolate', 14, 68)
Before:  ('interpolate', 14, 71)
After:  ('interpolate', 14, 68)
Before:  ('interpolate', 13, 70)
After:  ('interpolate', 13, 67)
Before:  ('interpolate', 14, 71)
After:  ('interpolate', 14, 68)
Before:  ('interpolate', 14, 71)
After:  ('interpolate', 14, 68)
Before:  ('interpolate', 13, 70)
After:  ('interpolate', 13, 67)
Before:  ('interpolate', 14, 71)
After:  ('interpolate', 14, 68)
Before:  ('interpolate', 14, 60)
After:  ('interpolate', 14, 59)
Before:  ('interpolate', 13, 58)
After:  ('interpolate', 13, 57)
Before:  ('interpolate', 14, 60)
After:  ('interpolate', 14, 59)
Before:  ('interpolate', 14, 60)
After:  ('interpolate', 14, 59)
Before:  ('interpolate', 13, 58)
After:  ('interpolate', 13, 57)
Before:  ('interpolate', 14, 60)
After:  ('interpolate', 14, 59)
Before:  ('interpolate', 14, 60)
After:  ('interpolate', 14, 59)
Before:  ('interpolate', 13, 58)
After:  ('interpolate', 13, 57)
Before:  ('interpolate', 14, 60)
After:  ('interpolate', 14, 59)
Before:  ('interpolate', 13, 82)
After:  ('interpolate', 13, 77)
Before:  ('interpolate', 14, 82)
After:  ('interpolate', 14, 77)
Before:  ('interpolate', 14, 82)
After:  ('interpolate', 14, 77)
Before:  ('interpolate', 13, 82)
After:  ('interpolate', 13, 77)
Before:  ('interpolate', 14, 82)
After:  ('interpolate', 14, 77)
Before:  ('interpolate', 14, 82)
After:  ('interpolate', 14, 77)
Before:  ('interpolate', 13, 82)
After:  ('interpolate', 13, 77)
Before:  ('interpolate', 14, 82)
After:  ('interpolate', 14, 77)
Before:  ('interpolate', 14, 71)
After:  ('interpolate', 14, 68)
Before:  ('interpolate', 14, 71)
After:  ('interpolate', 14, 68)
Before:  ('interpolate', 15, 106)
After:  ('interpolate', 15, 103)
Before:  ('interpolate', 14, 73)
After:  ('interpolate', 14, 70)
Before:  ('interpolate', 15, 106)
After:  ('interpolate', 15, 103)
Before:  ('interpolate', 14, 73)
After:  ('interpolate', 14, 70)
Before:  ('interpolate', 15, 92)
After:  ('interpolate', 15, 91)
Before:  ('interpolate', 14, 60)
After:  ('interpolate', 14, 59)
Before:  ('interpolate', 15, 94)
After:  ('interpolate', 15, 93)
Before:  ('interpolate', 14, 62)
After:  ('interpolate', 14, 61)
Before:  ('interpolate', 15, 116)
After:  ('interpolate', 15, 111)
Before:  ('interpolate', 14, 82)
After:  ('interpolate', 14, 77)
Before:  ('interpolate', 15, 118)
After:  ('interpolate', 15, 113)
Before:  ('interpolate', 14, 84)
After:  ('interpolate', 14, 79)
Before:  ('test_nn_BatchNorm1d_3d_input', 3, 9)
After:  ('test_nn_BatchNorm1d_3d_input', 2, 3)
Before:  ('test_nn_BatchNorm1d_3d_input_not_affine', 3, 9)
After:  ('test_nn_BatchNorm1d_3d_input_not_affine', 2, 3)
Before:  ('test_nn_BatchNorm1d_zero_batch', 3, 9)
After:  ('test_nn_BatchNorm1d_zero_batch', 2, 3)
Before:  ('test_nn_BatchNorm2d', 3, 13)
After:  ('test_nn_BatchNorm2d', 2, 3)
Before:  ('test_nn_BatchNorm2d_2d_simple_average', 3, 15)
After:  ('test_nn_BatchNorm2d_2d_simple_average', 2, 5)
Before:  ('test_nn_BatchNorm2d_momentum', 3, 13)
After:  ('test_nn_BatchNorm2d_momentum', 2, 3)
Before:  ('test_nn_BatchNorm2d_not_affine', 3, 13)
After:  ('test_nn_BatchNorm2d_not_affine', 2, 3)
Before:  ('test_nn_BatchNorm2d_not_tracking_stats', 1, 10)
After:  ('test_nn_BatchNorm2d_not_tracking_stats', 0, 0)
Before:  ('test_nn_BatchNorm2d_zero_batch', 3, 13)
After:  ('test_nn_BatchNorm2d_zero_batch', 2, 3)
Before:  ('test_nn_BatchNorm3d', 3, 17)
After:  ('test_nn_BatchNorm3d', 2, 3)
Before:  ('test_nn_BatchNorm3d_3d_simple_average', 3, 19)
After:  ('test_nn_BatchNorm3d_3d_simple_average', 2, 5)
Before:  ('test_nn_BatchNorm3d_momentum', 3, 17)
After:  ('test_nn_BatchNorm3d_momentum', 2, 3)
Before:  ('test_nn_BatchNorm3d_not_affine', 3, 17)
After:  ('test_nn_BatchNorm3d_not_affine', 2, 3)
Before:  ('test_nn_BatchNorm3d_not_tracking_stats', 1, 14)
After:  ('test_nn_BatchNorm3d_not_tracking_stats', 0, 0)
Before:  ('test_nn_BatchNorm3d_zero_batch', 3, 17)
After:  ('test_nn_BatchNorm3d_zero_batch', 2, 3)
Before:  ('test_nn_InstanceNorm1d', 1, 6)
After:  ('test_nn_InstanceNorm1d', 0, 0)
Before:  ('test_nn_InstanceNorm1d_tracking_stats', 1, 6)
After:  ('test_nn_InstanceNorm1d_tracking_stats', 0, 0)
Before:  ('test_nn_InstanceNorm2d', 1, 10)
After:  ('test_nn_InstanceNorm2d', 0, 0)
Before:  ('test_nn_InstanceNorm2d_tracking_stats', 1, 10)
After:  ('test_nn_InstanceNorm2d_tracking_stats', 0, 0)
Before:  ('test_nn_InstanceNorm3d', 1, 14)
After:  ('test_nn_InstanceNorm3d', 0, 0)
Before:  ('test_nn_InstanceNorm3d_tracking_stats', 1, 14)
After:  ('test_nn_InstanceNorm3d_tracking_stats', 0, 0)
Before:  ('test_nn_GroupNorm_1d_affine', 1, 6)
After:  ('test_nn_GroupNorm_1d_affine', 0, 0)
Before:  ('test_nn_GroupNorm_1d_no_affine_IN', 1, 6)
After:  ('test_nn_GroupNorm_1d_no_affine_IN', 0, 0)
Before:  ('test_nn_GroupNorm_1d_no_affine_LN', 1, 6)
After:  ('test_nn_GroupNorm_1d_no_affine_LN', 0, 0)
Before:  ('test_nn_GroupNorm_2d_affine', 1, 10)
After:  ('test_nn_GroupNorm_2d_affine', 0, 0)
Before:  ('test_nn_GroupNorm_2d_no_affine_IN', 1, 10)
After:  ('test_nn_GroupNorm_2d_no_affine_IN', 0, 0)
Before:  ('test_nn_GroupNorm_2d_no_affine_LN', 1, 10)
After:  ('test_nn_GroupNorm_2d_no_affine_LN', 0, 0)
Before:  ('test_nn_AdaptiveMaxPool2d_single', 0, 6)
After:  ('test_nn_AdaptiveMaxPool2d_single', 0, 3)
Before:  ('test_nn_AdaptiveMaxPool2d_tuple', 0, 6)
After:  ('test_nn_AdaptiveMaxPool2d_tuple', 0, 3)
Before:  ('test_nn_AdaptiveMaxPool3d_single', 0, 9)
After:  ('test_nn_AdaptiveMaxPool3d_single', 0, 4)
Before:  ('test_nn_AdaptiveMaxPool3d_tuple', 0, 9)
After:  ('test_nn_AdaptiveMaxPool3d_tuple', 0, 4)
Before:  ('test_nn_AdaptiveMaxPool3d_single_nonatomic', 0, 9)
After:  ('test_nn_AdaptiveMaxPool3d_single_nonatomic', 0, 4)
Before:  ('test_nn_AdaptiveMaxPool3d_tuple_nonatomic', 0, 9)
After:  ('test_nn_AdaptiveMaxPool3d_tuple_nonatomic', 0, 4)
Before:  ('test_nn_AdaptiveAvgPool2d_single', 0, 6)
After:  ('test_nn_AdaptiveAvgPool2d_single', 0, 3)
Before:  ('test_nn_AdaptiveAvgPool2d_single_1x1output', 0, 6)
After:  ('test_nn_AdaptiveAvgPool2d_single_1x1output', 0, 3)
Before:  ('test_nn_AdaptiveAvgPool2d_tuple', 0, 6)
After:  ('test_nn_AdaptiveAvgPool2d_tuple', 0, 3)
Before:  ('test_nn_AdaptiveAvgPool3d_single', 0, 9)
After:  ('test_nn_AdaptiveAvgPool3d_single', 0, 4)
Before:  ('test_nn_AdaptiveAvgPool3d_tuple', 0, 9)
After:  ('test_nn_AdaptiveAvgPool3d_tuple', 0, 4)
```

Test Plan: Imported from OSS

Differential Revision: D21160759

Pulled By: eellison

fbshipit-source-id: 91ca6ef2269ee364ca354c8d0843847744145d25
2020-04-28 23:27:57 -07:00
901bb3c350 Delete as_variable_ref (#36096)
Summary:
This PR closes https://github.com/pytorch/pytorch/issues/34895 and builds on work started by ayushtues in https://github.com/pytorch/pytorch/pull/35184
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36096

Reviewed By: zou3519

Differential Revision: D20893693

Pulled By: astaff

fbshipit-source-id: 13aac1feaef3bcf86f7a4cf92d26e7a1ae43a3b3
2020-04-08 08:57:01 -07:00
6f8017bf07 Enable simple executor for FBCODE (#34748)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34748

Differential Revision: D20909390

Pulled By: Krovatkin

fbshipit-source-id: b3d0c981825d362d3d4f9012ff8151ffc7a59671
2020-04-08 00:19:49 -07:00
af5121f62a Invoke TensorExpr fuser pass from a graph executor. (#35913)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35913

The pass itself is still disabled by default, but with this change we
don't need to register it as a custom pass anymore. It allows us to
control its behavior with env variables more easily.
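
A sketch of the env-variable gating pattern (the variable name here is hypothetical, not a real PyTorch knob):

```
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <string>

// Hypothetical knob name, for illustration only.
bool texprFuserEnabled() {
  const char* env = std::getenv("ENABLE_TE_FUSER");
  return env != nullptr && std::strcmp(env, "1") == 0;
}

void runRequiredPasses(std::string& graph) {
  // ... unconditional passes ...
  if (texprFuserEnabled()) {
    graph += " [tensorexpr-fused]";  // stand-in for the fuser pass
  }
}

int main() {
  std::string graph = "graph(%x) { ... }";
  runRequiredPasses(graph);
  std::cout << graph << "\n";
}
```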

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D20827189

Pulled By: ZolotukhinM

fbshipit-source-id: e74d90b5e46422e7ab7bc40974a805220da50fbc
2020-04-03 12:20:26 -07:00
6d24f8fe21 Infrastructure for a new CUDA Fuser (#34785)
Summary:
**Summary:** This PR contains the infrastructure of a new CUDA fuser. This CUDA fuser is based on many of the same principles as TensorExpressions and Halide, but the implementation is built from the ground up. The fusion pass itself is similar to the default CUDA fuser; however, it has undergone some refactoring and uses the new code generation infrastructure. For those interested in how the code generation in this PR works, I would recommend reviewing _test/cpp/jit/test_gpu_fusion.cpp_ as well as the long comment section at the beginning of _torch/csrc/jit/codegen/cuda/transform_replay.h_. One of the largest differences between our approach and that of TVM/Halide is the concept of a "TensorView". At a high level, a TensorView should be thought of much like a Tensor in PyTorch: it's an N-D object that can undergo transformations changing its dimensionality. Dimensionality changes are done through the operations split/merge/reorder/computeAt. These transformations are similar to split/fuse/reorder/compute_at in TVM; they modify how a tensor is iterated over to generate GPU code. Interestingly, in our scheme these transformations are applied to tensors and only impact how that tensor is generated.
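
To make the TensorView idea concrete, here is a toy C++ sketch (vastly simplified relative to the real fuser) of split/merge transformations on an iteration domain:

```
#include <iostream>
#include <vector>

// Toy iteration domain: just the extent of each axis.
struct TensorView {
  std::vector<int> domain;

  // split axis `a` by `factor`: [..., N, ...] -> [..., N/factor, factor, ...]
  void split(int a, int factor) {
    int extent = domain[a];
    domain[a] = extent / factor;
    domain.insert(domain.begin() + a + 1, factor);
  }

  // merge axis `a` with the axis after it
  void merge(int a) {
    domain[a] *= domain[a + 1];
    domain.erase(domain.begin() + a + 1);
  }

  void print() const {
    for (int e : domain) std::cout << e << ' ';
    std::cout << '\n';
  }
};

int main() {
  TensorView tv{{128, 64}};
  tv.split(0, 32);  // 128x64 -> 4x32x64; the inner 32 could map to threads
  tv.print();       // prints: 4 32 64
  tv.merge(1);      // 4x32x64 -> 4x2048
  tv.print();       // prints: 4 2048
}
```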

**Warning:** This PR is purposefully not feature complete with the current fuser. We wanted to separate out the infrastructure from the fusion capabilities. Once in, smaller incremental PRs will be submitted to expand capabilities of the fuser.

**Short term goals:**

Parity with current CUDA fuser (including performance):
- Dynamic shapes (no recompilation)
- Implicit handling of broadcast (broadcasted tensors are treated as tensors of the broadcast size in the generated code)
- Dropout

**Mid-term goals:**

- Transposes fused with pointwise operations where transpose involves only 2 axes (across the fused operation).
- 1-D reductions fused with pointwise operations
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34785

Reviewed By: ZolotukhinM

Differential Revision: D20650977

Pulled By: soumith

fbshipit-source-id: ee39c95a880e1b9822e874ed4cc180971572bf63
2020-04-02 09:22:42 -07:00
0ed3f881c5 clang-fmt (#35796)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35796

Test Plan: Imported from OSS

Reviewed By: shannonzhu

Differential Revision: D20788673

Pulled By: suo

fbshipit-source-id: 3555a6204ef174c28e561a8931e13814846813a3
2020-04-01 00:14:36 -07:00
8d64a3848c [jit] In RPC Server, handle TorchScript continuations asynchronously (#34109)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34109

This change adds glue to GraphExecutor to give the RPC server
access to the future-based Interpreter::runAsync() api.

Previously, if a server encountered a TorchScript continuation-based block
with fork/wait, it would simply block in the server thread until the handler
completed, since it used the synchronous Interpreter::run() API.

With the ivalue::Future returned by the Interpreter, we can run the
TorchScript code asynchronously from C++ simply by connecting its
callback to the server callback.

We add test cases to cover the new logic, both rpc_async and remote.
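
A simplified sketch of the callback hookup, with a stand-in future type rather than the real ivalue::Future:

```
#include <functional>
#include <iostream>
#include <memory>
#include <thread>

// Minimal stand-in for ivalue::Future.
struct Future {
  std::function<void(int)> callback;
  void addCallback(std::function<void(int)> cb) { callback = std::move(cb); }
  void markCompleted(int v) { if (callback) callback(v); }
};

int main() {
  auto fut = std::make_shared<Future>();
  // Server handler: chain the RPC reply onto the future and return,
  // instead of blocking the server thread until the work finishes.
  fut->addCallback([](int v) { std::cout << "send reply: " << v << "\n"; });
  // Interpreter side: the continuation completes the future later,
  // possibly from another thread.
  std::thread worker([fut] { fut->markCompleted(42); });
  worker.join();  // a real server would not block here
}
```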

ghstack-source-id: 101245438

Test Plan: buck test mode/dev-nosan caffe2/test/distributed/rpc/...

Differential Revision: D20194321

fbshipit-source-id: 16785ec5d9ed0b16cb1ffab0a9771a77de30fcb0
2020-03-31 17:21:46 -07:00
6384c2d81b [JIT] clang-format JIT code (#35115)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35115

This commit runs the newly added tools/clang_format.py on the JIT
codebase and includes all of the formatting changes thus produced.

Testing:
Ran the script, CI.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D20568523

Pulled By: SplitInfinity

fbshipit-source-id: e09bdb982ccf090eecfb7c7b461b8d0681eef82b
2020-03-26 11:24:51 -07:00
cf8b728255 Delete OperatorOptions, absorb AliasAnalysisKind into FunctionSchema. (#34588)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34588

I constructed the patch by deleting OperatorOptions and then rerouting
all queries for AliasAnalysisKind to FunctionSchema.  Some of the
behavior is kind of bogus: we really shouldn't be mutating FunctionSchema
after the fact, but that won't get fixed until we actually switch to
true schema merging.
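
A sketch of the data movement with illustrative types (the real declarations are more involved):

```
#include <iostream>
#include <string>

enum class AliasAnalysisKind { CONSERVATIVE, FROM_SCHEMA, PURE_FUNCTION };

// Before: the setting lived in a separate side-struct.
// struct OperatorOptions { AliasAnalysisKind aliasAnalysis; };

// After: the schema owns it, so every query goes through the schema.
struct FunctionSchema {
  std::string name;
  AliasAnalysisKind aliasAnalysis = AliasAnalysisKind::CONSERVATIVE;
};

int main() {
  FunctionSchema schema{"aten::relu", AliasAnalysisKind::FROM_SCHEMA};
  std::cout << (schema.aliasAnalysis == AliasAnalysisKind::FROM_SCHEMA)
            << "\n";
}
```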

Reland of https://github.com/pytorch/pytorch/pull/34160

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20387079

Pulled By: ezyang

fbshipit-source-id: d189f7a6ad8cd186b88b6fbfa3f189994eea14e8
2020-03-11 20:59:46 -07:00
6f8a8e4e47 Revert D20282846: Delete OperatorOptions, absorb AliasAnalysisKind into FunctionSchema.
Test Plan: revert-hammer

Differential Revision: D20282846

Original commit changeset: ba7bca6e8adc

fbshipit-source-id: b9e15d2b2c3d1dbc6e971ab3c0bdf380e769dcf1
2020-03-11 07:50:29 -07:00
9d42177a31 Delete OperatorOptions, absorb AliasAnalysisKind into FunctionSchema. (#34160)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34160

I constructed the patch by deleting OperatorOptions and then rerouting
all queries for AliasAnalysisKind to FunctionSchema.  Some of the
behavior is kind of bogus: we really shouldn't be mutating FunctionSchema
after the fact, but that won't get fixed until we actually switch to
true schema merging.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20282846

Pulled By: ezyang

fbshipit-source-id: ba7bca6e8adc3365789639b88e54c4e881b1692e
2020-03-11 07:15:18 -07:00
358450e02b improved TorchScript traceback (#33834)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33834

This changes how we report Tracebacks to make them clearer when
there are both serialized and non-serialized ranges. It now looks like:

```
Traceback (most recent call last):
  File "foo.py", line 25, in <module>
    s2(a, b)
  File "/scratch/zdevito/pytorch/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__.py", line 7, in forward
    x: Tensor,
    y: Tensor) -> Tensor:
    return (self).bar(x, y, )
            ~~~~~~~~~ <--- HERE
  def bar(self: __torch__.Moo,
    x: Tensor,
  File "code/__torch__.py", line 11, in bar
    x: Tensor,
    y: Tensor) -> Tensor:
    _0 = (self).baz(x, y, )
          ~~~~~~~~~ <--- HERE
    _1 = torch.ones([3], dtype=None, layout=None, device=None, pin_memory=None)
    return torch.add(_0, _1, alpha=1)
  File "code/__torch__.py", line 17, in baz
    x: Tensor,
    y: Tensor) -> Tensor:
    return torch.add(x, y, alpha=1)
           ~~~~~~~~~ <--- HERE

Traceback of TorchScript, original code (most recent call last):
  File "foo.py", line 11, in forward
    def forward(self, x, y):
        return self.bar(x, y)
               ~~~~~~~~ <--- HERE
  File "foo.py", line 9, in bar
    def bar(self, x, y):
        return self.baz(x, y) + torch.ones(3)
               ~~~~~~~~ <--- HERE
  File "foo.py", line 7, in baz
    def baz(self, x, y):
        return x + y
               ~~~~~ <--- HERE
RuntimeError: The size of tensor a (4) must match the size of tensor b (5) at non-singleton dimension 1
```

This follows the Python convention of putting the most important information
last and reading from the bottom up.

Changes:
* Moved the error message to the end, to copy Python
* Report original traceback separate from serialized traceback
* Make sure root functions have names in the interpreter trace.

Test Plan: Imported from OSS

Differential Revision: D20126136

Pulled By: zdevito

fbshipit-source-id: fd01f9985e5d74e04c4d064c02e8bc320f4fac13
2020-03-03 12:27:38 -08:00
cab8772c6c Freezing Torchscript modules (#32178)
Summary:
This patch enables folding GetAttr nodes with their corresponding
values. The _jit_pass_freeze_module API returns a new TorchScript module
where all function calls and attribute accesses are inlined.
Usage:

```
frozen_model = torch._C._freeze_module(scripted_model._c)
frozen_model.forward(...)
```

This API currently optimizes the forward method. We will follow up to
preserve and optimize methods and attributes that are annotated as
torch.jit.interface.

Several future improvements to JIT optimizations are required to further
clean up/de-sugar the graph and eliminate redundancies.
Ideally, we want to produce a graph that can easily be lowered to
GLOW and other low-level backends.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32178

Differential Revision: D19419640

Pulled By: bzinodev

fbshipit-source-id: 52baffaba9bca2cd60a8e747baa68d57711ad42b
2020-03-02 11:38:36 -08:00
dbe850af5b [jit] do the code reorg (#33851)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33851

Rationale and context described in #33828.

Script to reproduce the move:
https://gist.github.com/suo/16cbefaaeb67ca5a7c6caffd49b7f6e9
ghstack-source-id: 99079645

Test Plan: Make sure CI passes

Reviewed By: jamesr66a

Differential Revision: D20133869

fbshipit-source-id: 390e9241a9c85366d9005c492ac31f10aa96488e
2020-02-27 13:02:51 -08:00