Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71443
The cogwheel test inline_cvr_infer_canary_pyper_model_publish is timing out.
The convert_fx call takes > 20 mins for the local and local_ro submodules, which used to take ~2 mins.
Test Plan:
FBLearner flow run
* the following cmd took 1113 seconds before the diff and 5002 seconds after.
flow-cli clone-locally 320014219 --run-as-secure-group pytorch_at_scale --operators pyper_model_publish_workflow.pyper_model_publish_workflow.process_torch_package_model_files.process_non_sparse_parameters[0]
Cogwheel test
* Cogwheel test with packages in B3588 (the last good run) took 4694.48s
* Cogwheel test with packages in B3590 (the first timeout) took 13975.83s
* Cogwheel test with the following packages took 4535.04s
* all packages in B3588 except the model publish
* the model publish built with D33469839 (043e84b3d2) reverted (created D33633570)
Reviewed By: albanD, jerryzh168
Differential Revision: D33633570
fbshipit-source-id: dc5e777c48a90c551641a3f79126461f6a60449e
(cherry picked from commit 03ab65023a9f4175584ddac1cca7eab51397c84a)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70464
Add handling of strided input tensors to dynamic fusion. This is done with the same set of input striding specializations as https://github.com/pytorch/pytorch/pull/60684/:
```
S_ONE, // STRIDE_ONE: packed
S_CONT, // STRIDE_CONTIGUOUS: stride[i + 1] * sizes[i + 1]
S_TRAN_CONT, // STRIDE_TRANSPOSED_CONTIGUOUS: stride[i-1] * sizes[i-1]
S_AS_ARG, // STRIDE_AS_ARG: stride passed in as runtime value
```
and then two additional specializations for a) contiguous tensors and b) channels-last tensors. Channels-last is a common case and we should optimize for it. Additionally, tensors natively store whether they are contiguous/channels-last contiguous, which makes it faster to check whether tensors follow this pattern.
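As a rough illustration of the cheap layout check mentioned above (not part of this diff), tensors expose their contiguity directly:
```
import torch

# Tensors already record whether they are contiguous or channels-last
# contiguous, so these checks avoid a per-dimension stride walk.
x = torch.randn(2, 3, 4, 5)
nhwc = x.to(memory_format=torch.channels_last)

print(x.is_contiguous())                                      # True
print(nhwc.is_contiguous(memory_format=torch.channels_last))  # True
print(nhwc.is_contiguous())                                   # False
```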
Output striding will be done in a follow up.
The striding is stored on both the TensorGroup node and on the guard node. The striding descriptors are stored as a vector of strings on the node for debuggability and to make use of storing IValues as attributes on nodes.
As an example:
```
%8 : Double(10, 11, 12, 13, strides=[1716, 1, 143, 11], requires_grad=0, device=cpu) = prim::TensorExprGroup_0[symbolic_shape_inputs=[-37, -36, -35, -34], striding_inputs_desc=[["TENSOR_CONT_CHANNELS_LAST"]]](%x, %24, %23, %22, %21)
```
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D33458649
Pulled By: eellison
fbshipit-source-id: c42616d3c683d70f6258180d23d3841a31a6030d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67254
Fixes https://github.com/pytorch/pytorch/issues/65997
BC breaking:
`output = torch.ops._test.leaky_relu(self=torch.tensor(-1.0))` now fails with the error `TypeError: __call__() got multiple values for argument 'self'` because the call is dispatched to `OpOverloadBundle`'s `__call__` method, which already has `self` bound as its first argument.
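A minimal before/after sketch of the change (illustrative; `_test.leaky_relu` is the test op referenced above):
```
import torch

x = torch.tensor(-1.0)

# Previously accepted; now raises
# TypeError: __call__() got multiple values for argument 'self'
# output = torch.ops._test.leaky_relu(self=x)

# Pass the tensor positionally instead:
output = torch.ops._test.leaky_relu(x)
```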
Follow-up work:
1. Disallow `default` as an overload name for aten operators.
2. Add a method to obtain a list of all overloads (excluding the ones registered by the JIT).
3. Add methods/properties to `OpOverload` to access more schema information (types of input and output args, etc.).
cc ezyang gchanan
Test Plan: Imported from OSS
Reviewed By: pbelevich
Differential Revision: D33469839
Pulled By: anjali411
fbshipit-source-id: c3fc43460f1c7c9651c64b4d46337be21c400621
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65645
This is a retry of PR: https://github.com/pytorch/pytorch/pull/59492
Latest changes: added more tests, added the getOrCreateDB pattern, updated the logic to remove unnecessary checks, and addressed all comments.
Adding code to find common expressions from the two subblocks of an if
operation and hoist them before the if block.
This also allows Dead Code Elimination to
then eliminate some if blocks.
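A minimal sketch of the kind of pattern the pass targets (illustrative, not taken from the diff):
```
import torch

@torch.jit.script
def fn(x: torch.Tensor, flag: bool) -> torch.Tensor:
    # `x * 2` is computed in both branches; the pass can hoist it above the
    # if node, after which the branches may become simple enough for DCE.
    if flag:
        y = x * 2
        return y + 1
    else:
        y = x * 2
        return y - 1
```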
Test Plan: python test_jit.py TestIfHoisting
Reviewed By: eellison
Differential Revision: D33302065
Pulled By: Gamrix
fbshipit-source-id: a5a184a480cf07354359aaca344c6e27b687a3c2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67254
Fixes https://github.com/pytorch/pytorch/issues/65997
TODO: disallow `default` as an overload name for aten operators.
BC breaking:
`output = torch.ops._test.leaky_relu(self=torch.tensor(-1.0))` now fails with the error `TypeError: __call__() got multiple values for argument 'self'` because the call is dispatched to `OpOverloadBundle`'s `__call__` method, which already has `self` bound as its first argument.
cc ezyang gchanan
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D33262228
Pulled By: anjali411
fbshipit-source-id: 600dbf511514ea9b41aea3e6b1bc1102dab08909
Summary:
Some of the "no-ops" are not actually no-ops because they can change the dtype.
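For example (an illustrative case, not necessarily one of the ops touched here), an `aten::to` that looks like an identity still changes the dtype:
```
import torch

x = torch.randn(3, dtype=torch.float64)
y = x.to(torch.float32)  # looks like a pass-through, but the dtype changes
assert y.dtype != x.dtype
```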
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67688
Reviewed By: davidberard98
Differential Revision: D32104601
Pulled By: eellison
fbshipit-source-id: ccb99179a4b30fd20b5a9228374584f2cdc8ec21
Summary:
Adds mixed-precision autocasting support between fp32/fp16 to TorchScript/JIT. A more in-depth description can be found at [torch/csrc/jit/JIT-AUTOCAST.md](https://github.com/pytorch/pytorch/pull/63939/files#diff-1f1772aaa508841c5bb58b74ab98f49a1e577612cd9ea5c386c8714a75db830b)
This PR implements an autocast optimization pass that inserts casting ops per the AMP rules (torch/csrc/jit/passes/autocast.cpp), mimicking the behavior of eager autocast. The pass also takes the `torch.cuda.amp.autocast` context into account and only inserts casting ops within the enabled context manager, giving feature parity with eager AMP autocast.
We currently provide JIT AMP autocast as a prototype feature, so it is off by default and can be turned on via `torch._C._jit_set_autocast_mode(True)`.
The JIT support for autocast is subject to different constraints compared to the eager-mode implementation (mostly related to the fact that TorchScript is statically typed); the restrictions on user-facing Python code are described in torch/csrc/jit/JIT-AUTOCAST.md.
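A minimal sketch of how the prototype can be exercised (assumes a CUDA device is available):
```
import torch

torch._C._jit_set_autocast_mode(True)

@torch.jit.script
def scaled_mm(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Casting ops are only inserted inside the enabled autocast region.
    with torch.cuda.amp.autocast():
        return torch.mm(a, b)

a = torch.randn(8, 8, device="cuda")
b = torch.randn(8, 8, device="cuda")
out = scaled_mm(a, b)  # matmul runs in fp16 under JIT autocast
```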
This is a prototype; there are also implementation limitations that were necessary to keep this PR small and get something functioning quickly upstream, so we can iterate on the design.
A few limitations/challenges that are not fully resolved in this PR:
1. Autocast inserts cast operations, which affect the scalar type of the output tensors feeding downstream operations. We are not currently propagating the updated scalar types, which could give wrong results for operations subject to type promotion rules.
2. The backward pass for autodiff in JIT misses casting dgrad to the input scalar type, as autograd does in eager mode. This forces us to explicitly mark the casting operation for certain operations (e.g. binary ops); otherwise, we might feed a dgrad with a mismatched scalar type to the input, which could break gradient functions consuming dgrad (e.g. gemm backward, which assumes grad_output has the same scalar type as the input).
3. The `torch.autocast` API has an optional `dtype` argument which is not currently supported in JIT autocast; we require a static value.
Credit goes mostly to:
tlemo
kevinstephano
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63939
Reviewed By: navahgar
Differential Revision: D31093381
Pulled By: eellison
fbshipit-source-id: da6e26c668c38b01e296f304507048d6c1794314
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66140
* Add a new argument to the export API to let users specify `nn.Module` classes that they wish to export as local functions in the ONNX model (see the sketch after this list).
* Refactor `torch/csrc/jit/serialization/export.cpp`, and remove redundant `EncoderBase` class.
* ~~Contains changes from #63268~~
* Depends on #63716 to update onnx submodule.
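A rough usage sketch (the argument name `export_modules_as_functions`, its accepted values, and the opset requirement are assumptions here, not confirmed by this summary):
```
import torch

class Gelu(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.gelu(x)

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(16, 16)
        self.act = Gelu()

    def forward(self, x):
        return self.act(self.fc(x))

# Export Gelu as an ONNX local function instead of inlining its ops.
torch.onnx.export(
    Model(),
    torch.randn(2, 16),
    "model.onnx",
    opset_version=15,  # assumption: local functions need a recent opset
    export_modules_as_functions={Gelu},
)
```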
Test Plan: Imported from OSS
Reviewed By: jansel
Differential Revision: D31424098
fbshipit-source-id: c949d0b01c206c30b4182c2dd1a5b90e32b7a0d3
Co-authored-by: BowenBao <bowbao@microsoft.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63198
Linear layers using the same input tensor can be concatenated together
as long as the weights and biases are compatible.
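A numeric sketch of why the merge is valid (illustrative, not the pass's actual code):
```
import torch

x = torch.randn(4, 16)
a = torch.nn.Linear(16, 8)
b = torch.nn.Linear(16, 8)

# One Linear whose weight/bias are the concatenation of the originals is
# equivalent to running both layers on x and concatenating the outputs.
merged = torch.nn.Linear(16, 16)
with torch.no_grad():
    merged.weight.copy_(torch.cat([a.weight, b.weight], dim=0))
    merged.bias.copy_(torch.cat([a.bias, b.bias], dim=0))

expected = torch.cat([a(x), b(x)], dim=-1)
assert torch.allclose(merged(x), expected, atol=1e-6)
```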
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D31240642
fbshipit-source-id: 1e78daa6b89822412ba2513d326ee0e072ceff1e
Summary:
Description:
- Only `stdout` and `stderr` are added as possible options from the Python API for now. We can add file-path passing later.
- The class `JitLoggingConfig` is kept in the cpp file since none of its methods are used outside of this file.
Python API:
`torch._C._jit_set_logging_stream('stdout|stderr')`
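A minimal usage sketch (assumes JIT logging itself is enabled elsewhere, e.g. via the PYTORCH_JIT_LOG_LEVEL environment variable):
```
import torch

# Route JIT log output to stdout.
torch._C._jit_set_logging_stream('stdout')
```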
C++ API:
`::torch::jit::set_jit_logging_output_stream(ostream);`
Testing:
- Tested the Python API locally.
- A unit test for the C++ API is written.
Fixes https://github.com/pytorch/pytorch/issues/54182
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65768
Reviewed By: mrshenli
Differential Revision: D31291739
Pulled By: ZolotukhinM
fbshipit-source-id: eee72edc20488efad78a01c5b0ed8a132886a08d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65610
- Replace HIP_PLATFORM_HCC with USE_ROCM
- Don't rely on CUDA_VERSION or HIP_VERSION; use USE_ROCM and ROCM_VERSION instead.
- In the next PR:
  - Will remove the mapping from CUDA_VERSION to HIP_VERSION and from CUDA to HIP in hipify.
  - HIP_PLATFORM_HCC is deprecated, so will add HIP_PLATFORM_AMD to support HIP host code compilation on gcc.
cc jeffdaily sunway513 jithunnair-amd ROCmSupport amathews-amd
Reviewed By: jbschlosser
Differential Revision: D30909053
Pulled By: ezyang
fbshipit-source-id: 224a966ebf1aaec79beccbbd686fdf3d49267e06
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65097
Previously, BatchMM would skip any block containing any mutable
operators. Now it will avoid batching any operation whose inputs or
outputs are ever mutated. Specifically: consider a tree of ADD, T,
and MM nodes rooted at an ADD node. If any input or output to any
node in the tree is ever mutated, then the entire tree will be ignored
by BatchMM.
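For intuition, a sketch of the kind of mm/add tree BatchMM targets (illustrative only):
```
import torch

@torch.jit.script
def f(a, b, c, d, x):
    # An ADD-rooted tree of mm nodes; BatchMM can batch these, but only if
    # none of the inputs or outputs involved are ever mutated.
    return torch.mm(a, x) + torch.mm(b, x) + torch.mm(c, x) + torch.mm(d, x)
```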
Test Plan: python test/test_jit.py TestBatchMM
Reviewed By: eellison
Differential Revision: D30973515
Pulled By: davidberard98
fbshipit-source-id: 9d836faa1ef0c9e3fefe0ffc0bd265f275471f48
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63776
I reverted this out of an abundance of caution because some test
failures occurred, but they were all due to precision issues fixed lower in
this stack. Let's try again.
I've rolled the elimination of the allow-parallelism-in-fusions toggle into
this diff since they're pretty tightly coupled.
ghstack-source-id: 136529847
Test Plan: CI
Reviewed By: huiguoo
Differential Revision: D30484555
fbshipit-source-id: 38fd33520f710585d1130c365a8c60c9ce794a59
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62763
This PR fixes the issue that the graph inputs might be updated when we export the model in inference mode.
When a model is exported in inference mode, some optimizations are applied. One side effect of these optimizations is that the inputs of the graph might be adjusted. Such optimizations include:
1. Conv and BatchNorm op fusion.
2. Constant folding.
If the user sets export_params=False or keep_initializers_as_inputs=True, it's highly likely that the user wants to provide the corresponding parameters or initializers as the inputs of the graph.
In that situation, whether the model is exported in inference mode or training mode, the exporter needs to prevent the above optimizations from adjusting the graph inputs, so that the graph inputs match the inputs the user provides.
The changes in this PR add an additional common check to decide whether the above optimizations should be applied. From the values of the export_params and keep_initializers_as_inputs arguments, we infer whether the graph inputs are allowed to be adjusted.
If not, these optimizations are skipped, even when the other requirements are met.
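An illustrative export call where the graph inputs should now stay untouched (the model and file name are placeholders):
```
import torch

model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3),
    torch.nn.BatchNorm2d(8),
).eval()

# With keep_initializers_as_inputs=True (or export_params=False), the exporter
# should leave the graph inputs alone even though inference-mode optimizations
# such as Conv+BatchNorm fusion or constant folding would otherwise adjust them.
torch.onnx.export(
    model,
    torch.randn(1, 3, 32, 32),
    "model.onnx",
    keep_initializers_as_inputs=True,
    do_constant_folding=True,
)
```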
Besides these code changes, the documentation of the parameters below has been updated so that users have better guidance when considering how to leverage these parameters for different purposes:
1. export_params
2. training
3. do_constant_folding
4. keep_initializers_as_inputs
Test Plan: Imported from OSS
Reviewed By: SplitInfinity
Differential Revision: D30375183
Pulled By: msaroufim
fbshipit-source-id: 4db8b9695649eb32a3a0fefa950ee2e5651bdba0
Co-authored-by: fatcat-z <jiz@microsoft.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59492
Adding code to find common expressions from the two subblocks of an if
operation and hoist them before the if block.
This also allows Dead Code Elimination to
then eliminate some if blocks.
Also eliminated some dead code in the codebase.
Test Plan:
python test_jit.py TestIfHoisting
Imported from OSS
Reviewed By: ngimel
Differential Revision: D29399533
fbshipit-source-id: 9336b9dc48c02c38862f98f98cd72fc1767a1802
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59814
Using OpInfos to test shape analysis. By default, we just check that we don't give incorrect answers, and then if `assert_jit_shape_analysis` is true, test that we correctly propagate the full shape. This found a couple of bugs 😃
Test Plan: Imported from OSS
Reviewed By: Krovatkin
Differential Revision: D30200058
Pulled By: eellison
fbshipit-source-id: 6226be87f5390277cfa5a1fffaa1b072d4bc8803
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62200
This commit brings back the `RemoveInplaceOps` pass removed in D29523283 (dec5aa2260), which apparently had a bunch of internal users.
Test Plan: danthe3rd
Reviewed By: danthe3rd
Differential Revision: D29833316
fbshipit-source-id: 6cf13d463ab0a5e50ba3eb3243f79a9c51623809