Given the following code/dynamo graph:
```
class GraphModule(torch.nn.Module):
    def forward(self, L_x_ : torch.Tensor):
        l_x_ = L_x_
        _print = torch.ops.aten._print('moo')
        res = l_x_ + l_x_; l_x_ = None
        _print_1 = torch.ops.aten._print('moo')
        return (res,)
```
AOTAutograd will trace the following program, threading tokens from the inputs, through the effectful operator calls (`torch.ops.aten._print`), and back out as outputs:
```
class <lambda>(torch.nn.Module):
    def forward(self, arg0_1: "f32[0]", arg1_1: "f32[2, 3]"):
        with_effects = torch._higher_order_ops.effects.with_effects(arg0_1, torch.ops.aten._print.default, 'moo'); arg0_1 = None
        getitem: "f32[0]" = with_effects[0]; with_effects = None
        add: "f32[2, 3]" = torch.ops.aten.add.Tensor(arg1_1, arg1_1); arg1_1 = None
        with_effects_1 = torch._higher_order_ops.effects.with_effects(getitem, torch.ops.aten._print.default, 'moo'); getitem = None
        getitem_2: "f32[0]" = with_effects_1[0]; with_effects_1 = None
        return (getitem_2, add)
```
However, when we get to Inductor, we want the Inductor-generated code to have no token inputs/outputs for better readability. So we modify the ATen graph by removing the tokens from the inputs, creating them instead through `torch.ops.aten._make_dep_token`, and sinking them through the `torch.ops.aten._sink_tokens` operator.
This has to be done *after* the partitioner, otherwise the partitioner will add the make_token/sink_token operators to the backwards graph.
```
class <lambda>(torch.nn.Module):
    def forward(self, arg1_1: "f32[2, 3]"):
        _make_dep_token_default: "f32[0]" = torch.ops.aten._make_dep_token.default()
        with_effects = torch._higher_order_ops.effects.with_effects(_make_dep_token_default, torch.ops.aten._print.default, 'moo'); _make_dep_token_default = None
        getitem: "f32[0]" = with_effects[0]; with_effects = None
        add: "f32[2, 3]" = torch.ops.aten.add.Tensor(arg1_1, arg1_1); arg1_1 = None
        with_effects_1 = torch._higher_order_ops.effects.with_effects(getitem, torch.ops.aten._print.default, 'moo'); getitem = None
        getitem_2: "f32[0]" = with_effects_1[0]; with_effects_1 = None
        _sink_tokens_default = torch.ops.aten._sink_tokens.default((getitem_2,)); getitem_2 = None
        return (add,)
```
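For illustration, here is a minimal sketch of how such a token-removal transform could be written against an FX graph. This is not the actual Inductor pass; the pass structure and the helper name `remove_token_io` are assumptions, and only the `_make_dep_token` / `_sink_tokens` ops come from the description above.
```
import torch
from torch import fx

def remove_token_io(gm: fx.GraphModule, num_tokens: int) -> fx.GraphModule:
    """Drop the first `num_tokens` token inputs/outputs from an AOT graph."""
    g = gm.graph
    placeholders = [n for n in g.nodes if n.op == "placeholder"]
    token_inputs = placeholders[:num_tokens]
    output_node = next(n for n in g.nodes if n.op == "output")
    outputs = list(output_node.args[0])
    token_outputs, real_outputs = outputs[:num_tokens], outputs[num_tokens:]

    # Replace each token input with a token created inside the graph.
    with g.inserting_after(placeholders[-1]):
        for tok in token_inputs:
            make_tok = g.call_function(torch.ops.aten._make_dep_token.default, ())
            tok.replace_all_uses_with(make_tok)
            g.erase_node(tok)

    # Sink the token outputs instead of returning them.
    with g.inserting_before(output_node):
        g.call_function(torch.ops.aten._sink_tokens.default, (tuple(token_outputs),))
    output_node.args = (tuple(real_outputs),)

    gm.recompile()
    return gm
```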
When doing Inductor lowering, we convert `with_effects` calls to an `EffectfulKernel`, which is just a `FallbackKernel` with a pointer to the previous effectful operator's call. During scheduling, we will create a `StarDep` between an `EffectfulKernel` and the previous `EffectfulKernel` so that they don't get reordered. The Inductor-generated Python code looks like:
```
def call(args):
    arg1_1, = args
    args.clear()
    assert_size_stride(arg1_1, (2, 3), (3, 1))
    # Source Nodes: [_print], Original ATen: []
    buf2 = aten._print.default('moo')
    # Source Nodes: [_print_1], Original ATen: []
    buf3 = aten._print.default('moo')
    buf4 = empty_strided_cpu((2, 3), (3, 1), torch.float32)
    cpp_fused_add_0(arg1_1, buf4)
    del arg1_1
    return (buf4, )
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122347
Approved by: https://github.com/bdhirsh
This PR:
* Introduces an ATen op for creating true jagged views from a dense values buffer
    * `_nested_view_from_jagged(values, offsets, lengths, ragged_idx, dummy)`
    * This op is implemented on the Python side using torch.library so we can return a subclass instance
    * `jagged_from_list()` now uses this instead of the old autograd.Function `NestedViewFromBuffer`
    * The latter op is used for non-contiguous JTs returned via `torch.nested.narrow()`
    * `dummy` is an awful hack to ensure that `NestedTensor.__torch_dispatch__()` is invoked for our view
* Introduces an ATen op for accessing the `values` component of an NT via a view
    * `_nested_get_values(nt)`
* **Removes** the autograd.Functions `ViewNestedFromBuffer` and `ViewBufferFromNested` in favor of `nested_from_values_offsets()` / `nested_from_values_offsets_lengths()` and `nt.values()`, respectively.
* Changes test code to prefer `as_nested_tensor()` over `jagged_from_list()` directly
    * Similarly, avoid `buffer_from_jagged()`, preferring `values()`
* Depends on general subclass view fake-ification on the PT2 side (handled solely in previous PRs in the stack)
With these changes, the semantics of jagged layout NTs are such that they are considered a true view of the underlying `values` buffer. This means views of jagged NTs are views of the underlying buffer as well, simplifying some handling.
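As a small usage sketch of these semantics (assuming a PyTorch build where the jagged layout is available; nothing here is taken from the PR's test code):
```
import torch

# Build a jagged NT from two variable-length rows. Under the new semantics the
# NT is a true view over the dense values buffer, and values() / offsets()
# expose that buffer and the ragged-dim offsets without copying.
nt = torch.nested.nested_tensor(
    [torch.randn(2, 4), torch.randn(3, 4)], layout=torch.jagged
)
vals = nt.values()    # dense [5, 4] buffer, returned as a view rather than a copy
offs = nt.offsets()   # offsets into the ragged dim, here tensor([0, 2, 5])
print(vals.shape, offs)
```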
Differential Revision: [D54269922](https://our.internmc.facebook.com/intern/diff/D54269922)
Co-authored-by: voznesenskym <voznesenskym@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113279
Approved by: https://github.com/ezyang
Summary: To be used in https://github.com/pytorch/pytorch/pull/113873. Since set_ is effectively an inplace view op, we'll need to skip caching them.
Test Plan: Built pytorch; specifically this step: `/home/slarsen/local/miniconda3/envs/pytorch-3.10/bin/python -m torchgen.gen --source-path /home/slarsen/local/pytorch/cmake/../aten/src/ATen --install_dir /home/slarsen/local/pytorch/build/aten/src/ATen --per-operator-headers --generate sources --output-dependencies /home/slarsen/local/pytorch/build/aten/src/ATen/generated_sources.cmake`
Differential Revision: [D52814561](https://our.internmc.facebook.com/intern/diff/D52814561)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115769
Approved by: https://github.com/bdhirsh
Inductor codegen for `_assert_async` is currently disabled because we don't really understand how to codegen `scalar_to_tensor` on a Sympy expression. I initially tried to see if I could get this to work, but I got into some weird problem involving stride sorting, so I decided to fix it properly by not going through a tensor.
So we introduce an `_assert_scalar` which takes a scalar as an argument, avoiding needing to turn a SymBool into a tensor before asserting on it. I also add `_functional_assert_scalar` for good luck, although this doesn't do anything right now because https://github.com/pytorch/pytorch/pull/104203 still hasn't been landed.
I need to customize the codegen for this operator, so I decided to implement it directly in Inductor, rather than trying to treat it as a generic ExternKernel. This leads to the new AssertScalar IR node, which is written carefully so that it doesn't get DCE'd by Inductor.
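As a rough usage sketch (the `.default` overload spelling and argument order here are assumptions based on the description, not a documented API):
```
import torch

# The condition is a plain (Sym)bool, so no scalar_to_tensor round trip is needed
# before asserting on it.
u = 5  # stands in for an unbacked SymInt at trace time
torch.ops.aten._assert_scalar.default(u >= 0, "u must be non-negative")
```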
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114148
Approved by: https://github.com/jansel
Summary:
cuSPARSELt has support for different alg_ids, which are set via
`cusparseLtMatmulAlgSetAttribute`; in total there are 4 different
alg_ids, 0 - 3.
Previously we were just using the default alg_id, as our initial
experiments found that for most shapes the default alg_id is the
fastest, and that the alg_id makes no difference to numerical
correctness, only performance. From those experiments, the fastest
alg_id seemed to differ only for small matmul shapes.
danthe3rd found a performance regression when running with
cuSPARSELt v0.4.0 vs v0.5.0, on LLM shapes, which match these
characteristics (activations are small, weights are large).
However it's likely that this is due to the alg_id ordering changing, as
mentioned in the release notes for v0.5.0.
```
cusparseLtMatmulAlgSelectionInit() does not ensure the same ordering of
algorithm id alg as in v0.4.0.
```
This PR adds in the following:
- support for passing in alg_id to _cslt_sparse_mm
- a new op, _cslt_sparse_mm_search, which returns the optimal alg_id for
a given matmul
_cslt_sparse_mm_search has the same function signature as
_cslt_sparse_mm, minus the alg_id parameter.
We are able to achieve v0.4.0 performance with alg_id=1 on the shapes
that daniel provided.
We will address autoselecting the best alg_id in a future PR, possibly
with torch.compile.
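A rough usage sketch of the two pieces together (the parameter name `alg_id` comes from the description above, but the exact call signatures are assumptions; this needs a CUDA build with cuSPARSELt):
```
import torch

# A with a 2:4 sparsity pattern (two zeros in every group of four along a row).
A = torch.tensor([0, 0, 1, 1], dtype=torch.float16, device="cuda").tile(128, 32)
B = torch.randn(128, 128, dtype=torch.float16, device="cuda")

A_compressed = torch._cslt_compress(A)                   # compress to cuSPARSELt format
alg_id = torch._cslt_sparse_mm_search(A_compressed, B)   # probe for the fastest alg_id
out = torch._cslt_sparse_mm(A_compressed, B, alg_id=alg_id)
```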
Test Plan:
```
python test/test_sparse_semi_structured.py -k cslt
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115178
Approved by: https://github.com/cpuhrsch
Some notable changes:
1. `constrain_as_size` now allows the min value to be less than 2, since the compiler unconditionally assumes min >= 2 for its own purposes. Instead, we add an additional check to make sure the max value is always greater than 2.
2. Previously, we used to runtime-assert on the unbacked symint's value range, which would always be [2, max]. I modified this logic to assert on [0, max] unless the user explicitly specifies the min range.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106591
Approved by: https://github.com/gmagogsfm, https://github.com/ezyang
At a high level, the current implementation of the constraint functions (constrain_as_**) will raise an exception for the following code snippet:
```
def f(x):
    a = x.item()
    constrain_as_size(a, 4, 7)
    return torch.empty((a, 4))

inp = torch.tensor([5])
ep = torch._export.export(f, (inp,))
```
The reason is that the current constraint logic:
1) Is purely Python, so it won't survive AOT export (the whole node is gone after AOT export, since AOT export only maintains aten-level ops).
2) Relies on a side effect to add range constraints to the traced symbol's shape env ([code](9591e52880/torch/fx/experimental/symbolic_shapes.py (L370-L372))).
3) If runtime assertions are turned on (the default), [`_AddRuntimeAssertionsForConstraintsPass`](9591e52880/torch/_export/passes/add_runtime_assertions_for_constraints_pass.py (L98-L100)) will try to append assertion nodes based on the range constraints extracted from the symbols' shape env during another interpretation round.
4) However, because of 1), the range-constraint logic won't run for symbols generated during the AOT export round, so no range-constraint information is available for the assertion round, which causes the issue.
5) As a result of the above, it fails at `torch.empty((a, 4))` (there is no constraint that `a` must be positive).
The fix here is to implement the range-constraint logic as a native aten op (with the CPU implementation as a no-op) so that it can survive AOT export.
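A minimal sketch of what calling the native op looks like (the eager CPU kernel is a no-op; the keyword-only `min`/`max` spelling is an assumption based on the schema discussed here):
```
import torch

a = 5  # in the traced program this would be the unbacked SymInt from x.item()
# No-op in eager mode, but as an aten op it survives AOT export and carries the
# range information needed by the runtime-assertion pass.
torch.ops.aten.sym_constrain_range(a, min=4, max=7)
```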
**NOTE:**
The [logic](2d745b95d7/torch/fx/experimental/symbolic_shapes.py (L350-L365C15)) within [`constrain_range`](2d745b95d7/torch/fx/experimental/symbolic_shapes.py (LL313C74-L313C74)) is split out as `constrain_range_int` to handle the case when a non-`SymInt` is passed in, and it is reused in the new `_constrain_range`. The reason is that when a non-`SymInt` is provided:
* If we directly call `sym_constrain_range`, the C++ version will be called, which is a no-op.
* So in this case we call `constrain_range_int` instead, to be able to catch issues like the user providing an input whose tensor shape could be out of range during export, like the following for the code example above:
```
...
inp = torch.tensor([10])
ep = torch._export.export(f, (inp,)) # immediately raises an error
```
Differential Revision: [D46734204](https://our.internmc.facebook.com/intern/diff/D46734204)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103346
Approved by: https://github.com/tugsbayasgalan
This PR introduces a new operator, aten._assert_async.msg, which allows passing a tensor value and an assertion message as inputs. As part of TorchDynamo, we're replacing the use of torch._assert with this new operator so that make_fx also knows how to handle assertions. This is a subset of https://github.com/pytorch/pytorch/pull/98878; refer there for historic reviews.
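A quick usage sketch of the new overload (hedged; only the op name comes from the description above):
```
import torch

cond = torch.tensor(True)
# On CUDA the check runs asynchronously on the device; on CPU it is checked eagerly.
torch.ops.aten._assert_async.msg(cond, "cond must be true")
```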
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100101
Approved by: https://github.com/jansel
Applies the remaining flake8-comprehensions fixes and checks. This change replaces all remaining unnecessary generator expressions with list/dict/set comprehensions, which are more succinct, more performant, and better supported by our torch.jit compiler. It also removes useless generators such as `set(a for a in b)`, resolving them into just the set call.
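An illustrative (made-up) example of the kind of rewrite this applies:
```
# Before: unnecessary generator expressions passed to set()/dict()
squares = set(x * x for x in range(10))
pairs = dict((k, k + 1) for k in range(10))

# After: comprehensions, which drop the useless inner generator entirely
squares = {x * x for x in range(10)}
pairs = {k: k + 1 for k in range(10)}
```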
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94676
Approved by: https://github.com/ezyang
Summary: Handwritten out ops should have feature parity with the codegen'd ones. This means they should resize `out` to the appropriate size. Q. Why are these handwritten instead of codegen'd anyway? Q2. Where's a good spot to put the resize and copy helpers, since they are reused in the codegen'd out kernels?
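For reference, a tiny illustration of the resize behavior the codegen'd out kernels provide (standard out= semantics, not code from this diff):
```
import torch

out = torch.empty(0)                                    # deliberately the wrong size
torch.add(torch.ones(2, 3), torch.ones(2, 3), out=out)  # out is resized to (2, 3)
print(out.shape)                                        # torch.Size([2, 3])
```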
Test Plan: ci.
Differential Revision: D42177051
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91194
Approved by: https://github.com/ezyang
# Summary
Creates a callable native function that can determine which implementation of scaled dot product attention will get called. This allows us to reorder the runtime dispatch of SDP to enable autograd.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89029
Approved by: https://github.com/cpuhrsch
Something people found confusing was that whether or not a native::
signature would get SymInt in its type was based on the dispatch
key. This changes it so that SymInt in the type is based on whether
or not you have _symint in the name of the kernel. This means
that even when we make operators support SymInt, you no longer have to
go and update all the preexisting definitions; instead, you now
selectively add _symint to opt individual kernels into SymInt support.
I then go and update a bunch of kernels that don't have proper SymInt
support to make use of this convention. There is some hacking around
for view generation code.
I also add support for external backends to specify 'symint' operators, for which we generate SymInt signatures instead of regular signatures.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: [D39310060](https://our.internmc.facebook.com/intern/diff/D39310060)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84579
Approved by: https://github.com/wconstab
- nondeterministic_seeded was not applied to enough functions. I added
some heuristics to codegen for identifying functions that are likely
to be random and added a bunch of these tags to functions. Not sure
I got all of them.
- Don't constant propagate through nondeterministic functions in FX
tracing.
It would be better to do some testing for the tag but this would be quite an effort.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83650
Approved by: https://github.com/bdhirsh, https://github.com/eellison
Summary:
Previously we didn't generate an out variant (neither schema nor kernel) for an operator that has only a functional variant. This adds support for that and adds tests.
## Changes on `native_function_generation.py`
We generate an out variant for all functional variants where possible. This PR introduces a lot of newly generated out variants, and `native_functions.yaml` needs to incorporate the changes by adding `autogen` keywords.
The logic for determining what operators we should generate an out variant for is the following:
1. No existing out variant for this `NativeFunction`
2. Contains an existing in place, mutable or functional variant
3. Contains at least 1 tensor-like return
For operators matching the first two conditions but failing the third, I listed them in `FUNCTIONAL_OPS_THAT_CANNOT_GET_AN_OUT_VARIANT`.
## Special handling
The following operators satisfy all 3 criteria above but we chose to not autogen them, with some reasons.
* `mkldnn_adaptive_avg_pool2d`, the generated out variant `mkldnn_adaptive_avg_pool2d.out` is colliding with the `mkldnn_adaptive_avg_pool2d_out` kernel in `adaptive_avg_pool2d.out` operator. I manually created `mkldnn_adaptive_avg_pool2d.out` and renamed `mkldnn_adaptive_avg_pool2d_out` to `mkldnn_adaptive_avg_pool2d_out_stub`.
* `min`, `max` and `mean`. There already exist `min.out`, `max.out` and `mean.out`, but they have different semantics from the functional ones. I manually created `min.unary_out`, `max.unary_out` and `mean.dtype_out` to disambiguate.
## Autograd Changes
We introduced logic to not match derivative info in `derivatives.yaml` to out variants, since we are generating `NOT_IMPLEMENTED` kernels for those out variants anyway. The issue with the original logic is that it doesn't handle `TensorOptions` arguments very well. For example, we have these two operators:
* `_to_copy(Tensor self, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, bool non_blocking=False, MemoryFormat? memory_format=None) -> Tensor`
* `_to_copy.out(Tensor self, *, bool non_blocking=False, MemoryFormat? memory_format=None, Tensor(a!) out) -> Tensor(a!)`
If we use the `_to_copy` derivative info, there will be a compilation error, since `dtype` is missing from the `_to_copy.out` signature.
Test Plan: Rely on unit test
Differential Revision: D37832342
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81437
Approved by: https://github.com/iseeyuan, https://github.com/bdhirsh
`resize_()` is annoying because it needs special casing for functionalization. It's technically an inplace-view op, but it can't really have a pure view variant, since calling resize_() might bust the old storage. I gave it an `inplace_view` tag so that stuff like `FakeTensor` that relies on tags will pick it up properly, which required jumping through some codegen hoops.
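A small illustration of why `resize_()` can "bust" the old storage (behavior sketch; whether a reallocation actually happens depends on the allocator):
```
import torch

t = torch.arange(4.)
before = t.untyped_storage().data_ptr()
t.resize_(1024)                        # growing typically allocates a new storage
after = t.untyped_storage().data_ptr()
print(before == after)                 # usually False: the old storage was replaced
```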
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82667
Approved by: https://github.com/eellison
This should fix the last issue that @anijain2305 hit when running ResNet with TorchDynamo <> functionalization.
Today if you try to call an `OpOverloadPacket` from python with some arguments, we will use the types of those arguments to perform overload resolution. With some functional variants of ops, this can be ambiguous.
Today this affects just one op: `_fused_moving_avg_obs_fq_helper`, although it would potentially affect e.g. `native_batch_norm` in the future.
Example:
```
# There are technically two overloads:
# torch.ops.aten._fused_moving_avg_obs_fq_helper.default (returns 2 outputs, mutates 4 of its inputs in place)
# torch.ops.aten._fused_moving_avg_obs_fq_helper.functional (returns 6 outputs, mutates none of its inputs)
# We pick the wrong one - no way to know that we should pick the functional one, just from the call site.
outs = torch.ops.aten._fused_moving_avg_obs_fq_helper(a, a, a, a, a, a, a, 1.0, 0, 1, 0)
# raises an error - tries to call the overload with only 2 returns
return _fused_moving_avg_obs_fq_helper_functional[5]
```
Specifically, functionalization will bake `_fused_moving_avg_obs_fq_helper.functional` into the graph, but when AOTAutograd tries to compile with TorchScript, it needs to remove the overload name (TS doesn't know how to parse overload names directly, so we need to remove the overload name and let it infer the right overload at runtime later), and so it picks the wrong one.
The situation is pretty similar to inplace: `ops.aten.add` and `ops.aten.add_` represent two different `OverloadPacket` objects; they can't be overloads of the same op, because their schemas would be ambiguous (the alias annotations are different, but that isn't enough to disambiguate).
In this PR, I try to fix the situation in a pretty similar way to how we handle `inplace` in the data model: `inplace` ops get their own base operator name, but they are represented as a flag inside of `BaseOperatorName` in the data model.
Two other important changes that I made as part of this PR:
(1) Originally, there were ~100 different `*_functional` operators: e.g. we had operators named `resize.functional` and `zero.functional`. The `_functional` bit isn't actually necessary in most cases: it's only necessary for operators that **also** have a `SchemaKind.mutable` variant, where `_fused_moving_avg_obs_fq_helper` is the only op that fits that description today. So I removed the unnecessary notion of "functional" from those other ops. I also added a bunch of assertions to force this restriction.
I think that makes more sense in the long run, because it eliminates an unnecessary difference in the model. E.g. we don't have `add_.Tensor` and `add.Tensor_functional`. We just have `add_.Tensor` and `add.Tensor`.
(2) I noticed that we actually still weren't pairing up a bunch of `_foreach` operators correctly, because their input arguments were different (`self` vs. `tensors`). Since they're private API's, I went ahead and changed the argument names directly so they get matched up. Before this PR, we were generating a separate `_foreach_add` and `_foreach_add.functional` variant in a bunch of cases, that really did the same thing (but happened to have a different name for the first argument).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80556
Approved by: https://github.com/ezyang, https://github.com/albanD
Summary:
Adding a feature to allow users to specify namespaces for operators and kernels.
# Feature
There's a feature request to allow the DSL to:
1. take in an operator namespace other than `aten`.
2. take in a kernel that is in a different namespace than `at::native`.
For both features, we only allow a single layer of namespace for the sake of simplicity. If the user specifies `custom::function` as the kernel, the codegen will depend on `custom::native::function`, where `native` is hardcoded.
# Proposal
For feature 1, add a `namespace` attribute to the data class `NativeFunction`. The namespace will be extracted by matching the pattern "::" on the `func` variable. For `NativeFunctionsGroup` there's an assumption that all variants (function, inplace, out) will have the same namespace. By default (if not specified) the namespace will be "aten".
For feature 2, add a `namespace` attribute to the `BackendMetadata` class and similarly match the pattern "::" on the kernel field. Remove the `cpp_namespace` field from the `register_dispatch_key` data class. By default (if not specified) the namespace for a kernel will be "at::native".
Test Plan:
Example yaml entries:
```
- func: custom::gelu.out(Tensor self, *, str approximate='none', Tensor(a!) out) -> Tensor(a!)
  structured: True
  structured_inherits: TensorIteratorBase
  device_check: NoCheck # TensorIterator
  python_module: nn
  dispatch:
    CPU: custom::gelu_out_cpu
    CUDA: custom::gelu_out_cuda
    MPS: custom::gelu_out_mps

- func: custom::gelu_(Tensor(a!) self, *, str approximate='none') -> Tensor(a!)
  structured_delegate: gelu.out
  device_check: NoCheck # TensorIterator
  python_module: nn
  dispatch:
    NestedTensorCPU, NestedTensorCUDA: custom::NestedTensor_gelu_

- func: custom::gelu(Tensor self, *, str approximate='none') -> Tensor
  structured_delegate: gelu.out
  device_check: NoCheck # TensorIterator
  python_module: nn
  dispatch:
    MkldnnCPU: custom::mkldnn_gelu
    QuantizedCPU: custom::gelu_quantized_cpu
    NestedTensorCPU, NestedTensorCUDA: custom::NestedTensor_gelu
```
see generated code:
`RegisterCPU.cpp`:
```
TORCH_LIBRARY_IMPL(aten, CPU, m) {
  ...
}

TORCH_LIBRARY_IMPL(custom, CPU, m) {
  m.impl("gelu", TORCH_FN(wrapper_gelu));
  m.impl("gelu.out", TORCH_FN(wrapper_gelu_out_out));
  m.impl("gelu_", TORCH_FN(wrapper_gelu_));
};
```
```
struct structured_gelu_out_cpu_inplace final : public custom::native::structured_gelu_out_cpu {
  structured_gelu_out_cpu_inplace(Tensor& self) : outputs_{std::ref(self)} {}

  void set_output_strided(
      int64_t output_idx, IntArrayRef sizes, IntArrayRef strides,
      TensorOptions options, DimnameList names
  ) override {
    const auto& out = outputs_[output_idx].get();
    check_inplace(out, sizes, options);
    auto maybe_proxy = maybe_create_proxy(out, sizes, strides, options);
    if (C10_UNLIKELY(maybe_proxy.has_value())) {
      proxy_outputs_[output_idx] = c10::ExclusivelyOwned<Tensor>(std::move(maybe_proxy).value());
    }
    if (!names.empty()) {
      namedinference::propagate_names(outputs_[output_idx], names);
    }
    // super must happen after, so that downstream can use maybe_get_output
    // to retrieve the output
    custom::native::structured_gelu_out_cpu::set_output_raw_strided(output_idx, sizes, strides, options, names);
  }

  void set_output_raw_strided(
      int64_t output_idx, IntArrayRef sizes, IntArrayRef strides,
      TensorOptions options, DimnameList names
  ) override {
    const auto& out = outputs_[output_idx].get();
    check_inplace(out, sizes, options);
    if (!names.empty()) {
      namedinference::propagate_names(outputs_[output_idx], names);
    }
    // super must happen after, so that downstream can use maybe_get_output
    // to retrieve the output
    custom::native::structured_gelu_out_cpu::set_output_raw_strided(output_idx, sizes, strides, options, names);
  }

  const Tensor& maybe_get_output(int64_t output_idx) override {
    return proxy_outputs_[output_idx].has_value() ? **proxy_outputs_[output_idx] : outputs_[output_idx].get();
  }

  std::array<std::reference_wrapper<Tensor>, 1> outputs_;
  std::array<c10::optional<c10::ExclusivelyOwned<Tensor>>, 1> proxy_outputs_;
};
```
`RegisterSchema.cpp`
```
TORCH_LIBRARY(aten, m) {
  ...
}

TORCH_LIBRARY(custom, m) {
  m.def("gelu.out(Tensor self, *, str approximate='none', Tensor(a!) out) -> Tensor(a!)");
  m.def("gelu_(Tensor(a!) self, *, str approximate='none') -> Tensor(a!)");
  m.def("gelu(Tensor self, *, str approximate='none') -> Tensor");
};
```
Differential Revision: D36558459
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78015
Approved by: https://github.com/bdhirsh