208 Commits

Author SHA1 Message Date
01ec8df6d8 [Compiled Autograd] Introduce BackwardState capture (#120382)
This adds support for backwards hooks that are *both*:
1) Interior to the graph; and
2) Dynamically generated (e.g. lambdas)

We do this by creating a BackwardState object that is used to register the hooks in the forward, then populated by dynamo *after* the forward runs.
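
For illustration only (not code from the PR), a minimal eager-mode sketch of a hook that is both interior to the graph and dynamically generated; the function and variable names are hypothetical:

```python
import torch

def forward(x, scale):
    y = x.sin()
    # The hook is (1) attached to an intermediate ("interior") tensor and
    # (2) created at runtime as a lambda closing over `scale` -- the
    # combination described above.
    y.register_hook(lambda grad: grad * scale)
    return y.cos().sum()

x = torch.randn(4, requires_grad=True)
forward(x, 2.0).backward()
```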

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120382
Approved by: https://github.com/xmfan
2024-02-28 20:36:47 +00:00
55483fc2c9 Min-cut partitioner always saves tensors that are returned as-is in backward (#114970)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114970
Approved by: https://github.com/Chillee
2024-02-13 00:04:41 +00:00
de6a906093 Expose aggressive_recomputation as an inductor config (#118943)
Summary:
As title.

We found aggressive_recomputation shows memory savings (7% on APS COFFEE model) with 2% QPS loss.

It also gives very promising signal on our auto ac experiments: https://docs.google.com/document/d/1S2qgMg1CwAQ4U1Ffuk2epbEOx06ogZhioX2jKCwL7ZQ/edit

Test Plan:
APS COFFEE from silverlakeli
- Zoom of baseline job: https://www.internalfb.com/intern/zoomer/?profiling_run_fbid=927380488801910&tab=overview
- Zoom of job with aggressive_recomputation: https://www.internalfb.com/intern/zoomer/?profiling_run_fbid=1126815608217470&tab=overview

APS 1100x shrunk version:
- baseline: https://www.internalfb.com/mast/job/aps-yuzhenhuang-afe049505a
- test: https://www.internalfb.com/mast/job/aps-yuzhenhuang-709e41bf0d
Memory from 42.98% -> 41.04%.

Reviewed By: yf225, yuxihu, silverlakeli, richqyz

Differential Revision: D53248057

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118943
Approved by: https://github.com/anijain2305, https://github.com/yanboliang
2024-02-03 00:17:03 +00:00
1562dae62c [BE]: Apply RUF025 dict.fromkeys preview rule (#118637)
Simplifies and optimizes dict construction using the `fromkeys` classmethod ctor. This also makes it really obvious when all the keys will have the same static value, which could be a bug if unintentional. It is also significantly faster than using a dict comprehension. The rule is in preview, but I am adding a forward fix for when it becomes stable.
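
A hedged before/after illustration of the pattern RUF025 targets (the names are invented, not from the diff):

```python
keys = ["a", "b", "c"]

# Before: a dict comprehension where every key maps to the same static value.
counts = {k: 0 for k in keys}

# After: dict.fromkeys makes the shared static value explicit and is faster.
counts = dict.fromkeys(keys, 0)
```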

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118637
Approved by: https://github.com/albanD
2024-01-30 20:46:54 +00:00
fe10b1800f LazyGraphModule (#117911)
I feel it's easier to open a new PR rather than iterating on the previous PR (https://github.com/pytorch/pytorch/pull/105257 ) since this is more like a rewrite.

In this PR, instead of changing GraphModule directly, which could easily cause BC issues, I create a LazyGraphModule class as Zachary & Jason suggested in comments on the previous PR.

The difference between LazyGraphModule and GraphModule is mainly in how recompilation of the graph module happens. In GraphModule, recompilation happens 'eagerly': constructing a GraphModule triggers recompilation. In LazyGraphModule, we just mark the module as needing recompilation; the real recompilation only happens when absolutely required (e.g. calling the forward method, accessing the code property, etc.). In a lot of cases in torch.compile, the real recompilation is never triggered at all. This can save a few seconds of compilation time.

By default, GraphModule rather than LazyGraphModule is used. `use_lazy_graph_module(True)` context manager can be used to pick LazyGraphModule instead. This has been applied to the torch.compile stack.
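
A minimal usage sketch based on the description above; the import path below is an assumption (the commit only names the `use_lazy_graph_module(True)` context manager) and may differ between PyTorch versions:

```python
import torch
# Assumed import location of the context manager; adjust for your version.
from torch.fx._lazy_graph_module import _use_lazy_graph_module as use_lazy_graph_module

def fn(x):
    return x.sin() + 1

with use_lazy_graph_module(True):
    # Graph modules produced while this context is active defer recompile()
    # until their forward or .code is actually needed.
    out = torch.compile(fn)(torch.randn(8))
```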

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117911
Approved by: https://github.com/jansel
2024-01-27 04:10:18 +00:00
9bce208dfb Replace follow_imports = silent with normal (#118414)
This is a lot of files changed! Don't panic! Here's how it works:

* Previously, we set `follow_imports = silent` for our mypy.ini configuration. Per https://mypy.readthedocs.io/en/stable/running_mypy.html#follow-imports, what this does is whenever we have an import to a module which is not listed as a file to be typechecked in mypy, we typecheck it as normal but suppress all errors that occurred in that file.
* When mypy is run inside lintrunner, the list of files is precisely the files covered by the glob in lintrunner.toml, but with files in excludes excluded.
* The top-level directive `# mypy: ignore-errors` instructs mypy to typecheck the file as normal, but ignore all errors.
* Therefore, it should be equivalent to set `follow_imports = normal`, if we put `# mypy: ignore-errors` on all files that were previously excluded from the file list.
* Having done this, we can remove the exclude list from .lintrunner.toml, since excluding a file from typechecking is baked into the files themselves.
* torch/_dynamo and torch/_inductor were previously in the exclude list, because they were covered by MYPYINDUCTOR. It is not OK to mark these as `# mypy: ignore-errors` as this will impede typechecking on the alternate configuration. So they are temporarily being checked twice, but I am suppressing the errors in these files as the configurations are not quite the same. I plan to unify the configurations so this is only a temporary state.
* There were some straggler type errors after these changes somehow, so I fixed them as needed. There weren't that many.

In the future, to start type checking a file, just remove the ignore-errors directive from the top of the file.

The codemod was done with this script authored by GPT-4:

```
import glob

# Glob patterns for the files that were previously excluded from mypy.
exclude_patterns = [
    ...
]

for pattern in exclude_patterns:
    for filepath in glob.glob(pattern, recursive=True):
        if filepath.endswith('.py'):
            with open(filepath, 'r+') as f:
                content = f.read()
                # Rewind to the start and prepend the ignore-errors
                # directive, keeping the rest of the file intact.
                f.seek(0, 0)
                f.write('# mypy: ignore-errors\n\n' + content)
```

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118414
Approved by: https://github.com/thiagocrepaldi, https://github.com/albanD
2024-01-27 02:44:11 +00:00
670e7992fd [Easy] Document AGGRESSIVE_RECOMPUTATION flag in min-cut partitioner (#114007)
As titled.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114007
Approved by: https://github.com/wanchaol
2024-01-04 05:05:08 +00:00
bd10fea79a [BE]: Enable F821 and fix bugs (#116579)
Fixes #112371

I tried to fix as many of the bugs as I could; for a few of them I could not figure out the proper fix, so I left them with noqas.
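
For context, F821 is flake8's undefined-name check. A generic illustration of the kind of bug it flags (not taken from this PR):

```python
# Before (flagged by F821): `Sequence` is referenced but never imported.
#
#     def first(xs: Sequence) -> int:
#         return xs[0]
#
# After: the undefined name is resolved by adding the missing import.
from typing import Sequence

def first(xs: Sequence) -> int:
    return xs[0]
```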

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116579
Approved by: https://github.com/ezyang
2024-01-01 08:40:46 +00:00
ee5d981249 [BE]: Enable RUFF PERF402 and apply fixes (#115505)
* Enable PERF402. Makes code more efficient and succinct by replacing manual element-by-element list copies with a list constructor or an `extend` call (see the sketch below). All test cases have noqa added since performance is not as sensitive in that folder.
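
A hedged illustration of the copy pattern PERF402 rewrites (variable names are made up):

```python
items = ["a", "b", "c"]

# Before: a manual copy loop flagged by PERF402.
copied = []
for item in items:
    copied.append(item)

# After: express the copy with the list constructor (or copied.extend(items)).
copied = list(items)
```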

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115505
Approved by: https://github.com/malfet
2023-12-20 18:01:24 +00:00
5c3f03e2dd [inductor] add a config to specify the shape attribute for the generated svg graphs (#114811)
We draw our fx graphs with the "record" shape attribute by default.
Sometimes, when the graph is very complex, we may hit dot errors like below:
  "flat edge between adjacent nodes one of which has a record shape -
   replace records with HTML-like labels"
and thus fail to generate a graph. So, let's give the user an option
to specify the shape attribute for the dot graph. For example, passing
INDUCTOR_DOT_GRAPH_SHAPE_SVG = "none" would let us generate HTML-like labels
to work around the above failure.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114811
Approved by: https://github.com/weifengpy
2023-11-30 06:10:37 +00:00
b7b2178204 [BE]: Remove useless lambdas (#113602)
Applies PLW0108, which removes useless lambda calls in Python (see the example below). The rule is in preview, so it is not ready to be enabled by default just yet. These are the autofixes from the rule.
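
A hedged illustration of the useless-lambda pattern PLW0108 removes (not taken from the diff):

```python
values = ["1", "2", "3"]

# Before: the lambda only forwards its argument to another callable.
as_ints = list(map(lambda v: int(v), values))

# After: pass the callable directly; the lambda wrapper is useless.
as_ints = list(map(int, values))
```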

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113602
Approved by: https://github.com/albanD
2023-11-14 20:06:48 +00:00
e6f0960762 [inductor] Make debug.py pass follow-imports typechecking (#113307)
pydot accepts both a str and a list of str for its `prog` parameter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113307
Approved by: https://github.com/Skylion007
ghstack dependencies: #113304, #113305, #113306
2023-11-09 22:08:17 +00:00
1f3fa13f0a Handle unbacked SymInt sized outputs in AOTAutograd (#113159)
Thanks aakhundov for constructing the test case. This PR was constructed by running the failing test case, and then fixing problems until we got all the way to the end. There are a few distinct fixes:

* AOTAutograd performs equality tests on tensor metadata to determine if a metadata mutation had occurred. If we test i0 vs i1, we should report these are NOT equal, since obviously we have somehow resized the tensor from i0 to i1 (even if, on a particular run, it is possible i0 == i1).
* There's a sketchy fix for `test_aot_autograd_exhaustive_matmul_cpu_float32` where we check if the output shape equals the tangent shape. Unfortunately, the same `definitely_true` treatment does not work here; it still fails on the example. I piled an extra sketchy fix on top of it, where I just try my best to avoid doing the view. Maybe we should have some sort of logging here.
* Partitioner needs to get out a size for unbacked SymInt when partitioning. I just feed it a random heuristic value in this case, similar to how we've been dealing with this in Inductor.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113159
Approved by: https://github.com/aakhundov, https://github.com/bdhirsh
2023-11-08 04:28:38 +00:00
8219bf051b [BE]: Apply RUF015 to torch folder (#113025)
Removes unnecessary allocations of iterators. There is a small chance this may have side effects as the entire iterator is no longer consumed, but this is a way more efficient method for retrieving the first element.
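
A hedged illustration of the RUF015 rewrite (the example data is invented):

```python
data = {"a": 1, "b": 2, "c": 3}

# Before: materializes the entire key view as a list just to read element 0.
first_key = list(data)[0]

# After: consume only the first element of the iterator.
first_key = next(iter(data))
```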

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113025
Approved by: https://github.com/ezyang, https://github.com/malfet
2023-11-07 00:48:15 +00:00
66c32d099a Use pytree.arg_tree_leaves everywhere (#112394)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112394
Approved by: https://github.com/lezcano
ghstack dependencies: #112391, #112392, #112393
2023-10-31 15:57:06 +00:00
bbd5b935e4 Use pytree.tree_leaves everywhere (#112324)
This changes all the instances I could find of `tree_flatten(...)[0]` or
`x, _ = tree_flatten` to use `tree_leaves`.
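
A hedged sketch of the rewrite; `torch.utils._pytree` is a private module whose API may differ across versions:

```python
import torch.utils._pytree as pytree

nested = {"a": (1, 2), "b": [3]}

# Before: flatten and discard the spec just to get the leaves.
leaves, _ = pytree.tree_flatten(nested)

# After: ask for the leaves directly.
leaves = pytree.tree_leaves(nested)
```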

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112324
Approved by: https://github.com/lezcano
ghstack dependencies: #112327, #112323
2023-10-30 03:39:04 +00:00
47ccf04885 Split SymNode into its own file (#112037)
This PR:

- Moves TrueDiv, LShift, RShift, IsNonOverlappingAndDenseIndicator to `_sympy.functions.py`
- Moves SymNode to `fx.experimental.sym_node`.
  - This file does not have any SymPy dependencies at import time
  - It installs the magic methods in Sym{Bool,Int,Float}.
  - N.b. With this split, we may be able to move Sym{Bool,Int,Float} to this file, and remove quite a few of the hacks around these classes
- Imports `sym_node` in `torch/__init__.py` rather than the whole `symbolic_shapes.py`.
  This breaks the import-time dependency between torch and SymPy

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112037
Approved by: https://github.com/peterbell10
ghstack dependencies: #112035, #112036
2023-10-26 23:32:27 +00:00
6d7744ca46 Fix typo under torch/_functorch directory (#111067)
This PR fixes typos in comments and exception messages in files under the `torch/_functorch` directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111067
Approved by: https://github.com/Skylion007
2023-10-11 23:09:36 +00:00
f767a6c57a Made pattern-matcher diagnostics lazily reported + added TORCH_COMPILE_CPROFILE (#110504)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110504
Approved by: https://github.com/mlazos, https://github.com/eellison
ghstack dependencies: #110501
2023-10-05 15:47:30 +00:00
1e4c0641ce Revert "Made pattern-matcher diagnostics lazily reported + added TORCH_COMPILE_CPROFILE (#110504)"
This reverts commit 9648df1a6af8509ba2f5455a8465e0c67d0dd0c2.

Reverted https://github.com/pytorch/pytorch/pull/110504 on behalf of https://github.com/PaliC due to temporarily will revert as it's causing problems with difftrain import ([comment](https://github.com/pytorch/pytorch/pull/110504#issuecomment-1749132253))
2023-10-05 15:28:23 +00:00
9648df1a6a Made pattern-matcher diagnostics lazily reported + added TORCH_COMPILE_CPROFILE (#110504)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110504
Approved by: https://github.com/mlazos, https://github.com/eellison
ghstack dependencies: #110501
2023-10-05 01:34:57 +00:00
e686341f64 Consider that ops can be fused into cat in the min-cut partitioner (#110501)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110501
Approved by: https://github.com/eellison
2023-10-05 01:34:57 +00:00
772e104dfd [inductor] visualize fused ops in svg graph (#107752)
example usage
* `TORCH_COMPILE_DEBUG=1 INDUCTOR_ORIG_FX_SVG=1 INDUCTOR_POST_FUSION_SVG=1 python trig.py`: shows the original fx node name, file, and code. See snapshot 2, where we have origin_0, 1, and 2
* trig.py can be found in P816304818

Implementation
* keep the original fx graph in GraphLowering: ```self.orig_gm: torch.fx.GraphModule = gm.__copy__()```
* draw the original fx graph with origins in ir_post_fusion: ```V.debug.draw_orig_fx_graph(self.orig_gm, self.scheduler.nodes)```. `node.meta["buff_meta"]` tracks the buffer name

![Snapshot: fused-op SVG graph with origin annotations](https://github.com/pytorch/pytorch/assets/134637289/c4e197cb-ab3b-4a09-a584-c1356376accb)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107752
Approved by: https://github.com/mlazos
2023-09-21 08:03:05 +00:00
8b7b824dca [inductor][ac] preserve recompute tags through pattern matching (#107742)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107742
Approved by: https://github.com/eellison
2023-08-25 03:48:26 +00:00
918df10198 [Easy] use dtype.itemsize in partitions (#107749)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107749
Approved by: https://github.com/davidberard98
2023-08-24 16:07:05 +00:00
61fe49b8ed pt2: make aot_eager backend handle basic float8 operations (#107783)
Summary:

Reland of https://github.com/pytorch/pytorch/pull/107642 with a fix for tests on Windows.

Makes aot_eager backend of torch.compile handle basic float8 operations.

This is useful for float8 training UX.

Test Plan:

```
python test/test_quantization.py -k test_pt2_traceable_aot_eager
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107783
Approved by: https://github.com/albanD
2023-08-23 18:10:53 +00:00
5025fb9213 Revert "pt2: make aot_eager backend handle basic float8 operations (#107642)"
This reverts commit 24147a8e1c6855489c1669c612ff5cb1b09a09dd.

Reverted https://github.com/pytorch/pytorch/pull/107642 on behalf of https://github.com/huydhn due to Sorry for reverting this, but it is failing Windows CPU test in trunk. The Windows failures on your PR looks related I think ([comment](https://github.com/pytorch/pytorch/pull/107642#issuecomment-1688999380))
2023-08-22 22:17:36 +00:00
24147a8e1c pt2: make aot_eager backend handle basic float8 operations (#107642)
Summary:

Makes aot_eager backend of torch.compile handle basic float8 operations.

This is useful for float8 training UX.

Test Plan:

```
python test/test_quantization.py -k test_pt2_traceable_aot_eager
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107642
Approved by: https://github.com/albanD
2023-08-22 18:57:14 +00:00
0b11da0ccb [partitioners][ac][dynamic] Fix output signature of fwd with symints (#105771)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105771
Approved by: https://github.com/Chillee
2023-07-22 03:04:11 +00:00
5ab2b27353 Revert "Re-enable low memory dropout (#103330)"
This reverts commit f32593630bceed0eb51656304841d9f5de09ec7c.

Reverted https://github.com/pytorch/pytorch/pull/103330 on behalf of https://github.com/davidberard98 due to large compilation time regression ([comment](https://github.com/pytorch/pytorch/pull/103330#issuecomment-1622304072))
2023-07-05 19:00:40 +00:00
f32593630b Re-enable low memory dropout (#103330)
On attention_is_all_you_need_pytorch:

Perf: 1.526x -> 1.544x
Memory: 1.00 -> 1.05x

Fix for https://github.com/pytorch/pytorch/issues/102319, although I'm not sure all the perf is recovered.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103330
Approved by: https://github.com/jansel
2023-06-29 16:27:02 +00:00
f7fdaf8191 Revert "Re-enable low memory dropout (#103330)"
This reverts commit 2d14395f176b38b8416c2713d285e5ae55695a5f.

Reverted https://github.com/pytorch/pytorch/pull/103330 on behalf of https://github.com/malfet due to Lots of tests failed with 'prims' object has no attribute 'inductor_random' ([comment](https://github.com/pytorch/pytorch/pull/103330#issuecomment-1610691147))
2023-06-28 04:27:37 +00:00
2d14395f17 Re-enable low memory dropout (#103330)
On attention_is_all_you_need_pytorch:

Perf: 1.526x -> 1.544x
Memory: 1.00 -> 1.05x

Fix for https://github.com/pytorch/pytorch/issues/102319, although I'm not sure all the perf is recovered.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103330
Approved by: https://github.com/jansel
2023-06-28 03:13:41 +00:00
75dab587ef [dynamo] FSDP + AC + torch.compile (#103953)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103953
Approved by: https://github.com/wanchaol
2023-06-24 01:40:56 +00:00
f9c64a1156 [debugging] aot_eager backend to use the min-cut partitioner (#103555)
default_partitioner is kind of broken when it comes to memory footprint. Moving aot_eager to use the min-cut partitioner gives a better debugging experience.

One downside is that we will see much lower speedup numbers, because the min-cut partitioner will try to recompute ops.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103555
Approved by: https://github.com/eellison, https://github.com/jansel
2023-06-22 09:31:08 +00:00
b7ae40f4c8 [min-cut partitioner] Disable a heuristic if graph has recomputable ops (#103635)
Removing this heuristic leads to a major memory-compression and speedup bump for activation-checkpointed models. Here is the data:
![image](https://github.com/pytorch/pytorch/assets/13822661/64a491ab-173d-435a-b858-61b847fbb08b)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103635
Approved by: https://github.com/Chillee
2023-06-21 22:27:17 +00:00
df0505743f [activation checkpointing] Tagging based min cut partitioner (#103357)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103357
Approved by: https://github.com/jansel
2023-06-14 20:15:43 +00:00
9c4fd72b53 [aot_autograd][functional_rng] Change calling convention (#102344)
Key change - seed, offset are the last 2 args in both the fwd and bwd graphs
Reason - The cudagraphs implementation in inductor currently relies on very simple ordering guarantees i.e. first n inputs are static for both fwd and bwd graphs. In the current implementation of functionalization of rng ops, this assumption is broken because the first 2 inputs are seed, offset.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102344
Approved by: https://github.com/eellison
2023-05-26 21:27:20 +00:00
c2093de5d9 [partitioner] fix for rng ops (#102123)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102123
Approved by: https://github.com/Chillee
2023-05-25 00:35:07 +00:00
d5169e7141 Use a stable ordering for saved values in functorch.default_partition (#100111)
Previously, due to the use of the Python set data structure, the ordering of saved values (and how they would appear in the graph) was unstable and changed across runs, making it hard to debug downstream applications. Here we use a dict (with insertion-ordering semantics) to deduplicate values in a way that preserves ordering.
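
An illustration of the ordering issue (not the actual partitioner code): a set can reorder elements across runs under hash randomization, while a dict preserves insertion order.

```python
saved_values = ["mul", "add", "mul", "cos"]

# Before: order of the deduplicated values can change across runs.
deduped = list(set(saved_values))

# After: dict keys deduplicate while preserving insertion order.
deduped = list(dict.fromkeys(saved_values))  # ["mul", "add", "cos"]
```
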
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100111
Approved by: https://github.com/Skylion007
2023-05-02 05:14:31 +00:00
fdbc8625a1 Functionalization of torch.rand/rand_like ops (#97377)
This PR introduces the functionalization of RNG ops. Key points are

* Introduces a new `philox_rand` prim operator that accepts seed, offset.
* Adds decompositions for random operators that use these philox_rand prims
* Adds a PhiloxStateTracker to track the offset for each occurrence of rand ops
* Changes calling convention of AOT Autograd and adds <fwd_seed, fwd_base_offset> and <bwd_seed, bwd_base_offset>
* Monkeypatches set_rng_state and get_rng_state during AOT Autograd tracing to record the rng state behavior
* Raises an assertion for CPU because CPU does not use Philox RNG.

Not dealt in this PR
* dropout op - offset calculation is different
* other distributions like normal, poisson etc
* Inductor support
* Cudagraph support
* Dynamic shape support

An example
~~~

class Custom(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        a = torch.rand_like(x) * x
        a = torch.rand_like(x) * a
        return a

    @staticmethod
    def backward(ctx, grad_out):
        x, = ctx.saved_tensors
        return grad_out * torch.rand_like(grad_out) * torch.cos(x)

====== Forward graph 0 ======
def forward(self, fwd_seed_1: i64[], fwd_base_offset_1: i64[], primals_1: f32[16, 16]):
    # No stacktrace found for following nodes
    add: i64[] = torch.ops.aten.add.Tensor(fwd_base_offset_1, 0)
    philox_rand: f32[16, 16] = torch.ops.prims.philox_rand.default([16, 16], fwd_seed_1, add, [16, 1], device(type='cuda', index=0), torch.float32);  add = None
    mul: f32[16, 16] = torch.ops.aten.mul.Tensor(philox_rand, primals_1);  philox_rand = None
    add_1: i64[] = torch.ops.aten.add.Tensor(fwd_base_offset_1, 4);  fwd_base_offset_1 = None
    philox_rand_1: f32[16, 16] = torch.ops.prims.philox_rand.default([16, 16], fwd_seed_1, add_1, [16, 1], device(type='cuda', index=0), torch.float32);  fwd_seed_1 = add_1 = None
    mul_1: f32[16, 16] = torch.ops.aten.mul.Tensor(philox_rand_1, mul);  philox_rand_1 = mul = None
    return [mul_1, primals_1]

====== Backward graph 0 ======
def forward(self, bwd_seed_1: i64[], bwd_base_offset_1: i64[], primals_1: f32[16, 16], tangents_1: f32[16, 16]):
    # No stacktrace found for following nodes
    add_2: i64[] = torch.ops.aten.add.Tensor(bwd_base_offset_1, 0);  bwd_base_offset_1 = None
    philox_rand_2: f32[16, 16] = torch.ops.prims.philox_rand.default([16, 16], bwd_seed_1, add_2, [16, 1], device(type='cuda', index=0), torch.float32);  bwd_seed_1 = add_2 = None
    mul_2: f32[16, 16] = torch.ops.aten.mul.Tensor(tangents_1, philox_rand_2);  tangents_1 = philox_rand_2 = None
    cos: f32[16, 16] = torch.ops.aten.cos.default(primals_1);  primals_1 = None
    mul_3: f32[16, 16] = torch.ops.aten.mul.Tensor(mul_2, cos);  mul_2 = cos = None
    return [mul_3]

~~~

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97377
Approved by: https://github.com/ezyang
2023-04-16 09:55:56 +00:00
039faf0dbf Add invariant that all symbolic shapes must be bound in graph (#99089)
Previously, we had a problem when partitioning forward-backward dynamic graphs, which is that we could end up with a backward graph that mentions a symbol in an input tensor (e.g., `f32[s0 + s1]`), but without this symbol being otherwise bound elsewhere. When this happens, we have no way of actually deriving the values of `s0` and `s1`. Our fix for this in https://github.com/pytorch/pytorch/pull/93059 was to just retrace the graph, so that s0 + s1 got allocated a new symbol s2 and everything was happy. However, this strategy had other problems, namely (1) we lost all information from the previous ShapeEnv, including guards and (2) we end up allocating a LOT of fresh new symbols in backwards.

With this change, we preserve the same ShapeEnv between forward and backwards. How do we do this? We simply require that every symbol which may be present inside tensors, ALSO be a plain SymInt input to the graph. This invariant is enforced by Dynamo. Once we have done this, we can straightforwardly modify the partitioner to preserve these SymInt as saved for backwards, if they are needed in the backwards graph to preserve the invariant as well.

This apparently breaks yolov3, but since everything else is OK I'm merging this as obviously good and investigating later.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99089
Approved by: https://github.com/voznesenskym
2023-04-16 01:48:19 +00:00
597b558c51 [BE]: Update flake8 and plugins and fix bugs (#97795)
Update flake8 and flake8-plugins in lintrunner to a modern version. Enables more checks and makes flake8 checks significantly faster. Added a few additional rule ignores that will need to be fixed in the future.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97795
Approved by: https://github.com/alexsio27444, https://github.com/janeyx99, https://github.com/ezyang
2023-03-28 23:51:55 +00:00
02f6d14b97 Only allow SymInt across partitioner boundaries, and fixes (#96653)
This PR does a few things all at once, as I needed to fix several bugs on the way here. The main goal of the PR is to fix the `'float' object has no attribute '_has_symbolic_sizes_strides'` error. The general idea is to heavily penalize non-SymInt but still SymNode cuts in the graph. This doesn't work for the default partitioner, so essentially, dynamic shapes with the default partitioner are not supported.

While doing this, I had to fix a few other bugs in the partitioner:
* SymNode operations weren't considered recomputable. But they are very cheap, go wild.
* zeros_like wasn't considered recomputable, and this prevented some gradient formulas (e.g., for angle with real inputs) from successfully finding a cut at all
* AOTAutograd tests use the default partitioner. I switched them to use the min-cut partitioner...
* ...but this reveals a bug where, if we have nodes in backward outputs that don't depend on tangents, they never get assigned to the backward graph. I fix this by making the backward outputs mandatory members of the backward graph. I have to be careful to filter out None backward outputs; those never participate in flow analysis!

This causes some wobbling for the min-cut tests, but these seem legitimate: since we're now willing to recompute, the partitioner can reduce the number of SymInts it transmits by just doing some recompute in the backend.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96653
Approved by: https://github.com/ngimel
2023-03-14 18:30:56 +00:00
e9e6b3b6c5 [EASY] Add complex dtypes to partitioner (#96297)
Also, delete some redundant dtype setting.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96297
Approved by: https://github.com/Chillee
2023-03-08 21:08:26 +00:00
67d9790985 [BE] Apply almost all remaining flake8-comprehension checks (#94676)
Applies the remaining flake8-comprehension fixes and checks. This change replaces all remaining unnecessary generator expressions with list/dict/set comprehensions, which are more succinct, performant, and better supported by our torch.jit compiler. It also removes useless generators such as `set(a for a in b)`, resolving them into just the set call.
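
A hedged example of the useless-generator pattern being removed (the data is invented):

```python
words = ["a", "bb", "a"]

# Before: a generator expression wrapped in a set() call.
unique = set(w for w in words)

# After: a set comprehension (or simply set(words) when no transform is applied).
unique = {w for w in words}
```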

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94676
Approved by: https://github.com/ezyang
2023-02-12 01:01:25 +00:00
3d82d8d0ed [BE] Enable more flake8-comprehensions checks (#94601)
I applied some flake8 fixes and enabled checking for them in the linter. I also enabled some checks for my previous comprehensions PR.

This is a follow up to #94323 where I enable the flake8 checkers for the fixes I made and fix a few more of them.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94601
Approved by: https://github.com/ezyang
2023-02-10 23:40:29 +00:00
dc70b00d0b Track and record hint on SymNode and use when possible (#94201)
Historically, we worked out `size_hint` on the fly by doing a substitution on the sympy expression with the `var_to_val` mapping. With this change, we also maintain the hint directly on SymNode (in `expr._hint`) and use it in lieu of Sympy substitution when it is available (mostly for guards on SymInt, etc.; in particular, in idiomatic Inductor code we typically manipulate Sympy expressions directly and so do not have a convenient way to maintain hints).

While it's possible this will give us modest performance improvements, this is not the point of this PR; the goal is to make it easier to carefully handle unbacked SymInts, where hints are expected not to be available. You can now easily test if a SymInt is backed or not by checking `symint.node.hint is None`.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94201
Approved by: https://github.com/voznesenskym
2023-02-09 00:00:44 +00:00
8fce9a09cd [BE]: pyupgrade Python to 3.8 - imports and object inheritance only (#94308)
Apply parts of pyupgrade to torch (starting with the safest changes).
This PR only does two things: removes the need to inherit from object and removes unused future imports.
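
A hedged before/after sketch of the two changes (not actual code from the PR; the class name is invented):

```python
# Before: Python 2 style explicit object inheritance plus an unused
# __future__ import.
#
#     from __future__ import print_function
#
#     class Linear(object):
#         pass
#
# After pyupgrade:
class Linear:
    pass
```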

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94308
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-02-07 21:10:56 +00:00
d6c3468f70 Don't allow recomputing a node that *must* be materialized in the backwards pass (#90896)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90896
Approved by: https://github.com/ezyang
2023-01-20 22:34:41 +00:00