Turns out codegen'ing a nested step graph break is significantly more complicated than first thought. The optimized function should actually do:
- call graph/load values/do side effects etc.
- call into the leaf's resume function, but skipped (this essentially step graph break function for just the leaf function)
- call into all the other resume functions, traced.
This PR also adds `torch._dynamo.step_unsupported()`, which can be used for internal testing purposes to better test step graph break handling.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162737
Approved by: https://github.com/Lucaskabela
ghstack dependencies: #160601
This is needed because if we codegen cells for nested frames AFTER side effects, then reconstruction could get messed up. From below:
>The added test case demonstrates the reconstruction failure if we kept cell codegen at the original place (only happens with nested graph breaks since we reconstruct nested frame cells from VariableTracker rather than directly using LOAD_CLOSURE).
>At a high level, what happened before this change was that side_effects was pruning the cells (I don't recall exactly why this happens), and because cells were codegen'd after the side effects were applied, we were unable to properly reconstruct the cell. The error I was seeing was a list/tuple IndexError.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160601
Approved by: https://github.com/mlazos
Fix comments to reflect that we no longer codegen cells to be sent to resume function as inputs - they are instead codegen'd after the unsupported instruction in order to build resume functions that are closures.
Also simplify some codegen.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160138
Approved by: https://github.com/anijain2305
ghstack dependencies: #159329, #159678, #159817
We are refactoring dynamo code for convert frame so that we can have modularized pieces sharable between different compiler frontends (e.g. torch.compile, precompile and torch.export).
This PR adds a new helper function compile_frame() which takes a bytecode and a transform function and return compiled bytecode + output graph as DynamoOutput type.
Differential Revision: [D80430802](https://our.internmc.facebook.com/intern/diff/D80430802/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160855
Approved by: https://github.com/tugsbayasgalan
ghstack dependencies: #160814, #160815
We are refactoring dynamo code for convert frame so that we can have modularized pieces sharable between different compiler frontends (e.g. torch.compile, precompile and torch.export).
This PR follows the last one which separate out the part to run instruction translator on a given frame and return a DynamoTracerOutput.
The end result is a free function that runs instruction translator indepedently. A follow up diff will wrap the low level function.
Differential Revision: [D80388694](https://our.internmc.facebook.com/intern/diff/D80388694/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160815
Approved by: https://github.com/anijain2305
ghstack dependencies: #160814
We are refactoring dynamo code for convert frame so that we can have modularized pieces sharable between different compiler frontends (e.g. torch.compile, precompile and torch.export).
One incremental step we can take is to refactor out InstructionTranslator as a functional piece providing bytecode tracing.
To separate out this part, we notice currently the tracer object is being passed around in the entire convert frame compile function. This is not very ideal because we want to build a boundary between the tracing and downstream compiler stack. Ideally, we should extract all the relevant information out of the tracer object and return a new data structure that is free of internal states of InstructionTranslator.
Luckily, there aren't many data used from tracer, after tracing is finished. The major one is OutputGraph, other than that, we only need to record two boolean flags for error handling purposes.
The new type we're adding is called DynamoTracerOutput, which contains all the information needed by torch.compile internal after symbolic convert is finished. To simplify the current PR, we leave out the part which reduce OutputGraph into a minimal set, since this can be done in a separate PR.
Differential Revision: [D80388693](https://our.internmc.facebook.com/intern/diff/D80388693/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160814
Approved by: https://github.com/tugsbayasgalan
Old: ~pack resume function stack + locals into a list: we need to be able to pass frame stack+locals in lists to hand off to nested functions in the future, so we implement this part first.~
We are no longer doing this right now since GraphModule/guard variable naming gets messed up. Going forward, our approach will be to keep the top frame unpacked, but pack the rest of the contents of other frames in a list.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151056
Approved by: https://github.com/jansel
Old: ~pack resume function stack + locals into a list: we need to be able to pass frame stack+locals in lists to hand off to nested functions in the future, so we implement this part first.~
We are no longer doing this right now since GraphModule/guard variable naming gets messed up. Going forward, our approach will be to keep the top frame unpacked, but pack the rest of the contents of other frames in a list.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151056
Approved by: https://github.com/jansel
Summary:
We previously assign each compiled function variable a name based on in-process global counter. This works fine within the same process but when we're trying to serialize the states with precompile, we need a way to load back these compiled functions without causing collision to the existing global scope.
Changing the counter to a true global uuid seems to resolve this issue.
For example, the new variable name will look like:
```
__compiled_fn_0_7ce7d872_4fe8_4174_b8fd_2496b09b8b43
```
Test Plan: CI
Differential Revision: D75244901
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154148
Approved by: https://github.com/jansel
Also show the line of code relevant to a dynamo-compiled frame, instead of just the first line (this was broken for data-dependent jump graph breaks and for 3.11+).
Also collapses resume frames together (use config.verbose to see full stack trace - for developers).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148401
Approved by: https://github.com/zou3519, https://github.com/jansel
Prior to this patch, we are using `ConstantVariable.create` to create VT
for frozenset objects, and intended yet failed to predicate that on all
itmes being literals (see https://github.com/pytorch/pytorch/pull/140984#discussion_r1847393736).
The code was from https://github.com/pytorch/torchdynamo/commit/7c03434 and
the original goal was to help DBR quantization, but as the new test in
this patch shows, it could lead to silent incorrectness.
Upon a closer look, this exposes some subtleties in how Dynamo handles
`ConstantVariable` and `LOAD_CONST`, so this patch both fixes the
aforementioned issue and documents, enforces, and makes explicit the
invariants around `ConstantVariable` and `LOAD_CONST` -- only immutable
objects are supported.
Specifically, this patch:
1. refine the checks for wrapping a `frozenset` object, document why we
can't just wrap its items directly due to lack of `Sourcec` for set
items, and use a safe workaround (`SourcelessBuilder`) to ensure
soundness while keeping the DBR quantization support.
2. Adds more types to `common_constant_types`, thereby making
`ConstantVariable.is_base_literal` more lenient, and strictly checks
this property in the constructor of `ConstantVariable`.
3. Change relevant uses of `create_instruction("LOAD_CONST", ...)` to
`create_load_const` which checks `is_safe_constant`, and makes
developer overrides explicit by using `create_load_const_unchecked`
when needed.
4. In a few places, use more specific `VariableTracker`, e.g.,
`TypingVariable` rather than `ConstantVariable`, and
`FrozensetVariable` rather than `SetVariable`.
(2) and (3) are mainly to future-proof Dynamo against bugs like (1).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141504
Approved by: https://github.com/jansel
* Automatically applies ruff rule 401. Turns loops into equivalent list comprehensions which are faster and do not leak the scope of the loop variables.
* list comprehensions not only often have better typing, but are 50+% faster than for loops on overhead. They also preserve length information etc and are better for the interpreter to optimize.
* Manually went back and made mypy happy after the change.
* Also fixed style lints in files covered by flake8 but not by pyfmt
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140980
Approved by: https://github.com/justinchuby, https://github.com/malfet