Fix comments to reflect that we no longer codegen cells to be sent to the resume function as inputs; instead, they are codegen'd after the unsupported instruction in order to build resume functions that are closures.
Also simplify some codegen.
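A conceptual sketch of the difference (all names hypothetical; this is hand-written Python, not Dynamo's actual generated bytecode):
```
def unsupported_op(x):  # stand-in for an op that triggers a graph break
    return x * 2

def outer(x):
    y = unsupported_op(x)   # suppose the graph break happens here
    cell = y                # cell is codegen'd after the unsupported instruction
    def __resume_at_8(z):   # resume function is a closure over `cell`,
        return cell + z     # not a function receiving `cell` as an input
    return __resume_at_8(x)
```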
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160138
Approved by: https://github.com/anijain2305
ghstack dependencies: #159329, #159678, #159817
We are refactoring dynamo code for convert frame so that we can have modularized pieces sharable between different compiler frontends (e.g. torch.compile, precompile and torch.export).
This PR adds a new helper function compile_frame() which takes bytecode and a transform function and returns the compiled bytecode plus the output graph as a DynamoOutput type.
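A hedged sketch of the shape of this helper, inferred only from the description above (field and parameter names are assumptions, not the actual API):
```
from dataclasses import dataclass
from types import CodeType
from typing import Any, Callable, Tuple

@dataclass
class DynamoOutput:
    bytecode: CodeType     # the transformed code object
    output_graph: Any      # the captured graph state

def compile_frame(
    code: CodeType,
    transform: Callable[[CodeType], Tuple[CodeType, Any]],
) -> DynamoOutput:
    # Run the supplied transform over the frame's bytecode and bundle the
    # resulting code object with the traced output graph.
    new_code, output_graph = transform(code)
    return DynamoOutput(new_code, output_graph)
```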
Differential Revision: [D80430802](https://our.internmc.facebook.com/intern/diff/D80430802/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160855
Approved by: https://github.com/tugsbayasgalan
ghstack dependencies: #160814, #160815
We are refactoring dynamo code for convert frame so that we can have modularized pieces sharable between different compiler frontends (e.g. torch.compile, precompile and torch.export).
This PR follows the last one by separating out the part that runs the instruction translator on a given frame and returns a DynamoTracerOutput.
The end result is a free function that runs the instruction translator independently. A follow-up diff will wrap this low-level function.
Differential Revision: [D80388694](https://our.internmc.facebook.com/intern/diff/D80388694/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160815
Approved by: https://github.com/anijain2305
ghstack dependencies: #160814
We are refactoring dynamo code for convert frame so that we can have modularized pieces sharable between different compiler frontends (e.g. torch.compile, precompile and torch.export).
One incremental step we can take is to refactor out InstructionTranslator as a functional piece providing bytecode tracing.
To separate out this part, we note that the tracer object is currently passed around throughout the entire convert-frame compile function. This is not ideal, because we want a boundary between tracing and the downstream compiler stack. Ideally, we should extract all the relevant information out of the tracer object and return a new data structure that is free of the internal state of InstructionTranslator.
Luckily, not much of the tracer's data is used after tracing is finished. The major piece is OutputGraph; beyond that, we only need to record two boolean flags for error-handling purposes.
The new type we're adding is called DynamoTracerOutput, which contains all the information needed by torch.compile internals after symbolic convert is finished. To keep this PR simple, we leave out the part that reduces OutputGraph to a minimal set, since that can be done in a separate PR.
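A minimal sketch of what such a structure could look like, based only on the description above (the flag names are assumptions):
```
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class DynamoTracerOutput:
    # Captured graph state; a follow-up PR may reduce this to a minimal set.
    output_graph: Optional[Any]
    # Two flags recorded purely for error handling (names are hypothetical).
    error_on_graph_break: bool = False
    is_tracing_resume_prologue: bool = False
```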
Differential Revision: [D80388693](https://our.internmc.facebook.com/intern/diff/D80388693/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160814
Approved by: https://github.com/tugsbayasgalan
Old: ~pack resume function stack + locals into a list: we need to be able to pass frame stack+locals in lists to hand off to nested functions in the future, so we implement this part first.~
We are no longer doing this, since GraphModule/guard variable naming gets messed up. Going forward, our approach will be to keep the top frame unpacked but pack the contents of the other frames into a list, as sketched below.
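A conceptual sketch of the planned layout (the signature and argument names are assumptions for illustration, not actual generated code):
```
# Top frame's stack and locals stay as individual arguments; each deeper
# frame's stack + locals travel together as one list.
def __resume_at_20(top_stack_0, top_local_x, deeper_frames):
    # deeper_frames[i] holds [stack entries..., locals...] for frame i + 1
    frame1 = deeper_frames[0]
    return top_stack_0 + top_local_x + sum(frame1)
```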
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151056
Approved by: https://github.com/jansel
Summary:
We previously assigned each compiled function variable a name based on an in-process global counter. This works fine within a single process, but when we serialize these states with precompile, we need a way to load the compiled functions back without colliding with the existing global scope.
Extending the counter-based name with a true global UUID resolves this issue.
For example, the new variable name will look like:
```
__compiled_fn_0_7ce7d872_4fe8_4174_b8fd_2496b09b8b43
```
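A minimal sketch of how such a name can be produced (a hypothetical helper; the actual Dynamo code may differ):
```
import uuid

def compiled_fn_name(counter: int) -> str:
    # Replace hyphens with underscores so the result stays a valid Python
    # identifier, matching the example above.
    suffix = str(uuid.uuid4()).replace("-", "_")
    return f"__compiled_fn_{counter}_{suffix}"
```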
Test Plan: CI
Differential Revision: D75244901
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154148
Approved by: https://github.com/jansel
Also show the line of code relevant to a dynamo-compiled frame instead of just the first line (this was broken for data-dependent jump graph breaks and for Python 3.11+).
Also collapse resume frames together (developers can set config.verbose to see the full stack trace).
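For example, to see the full, uncollapsed stack trace while debugging:
```
import torch._dynamo

# Developer debugging aid: show full stack traces, with resume frames
# no longer collapsed.
torch._dynamo.config.verbose = True
```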
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148401
Approved by: https://github.com/zou3519, https://github.com/jansel
Prior to this patch, we were using `ConstantVariable.create` to create the VT
for frozenset objects, and intended, yet failed, to predicate that on all
items being literals (see https://github.com/pytorch/pytorch/pull/140984#discussion_r1847393736).
The code was from https://github.com/pytorch/torchdynamo/commit/7c03434 and
the original goal was to help DBR quantization, but as the new test in
this patch shows, it could lead to silent incorrectness.
Upon a closer look, this exposes some subtleties in how Dynamo handles
`ConstantVariable` and `LOAD_CONST`, so this patch both fixes the
aforementioned issue and documents, enforces, and makes explicit the
invariants around `ConstantVariable` and `LOAD_CONST` -- only immutable
objects are supported.
Specifically, this patch:
1. Refines the checks for wrapping a `frozenset` object, documents why we
can't just wrap its items directly due to the lack of `Source` for set
items, and uses a safe workaround (`SourcelessBuilder`) to ensure
soundness while keeping the DBR quantization support.
2. Adds more types to `common_constant_types`, thereby making
`ConstantVariable.is_base_literal` more lenient, and strictly checks
this property in the constructor of `ConstantVariable`.
3. Changes relevant uses of `create_instruction("LOAD_CONST", ...)` to
`create_load_const`, which checks `is_safe_constant`, and makes
developer overrides explicit by using `create_load_const_unchecked`
when needed (see the sketch after this list).
4. In a few places, uses a more specific `VariableTracker`, e.g.,
`TypingVariable` rather than `ConstantVariable`, and
`FrozensetVariable` rather than `SetVariable`.
(2) and (3) are mainly to future-proof Dynamo against bugs like (1).
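A hedged sketch of the convention from item 3 (the function bodies are illustrative, only the names come from this patch, and the imports assume Dynamo's internal helpers, which are subject to change):
```
from torch._dynamo.bytecode_transformation import create_instruction
from torch._dynamo.utils import is_safe_constant

# Checked path: refuse to emit LOAD_CONST for anything that is not a safe,
# immutable constant.
def create_load_const(value):
    assert is_safe_constant(value), f"unsafe constant: {value!r}"
    return create_instruction("LOAD_CONST", argval=value)

# Explicit developer override: same instruction, no safety check.
def create_load_const_unchecked(value):
    return create_instruction("LOAD_CONST", argval=value)
```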
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141504
Approved by: https://github.com/jansel
* Automatically applies ruff rule 401, turning loops into equivalent list comprehensions, which are faster and do not leak the loop variables into the enclosing scope (see the before/after sketch below).
* List comprehensions not only often have better typing, but also carry 50+% less interpreter overhead than for loops. They also preserve length information and are easier for the interpreter to optimize.
* Manually went back and made mypy happy after the change.
* Also fixed style lints in files covered by flake8 but not by pyfmt.
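An illustrative before/after for this rule (a generic example, not taken from the diff):
```
items = [-2, -1, 0, 1, 2]

# Before: loop-and-append; `x` leaks into the enclosing scope.
doubled = []
for x in items:
    if x > 0:
        doubled.append(x * 2)

# After: equivalent list comprehension; `x` stays local to the comprehension.
doubled = [x * 2 for x in items if x > 0]
```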
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140980
Approved by: https://github.com/justinchuby, https://github.com/malfet
Significant bytecode generation API change!
The new suggested convention for generating bytecode to call a function is to wrap the instructions that push a callable onto the stack with `add_push_null`; that callable is then called with `create_call_function` with `push_null=False` (see the diff for examples).
In Python 3.13, NULL is now expected to be pushed after the callable. In <=3.12, the NULL was pushed before the callable. This change abstracts away the exact placement of the NULL, but the developer must be aware that a NULL may be needed when codegen'ing a callable.
This abstraction also reduces the need for the `push_null=True` option in `create_call_function`, which removes the need to rotate a NULL to the right place on the stack with a sequence of `SWAP` instructions.
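A sketch of the new convention inside a codegen method (based on this description; `codegen` and `args` are assumed from the surrounding context, and the import targets Dynamo internals, which are subject to change):
```
from torch._dynamo.bytecode_transformation import create_call_function

# Wrap the instructions that push the callable so the helper can place the
# NULL before (<=3.12) or after (3.13) the callable, as the version requires.
codegen.add_push_null(
    lambda: codegen.load_import_from("torch", "relu")
)
codegen.foreach(args)  # push the call arguments
codegen.extend_output(
    create_call_function(len(args), push_null=False)
)
```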
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129172
Approved by: https://github.com/jansel
This will be helpful in reducing some of the hardcoded and Python-version-dependent bytecode generation in various places in dynamo, e.g. resume function generation and object reconstruction.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127359
Approved by: https://github.com/jansel
ghstack dependencies: #127329