This fixes most of https://github.com/huggingface/diffusers/issues/10795,
except for `torch.Tensor._make_subclass`, which will be fixed in a
subsequent patch.
The relevant tensor subclass from the aforementioned issue is defined
here: fbf6b856cc/src/diffusers/quantizers/gguf/utils.py (L398-L435).
There are three things to note about the tensor subclass:
1. it calls `super().__torch_function__`, which is
`torch._C._disabled_torch_function_impl`, so this patch updates
`SuperVariable.call_method` to handle it (we can't use a simpler
polyfill due to a bug where `var_getattr` raises
`NotImplementedError` without restoring the symbolic context).
2. it sets and reads attributes (`quant_type`), and
defines new methods (`as_data`), so this patch adds support for those.
3. it has an `__init__`, which Dynamo needs to trace through in
`TensorSubclassVariable.call_function` (see the sketch below).
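For illustration, here is a minimal sketch of a subclass exercising all three points. This is not the actual `GGUFParameter` from diffusers; the class name and the `q4_0` string are made up, and only the three traced behaviors matter:
```python
import torch

class QuantParameter(torch.nn.Parameter):
    # Hypothetical stand-in for the GGUF parameter subclass.
    def __new__(cls, data, quant_type=None, requires_grad=False):
        return super().__new__(cls, data, requires_grad=requires_grad)

    def __init__(self, data, quant_type=None, requires_grad=False):
        # (3) Dynamo traces through this __init__
        super().__init__()
        self.quant_type = quant_type  # (2) sets a new attribute

    def as_data(self):  # (2) defines a new method
        return self.detach()

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        # (1) super().__torch_function__ resolves to nn.Parameter's
        # torch._C._disabled_torch_function_impl
        return super().__torch_function__(func, types, args, kwargs or {})

p = QuantParameter(torch.ones(3), quant_type="q4_0")
print(p.quant_type, p.as_data())
```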
Differential Revision: [D71906140](https://our.internmc.facebook.com/intern/diff/D71906140)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149482
Approved by: https://github.com/jansel, https://github.com/mlazos
This is for "for some large number Z, make sure the error messages are readable English." - beginning to audit all `unimplemented` sites and making sure that all messages are at least English-readable. Hints may not necessarily be provided.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147385
Approved by: https://github.com/jansel
This PR adds support for list subclasses. Among other things, it includes:
1) Tracking mutations on internal vts like `_dict_vt` and `_list_vt` using sources. This helps identify whether there was a mutation in the underlying data structures, so we know when to reconstruct them.
2) `UserDefinedObjectVariable` now has a new method, `is_modified`, which the `side_effect` infra relies on to check mutations in the underlying vts (like `_dict_vt`).
3) The reconstruction logic ensures that we use the `dict.__getitem__` and `list.__getitem__` methods. This is important because we don't want to call any overridden `__getitem__` methods (see the sketch below).
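For intuition, here is a hypothetical list subclass where invoking the overridden `__getitem__` during reconstruction would be observable (the class and the side effect are made up for illustration):
```python
import torch

class LoggingList(list):
    # Hypothetical list subclass with an overridden __getitem__.
    def __getitem__(self, idx):
        print("side effect we must not trigger during reconstruction")
        return list.__getitem__(self, idx)

@torch.compile(backend="eager")
def fn(xs, t):
    xs.append(t + 1)  # mutation on the list subclass that Dynamo must reconstruct
    return t * 2

xs = LoggingList([torch.zeros(3)])
fn(xs, torch.ones(3))
# When emitting the bytecode that rebuilds/updates `xs`, reading elements via
# list.__getitem__ avoids running the overridden method above.
```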
If this PR is hard to review, please let me know. I can break it into several small PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146819
Approved by: https://github.com/StrongerXi, https://github.com/jansel
Previously we never replayed side effects on a `DequeVariable` with a
source; the bug was already present in the `test_deque_input` test, but went
unnoticed because we didn't check the deque objects.
This patch adds limited but practical support for this (see comments in
`side_effects.py` for why limited), and updates the deque tests to check
for this.
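As a hypothetical illustration (not the actual test), this is the kind of input-deque mutation whose side effect now gets replayed:
```python
import collections
import torch

def fn(x, dq):
    dq.appendleft(x.sum())  # mutation of a deque that has a source (it is an input)
    return x + 1

dq = collections.deque([1, 2])
out = torch.compile(fn, backend="eager")(torch.ones(3), dq)
# After the call, `dq` should contain the appended element, matching eager semantics.
```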
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141714
Approved by: https://github.com/jansel
ghstack dependencies: #141713
This patch
1. removes `AutoDerefLocalSource` in favor of `LocalSource`, thereby
removing its special handling in guards.
2. introduces a `LocalCellSource` for cells from the root frame, with
only `reconstruct` implemented, to programmatically enforce that these
cells should never be used by other components like guards.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141629
Approved by: https://github.com/jansel
ghstack dependencies: #141628
In addition to `NewCellVariable`, Dynamo has 3 ways of modeling cell objects:
1. For cells captured and created by the root frame, represent them as
their contents in `root_tx.symbolic_locals`, which `LOAD_DEREF` and
`STORE_DEREF` update directly, without going through `SideEffects`.
2. `ClosureVariable`: this is created when cells from (1) are captured
by a newly created function Dynamo is about to inline. It's a handle
with a name that redirects `LOAD_DEREF` and `STORE_DEREF` back to (1),
to make `root_tx.symbolic_locals` up-to-date.
3. For cells that are captured by both the root frame and some
pre-existing function Dynamo is about to inline, represent those
cells as contents, and do not allow writes to them.
Note that (2) and (3) are mainly to conform with (1) -- to make sure
Dynamo has a consistent modeling of cells for the same cell objects.
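For orientation, a small illustrative example of the cells in question (cases (1) and (2) above):
```python
import torch

@torch.compile(backend="eager")
def root(x):
    y = x + 1          # `y` lives in a cell created by the root frame (case 1)

    def inner():
        return y * 2   # `inner` captures that cell; when Dynamo inlines it,
                       # the handle it uses is a ClosureVariable (case 2)

    return inner()

root(torch.ones(3))
```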
In this patch, we represent all of these cells as `NewCellVariable`. The
main new code paths introduced are:
- using `NewCellVariable` to model cell objects created by the root
  frame (the cells are passed in as input to `InstructionTranslator`);
  this is what allows us to get rid of all 3 legacy paths above.
- adding a new `AutoDerefLocalSource` to deal with the python-code
level (guards) and bytecode level (codegen) auto-dereferencing
behavior, when accessing pre-existing python cells. This also
involves a tiny update to guard manager generation.
- plumbing some extra info into `LocalSource` and `CellVariable` so that
we can still emit `LOAD_DEREF`, `STORE_DEREF`, `LOAD_CLOSURE` (instead
of `make_cell`, `cell_contents` attribute access, and `LOAD_FAST`),
which is important for readability, performance, and some
assumptions `bytecode_transformation.py` makes.
As a result, this patch removes a lot of the now-dead code paths and
TODOs. Notably, it significantly simplified the `prune_dead_locals`
function, which was duplicating a lot of the logic from
`prune_dead_object_new`; this conveniently closes #137123.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140153
Approved by: https://github.com/jansel
ghstack dependencies: #140330, #140152, #140436, #140435
This patch fixes 2 things which are exposed if we have `NewCellVariable`
rather than `ClosureVariable` to model python cells:
1. `codegen_save_tempvars` must run first, to establish `source` for
objects; otherwise they can't be reconstructed.
2. `prune_dead_object_new` must account for `OutputGraph.backward_state`
as well, since it also contains variables that must live.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140201
Approved by: https://github.com/jansel
ghstack dependencies: #140035, #140036, #140149, #140150, #140151
In `UserFunctionVariable.bind_args`, there's a rare case when the
underlying function satisfies all conditions below
1. The function captures a pre-existing cell
2. The cell isn't captured by the root frame
3. `UserFunctionVariable.source` is `None`
In such cases, Dynamo would model the cell as its content (just like
what we do for cells in the root frame). However, this could break in
two cases:
- We could have multiple instances of `UserFunctionVariable`, where some
have source and others don't. This means sometimes we'll model the
cell as a `NewCellVariable`, and sometimes as its content. This
causes issues because writes to the `NewCellVariable` would be
buffered in `SideEffects` and never get picked up by the other
modeling.
- Only when `UserFunctionVariable` has a source, do we check whether we
already had a `NewCellVariable` for the captured cell. This again causes
Dynamo to potentially have multiple representations for the same cell
object, resulting in a similar "buffered writes not reflected" issue
as above (a hypothetical example of such a function is sketched below).
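A hypothetical function shape that hits the three conditions above (illustrative only; whether the traced `UserFunctionVariable` actually ends up sourceless depends on Dynamo internals, as the TODO at the end of this description notes):
```python
import torch

def make_adder():
    bias = 1.0           # pre-existing cell, created before compilation starts

    def add(x):
        return x + bias  # captures `bias`; the compiled root frame does not

    return add

add = make_adder()

@torch.compile(backend="eager")
def fn(x):
    return add(x)        # Dynamo inlines `add` and must model its captured cell

fn(torch.ones(3))
```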
This patch fixes the above 2 issues by
1. modeling captured cells of sourceless `UserFunctionVariable` as
immutable `NewCellVariable`, and adds a few lines in `SideEffects` to
account for its immutability.
2. always checking whether we already had a `NewCellVariable` for the
captured cell, before constructing a new one.
Tests are added for each aforementioned case.
I also left a TODO to investigate why exactly we would lose source
information for `UserFunctionVariable`. Some cases are easily fixable,
but others not so much.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140150
Approved by: https://github.com/jansel
ghstack dependencies: #140035, #140036, #140149
We added an unboxing optimization to avoid writes to cells that existed
before Dynamo tracing (such writes interfere with HOPs). However, the
avoided write shouldn't be there in the first place, since we were
basically creating an empty `NewCellVariable`, and then writing the
pre-existing content into the variable.
This patch
1. adds logic to bypass the initial write for pre-existing cells
without undermining correctness.
2. removes the unboxing optimization and the restart code path.
Fixes #137456, #138491; also see those issues for more historical
context.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140149
Approved by: https://github.com/ezyang, https://github.com/jansel
ghstack dependencies: #140035, #140036
This patch
1. Adds documentation to `PyCodegen.__call__`, `PyCodegen.tempvars` and
the `allow_cache` flag.
2. Merges a few existing code paths in `PyCodegen.__call__`.
3. Removes the `elif var in cg.tempvars` code path in
`codegen_save_tempvars`, because it's no longer needed after #113725,
as we have up-to-date `VariableTracker.source` now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139670
Approved by: https://github.com/jansel
ghstack dependencies: #139538
This patch adds 2 simple methods `VariableTracker.is_mutable()` and
`VariableTracker.is_immutable()`, which help clarify intent. For
instance, rather than writing
```python
if var.mutation_type:
    ...
```
After this patch one can write
```python
if var.is_mutable():
    ...
```
This patch also simplifies `mutation_type` propagation in some
`ListVariable` methods.
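A rough sketch of what the two helpers amount to (assuming `mutation_type` is `None` exactly when the variable is immutable; the real class has more going on):
```python
class VariableTracker:
    # Simplified stand-in: mutation_type holds a ValueMutation*/AttributeMutation*
    # instance for mutable variables and None otherwise.
    mutation_type = None

    def is_mutable(self) -> bool:
        return self.mutation_type is not None

    def is_immutable(self) -> bool:
        return self.mutation_type is None
```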
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139341
Approved by: https://github.com/mlazos, https://github.com/anijain2305
ghstack dependencies: #139339, #139340
This patch addresses the renaming part of #133027, specifically, it
renames the following and adds documentation for relevant classes.
1. `VariableTracker.mutable_local` to `mutation_type`
2. `MutableLocal` to `ValueMutationNew`
3. `MutableSideEffects` to `ValueMutationExisting`
4. `MutableLocalSource` to `SourceType`
5. `MutableLocalSource.Local` to `New`
Note that (2), (3) and (5) are mainly to bring consistency between them
and `AttributeMutationNew`, `AttributeMutationExisting`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139339
Approved by: https://github.com/jansel, https://github.com/mlazos, https://github.com/anijain2305
This patch addresses parts of the side-effect refactor proposed in #133027;
specifically, it does 3 things:
1. Change `SideEffects.store_attr_mutations` and `PyCodegen.tempvars`
to index on `VariableTracker` rather than `MutableLocalBase`.
2. Remove the `source` field from `MutableSideEffects` and
`AttributeMutation`, and use `VariableTracker.source` instead.
3. Plumb an `overridden_sources: Dict[Source, Source]` from
`handle_aliases_for_stolen_lists` to `PyCodegen` so that we don't
update `VariableTracker.source` in place, while still preserving what
`handle_aliases_for_stolen_lists` needed (i.e., modifying codegen for
   certain `VariableTracker`); a small sketch of the idea follows below.
(1) and (2) are merged in one patch because of a dependency between
a. `OutputGraph.handle_aliases_for_stolen_lists` which iterates over
`SideEffects.store_attr_mutations.keys()`, and potentially updates
its source field to be completely different.
b. `SideEffects.codegen_update_mutated`, which happens after the above
and uses `cg(var.mutable_local.source)`.
where if we apply (1) only, (b) breaks, and if we apply (2) only, (a)
breaks.
(3) is needed for correctness; see comments in the PR for details.
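A hedged sketch of the idea behind (3): codegen consults the mapping instead of mutating `VariableTracker.source` in place. The types and the helper name here are illustrative stand-ins, not the actual Dynamo API:
```python
from typing import Dict, Optional

class Source:  # stand-in for Dynamo's Source
    pass

class VariableTracker:  # stand-in carrying only a source
    def __init__(self, source: Optional[Source] = None):
        self.source = source

def source_for_codegen(
    var: VariableTracker, overridden_sources: Dict[Source, Source]
) -> Optional[Source]:
    # Use the overridden source when one was recorded by
    # handle_aliases_for_stolen_lists, leaving var.source itself untouched.
    if var.source is None:
        return None
    return overridden_sources.get(var.source, var.source)
```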
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137905
Approved by: https://github.com/jansel, https://github.com/anijain2305, https://github.com/mlazos
This PR implements tracing of `with` contexts with TorchFunction modes which have the default enter/exit behavior (i.e., pushing/popping the mode).
Typically the bytecode for a context manager looks like this during a graph break:
1. graph call
2. enter context
3. unsupported code
4. exit context
5. resume call
resume fn structure:
1. enter context
2. jump
...
3. exit context
The issue with torch function modes is that side effects will replay any mutations to the torch function stack performed during tracing. So, we do not need to enter and exit around the unsupported code in the original function (doing so would result in a duplicate torch function mode entry during execution of the unsupported code), and we don't need to enter again in the resume function (the mode that was pushed from the side effects bytecode would still be on the stack).
So for torch function modes the structure of our output code is this:
1. graph call
2. mutate tf mode stack to replay mutations
3. unsupported code
4. on exception restore stack
5. resume function
Then our resume fn looks like this:
1. no-op enter torch function mode
2. jump
3. exit tf mode
To implement the no-op enter of the torch function mode, I added a torch function mode polyfill which no-op enters but exits normally. This is needed because we still want to trace the `with` context in the resume function and exit properly (the exit instructions will still be in the function, so we need to generate instructions to set up the context).
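A hedged sketch of that polyfill idea (illustrative only, not the actual implementation; it assumes `TorchFunctionMode`'s default `__enter__`/`__exit__` push and pop the mode):
```python
from torch.overrides import TorchFunctionMode

class NoEnterTorchFunctionMode(TorchFunctionMode):
    """Illustrative: skip the push on enter, but exit normally.

    This only makes sense when the mode is already on the stack (replayed by
    the side-effects bytecode); entering again would duplicate it, while the
    exit at the end of the resume function must still pop it.
    """

    def __enter__(self):
        return self  # no-op: do not push the mode a second time

    def __exit__(self, exc_type, exc_val, exc_tb):
        return super().__exit__(exc_type, exc_val, exc_tb)  # normal exit pops the mode
```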
Separately from the bytecode, Dynamo also tracks contexts on the block stack, which is how the SETUP_* instructions are implemented. Naturally, at a graph break we exit these block stacks to properly reset the contexts entirely, so that we can re-enter around the unsupported code soundly.
However, once again, in the torch function mode case we do not want to perform any exit side effects in the event of a graph break, because we want to preserve the state of the mode stack as-is, so that we will properly update the stack with the bytecode mentioned in the first section. If we exited here, Dynamo would pop the mode off of the symbolic stack and not update the true Python torch function mode stack with the suffix bytecode.
All in all, for torch function modes we enter exactly once, update the global torch function mode stack with side effects bytecode, re-read this stack when compiling the resume function, and exit exactly once in the resume function. This matches the semantics of eager exactly.
Approved by: https://github.com/williamwen42
ghstack dependencies: #134732, #133137, #135443, #135444
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137114
Approved by: https://github.com/yanboliang
This reverts commit 7743149b2be4a9eba7e0997ccdc6abe552bec266.
Reverts
* https://github.com/pytorch/pytorch/pull/135503
* https://github.com/pytorch/pytorch/pull/135502
* https://github.com/pytorch/pytorch/pull/135422
This makes the following test pass. Earlier, the `getitem` would stay as a `getitem` in the FX graph, but now fake tensor propagation fails, saying that `.item` is called. It seems that torch function is not getting triggered during fake tensor propagation.
```
import torch
from torch.nn.attention.flex_attention import BlockMask, _mask_mod_signature, _score_mod_signature, flex_attention
from torch._inductor.lowering import make_pointwise, register_lowering
from torch._inductor.virtualized import ops
from torch.nn.attention.flex_attention import create_block_mask
torch.set_default_device('cuda')
flex_attention = torch.compile(flex_attention, dynamic=False)
prefix_lengths = torch.arange(8)
def prefix_lm(b, h, q, kv):
    return prefix_lengths[b] >= kv
mask = create_block_mask(prefix_lm, 8, None, 512, 512, _compile=True)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136590
Approved by: https://github.com/Chillee
This PR changes how `reconstruct` is done for a ConstDict. As of today, it works as follows:
(1) codegen(...) each pair of key/value
(2) create a new dictionary to hold the new items
(3) clear the original dictionary
(4) update the original dict with the one created in (2)
We do a micro-optimization in the generated bytecode (a Python-level sketch follows this list) to:
- Only codegen the items that changed.
- Only clear the original dictionary if a key was removed.
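A Python-level sketch of the before/after behavior (hedged: the actual change is in the emitted bytecode, and the helper names here are illustrative):
```python
def reconstruct_before(original: dict, traced_items: dict):
    new = dict(traced_items)  # (1) + (2): codegen every key/value pair into a new dict
    original.clear()          # (3): always clear the original
    original.update(new)      # (4): copy everything back

def reconstruct_after(original: dict, traced_items: dict):
    if any(k not in traced_items for k in original):
        original.clear()               # only clear if a key was removed
        original.update(traced_items)
    else:
        # only re-emit the items that changed
        changed = {k: v for k, v in traced_items.items() if original.get(k) is not v}
        original.update(changed)
```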
Fixes: #133487
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134876
Approved by: https://github.com/zou3519
> Ignore FSDP2 forward hook side-effects in AC
Under AC, FSDP2 does not rely on the forward hook to all-gather weights for recomputation; instead, it relies on the pre-backward hook to do this job:
451eaf0ff2/torch/distributed/_composable/fsdp/_fsdp_state.py (L219-L220)
So when we use `speculate_subgraph` to trace the utils.checkpoint AC region, we don't actually need to worry about the FSDP2 forward hook's side effects and can safely ignore them, because we do not, and don't expect to, re-run the FSDP2 forward hook during backward recomputation.
----
Test commands:
- `pytest -rA test/distributed/_composable/fsdp/test_fully_shard_compile.py::TestFullyShardCompile::test_nested_fully_shard_backend_inductor`
- `pytest -rA test/distributed/_composable/fsdp/test_fully_shard_compile.py::TestFullyShardCompile::test_transformer_backend_inductor`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134997
Approved by: https://github.com/zou3519
ghstack dependencies: #135727
This PR implements tracing of `with` contexts with TorchFunction modes which have the default enter/exit behavior (i.e., pushing/popping the mode).
Typically the bytecode for a context manager looks like this during a graph break:
1. graph call
2. enter context
3. unsupported code
4. exit context
5. resume call
resume fn structure:
1. enter context
2. jump
...
3. exit context
The issue with torch function modes is that side effects will replay any mutations to the torch function stack performed during tracing. So, we do not need to enter and exit around the unsupported code in the original function (doing so would result in a duplicate torch function mode entry during execution of the unsupported code), and we don't need to enter again in the resume function (the mode that was pushed from the side effects bytecode would still be on the stack).
So for torch function modes the structure of our output code is this:
1. graph call
2. mutate tf mode stack to replay mutations
3. unsupported code
4. on exception restore stack
5. resume function
Then our resume fn looks like this:
1. no-op enter torch function mode
2. jump
3. exit tf mode
To implement the no-op enter of the torch function mode, I added a torch function mode polyfill which no-op enters but exits normally. This is needed because we still want to trace the `with` context in the resume function and exit properly (the exit instructions will still be in the function, so we need to generate instructions to set up the context).
Separately from the bytecode, Dynamo also tracks contexts on the block stack, which is how the SETUP_* instructions are implemented. Naturally, at a graph break we exit these block stacks to properly reset the contexts entirely, so that we can re-enter around the unsupported code soundly.
However, once again, in the torch function mode case we do not want to perform any exit side effects in the event of a graph break, because we want to preserve the state of the mode stack as-is, so that we will properly update the stack with the bytecode mentioned in the first section. If we exited here, Dynamo would pop the mode off of the symbolic stack and not update the true Python torch function mode stack with the suffix bytecode.
All in all, for torch function modes we enter exactly once, update the global torch function mode stack with side effects bytecode, re-read this stack when compiling the resume function, and exit exactly once in the resume function. This matches the semantics of eager exactly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135422
Approved by: https://github.com/williamwen42
ghstack dependencies: #134732, #133137, #135443, #135444