Fixes#130559
* Intro
This PR adds support for `@contextmanager` in Dynamo. We chose to limit the
scope of this work to only `@contextmanager` and plan to handle generators fully
in #141055 (still in draft).
* Motivation
Dynamo lacks support for generator functions. When it encounters one, it traces
it as if it were a regular function. This is problematic because it can lead to
incorrect behavior. To illustrate, consider the test case below:
```python
import torch
import contextlib
@contextlib.contextmanager
def set_default_dtype(dtype):
old_dtype = torch.get_default_dtype()
try:
torch.set_default_dtype(dtype)
yield
finally:
torch.set_default_dtype(old_dtype)
@torch.compile(backend="eager", fullgraph=True)
def fn():
with set_default_dtype(torch.float64):
x = torch.tensor([3.0, 3.0 + 5.0j])
return x
```
Before this work, Dynamo would not stop at the `yield`, and the graph produced
would contain both calls to `set_default_dtype` executed one after the other.
This is incorrect because the context manager should execute code before and
after the `yield`.
* List of changes
`YIELD_VALUE` now raises an exception (`YieldValueOp`) to signal that control
flow must be suspended and returned to the caller. Additionally, `RETURN_VALUE`
behaves differently in a generator function. Unlike regular functions, where
`RETURN_VALUE` indicates the final result, in generators it signifies that the
generator is exhausted and implicitly raises `StopIteration`.
A new `VariableTracker` named `FunctionDecoratedByContextlibContextManagerVariable`
was introduced to handle `@contextmanager`. This variable tracker acts not just
as a wrapper for the original function but also maintains an internal `tx`
(InstructionTranslator) object to suspend and return control flow to the parent
tracer when a `yield` is encountered.
* Corner cases
Returning a context manager from a compiled function is not supported. This
would require PyTorch to synchronize the generator state between Dynamo and the
interpreter. Any attempt to return it will result in an `IncorrectUsage`
exception.
Graph breaks require special handling as well. In the event of a graph break,
the frame associated with the context manager is skipped, and the context
manager runs in eager mode.
* This PR is breaking my code
There is a configuration flag (`enable_trace_contextlib`) that can be set to
`False` to disable tracing context managers. If this still causes crashes,
please revert this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136033
Approved by: https://github.com/zou3519
Fixes a bunch of benchmarks that failed with cudagraph errors including `tlp python benchmarks/dynamo/timm_models.py --device cuda --inductor --accuracy --amp --training --only resmlp_12_224` when `specialize_float=False`
Also brings down number of overall failures (with keep-going) from 108 => 62. I'd estimate >80% of those 62 are wobbly expect tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140346
Approved by: https://github.com/ezyang
ghstack dependencies: #140983, #141003
This PR implements an opt-in configuration option for synchronizing compilation across all ranks at the end of Dynamo tracing (and potentially, other places in the future). There are two pieces to this PR:
1. Implementing infrastructure for compiler collectives (DistributedState/LocalState, the actual collective)
2. Using this infrastructure to synchronize automatic dynamic choices across all ranks
The infrastructure in part one can be used for other purposes, just add more (serializable) fields to LocalState.
Here is how automatic dynamic synchronization works:
1. Preflight in "torch/_dynamo/variables/builder.py": On the first Dynamo trace run, we trace without automatic dynamic at all; we assume all Tensor inputs that are not otherwise marked are static. This run is purely to collect all Tensor input sizes in the program.
2. torch/_dynamo/output_graph.py: At the end of the first Dynamo trace run, we perform a compiler collective to distribute all Tensor input sizes to all ranks. Then, we restart Dynamo
3. Apply the updates in "torch/_dynamo/variables/builder.py": Now that we have all sizes for every rank, we now update frame state with the observed sizes for all ranks, in rank order. Under the assumption that frame state is consistent on all ranks, this series of updates will preserve consistency.
For future work, it would be safer if we force a consistent hint on all ranks; this is more involved as we have to interpose in fakification.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130935
Approved by: https://github.com/jansel
Summary:
- Log export errors to Scuba and mark them with "classified" and "unclassified"
- Classify errors by exception type (ALLOW_LIST) and a `case_name` attribute
- Add `case_name` for some exceptions.
Test Plan:
Running the code below logs a classified error to `torch_export_usage` table in Scuba.
```
import torch
from torch._export.db.case import SupportLevel
class TorchSymMin(torch.nn.Module):
"""
torch.sym_min operator is not supported in export.
"""
def forward(self, x):
return x.sum() + torch.sym_min(x.size(0), 100)
example_args = (torch.randn(3, 2),)
tags = {"torch.operator"}
support_level = SupportLevel.NOT_SUPPORTED_YET
model = TorchSymMin()
torch.export.export(model, example_args)
``
Differential Revision: D59981459
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131327
Approved by: https://github.com/zhxchen17
`unimplemented` is a function that raises an error, so
`raise unimplemented(...)` never reaches the `raise`.
Another related issue is that `raise unimplemented(...) from e`
doesn't attach the exception cause correctly. I fix this by adding
a `from_exc` argument to `unimplemented`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122136
Approved by: https://github.com/lezcano
Attempts to make the input/output mismatch error better by first checking if the inputs/outputs are able to be pytree flattened into supporting types (tensors, symints, ...). So if user passes in some datastructure which does not have a pytree flatten registration, this will error with the message "It looks like one of the inputs is with type CustomType is not supported or pytree flatten-able.... please register a pytree flatten/unflatten function using the pytree.register_pytree_node API".
The check inside of produce_matching should now only error if something unexpected happens (dynamo accidentally adds an input or removes an output), and should be considered an internal error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117598
Approved by: https://github.com/avikchaudhuri, https://github.com/BowenBao
Attempts to make the input/output mismatch error better by first checking if the inputs/outputs are able to be pytree flattened into supporting types (tensors, symints, ...). So if user passes in some datastructure which does not have a pytree flatten registration, this will error with the message "It looks like one of the inputs is with type CustomType is not supported or pytree flatten-able.... please register a pytree flatten/unflatten function using the pytree.register_pytree_node API".
The check inside of produce_matching should now only error if something unexpected happens (dynamo accidentally adds an input or removes an output), and should be considered an internal error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117598
Approved by: https://github.com/avikchaudhuri, https://github.com/BowenBao
Attempts to make the input/output mismatch error better by first checking if the inputs/outputs are able to be pytree flattened into supporting types (tensors, symints, ...). So if user passes in some datastructure which does not have a pytree flatten registration, this will error with the message "It looks like one of the inputs is with type CustomType is not supported or pytree flatten-able.... please register a pytree flatten/unflatten function using the pytree.register_pytree_node API".
The check inside of produce_matching should now only error if something unexpected happens (dynamo accidentally adds an input or removes an output), and should be considered an internal error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117598
Approved by: https://github.com/avikchaudhuri
In my work on making guards installed eagerly (look up the stack), I found that our checkpoint/restore mechanism is very broken. There is lots of state (especially in shape_env) which we don't checkpoint and restore properly. We also have lots of mutable state on variable trackers already which is not checkpointed/restored. (See other PRs in this stack for some spot fixes.)
Since we wanted to get rid of this anyway for making VariableTracker mutable, I figured I would just switch to restarting analysis.
For other usages of copy_graphstate/restore_graphstate:
1) Many usages were pointless and not needed, these are removed in PRs below this.
2) Some other usage (similar to this one) is removed in PRs above this.
3) The tricky one I am not handling is higher_order_ops, which uses checkpoint/restore a lot. There might be some cases there where this speculate/restart trick won't work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112902
Approved by: https://github.com/voznesenskym
We want to get to a point where most UserErrors link to exportdb examples. This PR makes passing case names non-optional to make this intent clearer and encourage developers who raise UserErrors to make or point to examples that make fixing such errors more obvious for users.
In addition, sometimes there are multiple examples that are relevant to an error. Thus this PR also enables passing multiple case names.
Retry of #110733 which was reverted due to a landrace.
Differential Revision: [D50087148](https://our.internmc.facebook.com/intern/diff/D50087148/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110878
Approved by: https://github.com/gmagogsfm, https://github.com/tugsbayasgalan
We want to get to a point where most `UserError`s link to `exportdb` examples. This PR makes passing case names non-optional to make this intent clearer and encourage developers who raise `UserError`s to make or point to examples that make fixing such errors more obvious for users.
In addition, sometimes there are multiple examples that are relevant to an error. Thus this PR also enables passing multiple case names.
Differential Revision: [D50020465](https://our.internmc.facebook.com/intern/diff/D50020465/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110733
Approved by: https://github.com/zhxchen17
Ideally all `_dynamo.exc.UserError`s should have "case names", i.e., link to examples in `exportdb`.
This PR adds case names to several instances of `_dynamo.exc.UserError`. In particular, looking at coverage based on `UserErrorType`:
* `DYNAMIC_CONTROL_FLOW`, `ANTI_PATTERN`, and `STANDARD_LIBRARY` are fully covered.
* `CONSTRAINT_VIOLATION` and `DYNAMIC_DIM` have no coverage. We don't seem to have any dedicated examples of specifying dynamic shapes in `exportdb` (although they are used in some other examples without explanation, to avoid some specialization that would make such examples moot).
* `INVALID_INPUT` is only partly covered. Frankly this is tedious to cover via examples.
Differential Revision: [D49928518](https://our.internmc.facebook.com/intern/diff/D49928518/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110555
Approved by: https://github.com/angelayi, https://github.com/ydwu4
We want cond to always throw errors despite user's torch.compile mode.
The current implementation is to
1. catch the UserError.GRAPH_BREAK_IN_CONTROL_FLOW and once saw it, we directly raise: once in [break_graph_if_unsupported](bad3f2db40/torch/_dynamo/symbolic_convert.py (L1250)), which catches and raises for call_function (entry point of higher order operator) and a few others.
2. The raised exception is caught and raised again in [step](bad3f2db40/torch/_dynamo/symbolic_convert.py (L691)), where all instructions' exceptions are handled.
3. At the top-level, we treat it like an hard error and not supressing the errors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108027
Approved by: https://github.com/zou3519
ghstack dependencies: #108025, #108026