Fix: https://github.com/pytorch/xla/issues/6009
This PR adds another case to `TensorVariable.method_new` special case, where it
re-dispatches `new` into `new_empty`.
Since we are using fake tensors, the `new` call doesn't actually gets to the corresponding
backend (e.g. XLA). So, things like the following might happen:
```python
@torch.compile(backend="openxla")
def foo(x):
new_x = x.new(*x.size())
# new_x.device() == "xla"
# x.device() == "xla:0"
return new_x + x
a = torch.arange(10)
foo(a.to(xm.xla_device()))
```
Resulting in the following error:
```python
Traceback (most recent call last):
...
File "torch/_dynamo/utils.py", line 1654, in get_fake_value
ret_val = wrap_fake_exception(
File "torch/_dynamo/utils.py", line 1190, in wrap_fake_exception
return fn()
File "torch/_dynamo/utils.py", line 1655, in <lambda>
lambda: run_node(tx.output, node, args, kwargs, nnmodule)
File "torch/_dynamo/utils.py", line 1776, in run_node
raise RuntimeError(make_error_message(e)).with_traceback(
File "torch/_dynamo/utils.py", line 1758, in run_node
return node.target(*args, **kwargs)
File "torch/utils/_stats.py", line 20, in wrapper
return fn(*args, **kwargs)
File "torch/_subclasses/fake_tensor.py", line 885, in __torch_dispatch__
return self.dispatch(func, types, args, kwargs)
File "torch/_subclasses/fake_tensor.py", line 1224, in dispatch
return self._cached_dispatch_impl(func, types, args, kwargs)
File "torch/_subclasses/fake_tensor.py", line 955, in _cached_dispatch_impl
output = self._dispatch_impl(func, types, args, kwargs)
File "torch/_subclasses/fake_tensor.py", line 1445, in _dispatch_impl
return self.wrap_meta_outputs_with_default_device_logic(
File "torch/_subclasses/fake_tensor.py", line 1575, in wrap_meta_outputs_with_default_device_logic
return tree_map(wrap, r)
File "torch/utils/_pytree.py", line 900, in tree_map
return treespec.unflatten(map(func, *flat_args))
File "torch/utils/_pytree.py", line 736, in unflatten
leaves = list(leaves)
File "torch/_subclasses/fake_tensor.py", line 1550, in wrap
) = FakeTensor._find_common_device(func, flat_args)
File "torch/_subclasses/fake_tensor.py", line 625, in _find_common_device
merge_devices(arg)
File "torch/_subclasses/fake_tensor.py", line 620, in merge_devices
raise RuntimeError(
torch._dynamo.exc.TorchRuntimeError: Failed running call_function <built-in function add>(*(FakeTensor(..., device='xla', size=(10,), dtype=torch.int64), FakeTensor(..., device='xla:0', size=(10,), dtype=torch.int64)), **{}):
Unhandled FakeTensor Device Propagation for aten.add.Tensor, found two different devices xla, xla:0
```
Using `new_empty`, instead, fixes this error because it uses the device from the source
tensor, instead of inferring from the current dispatch key set.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121075
Approved by: https://github.com/jansel
Fixes https://github.com/pytorch/pytorch/issues/120441
We follow how triton_kernel_wrapper_functional gets re-inplaced:
- If we see auto_functionalized, then first we compute what inputs we
actually need to clone ("tensors_to_clone") and fixup the graph. This happens in
`reinplace_and_refine_tensors_to_clone`, which I have refactored out
of the triton_kernel_wrapper_functional reinplacing code.
- Later on, after the reinplacing pass, we have a decomposition pass for
auto_functionalized. In that decomposition pass, we make use of the
"tensor_to_clone" info and only clone those inputs in the
decomposition.
- We shepherd "tensor_to_clone" from the first step to the second step
by setting the .meta field on the auto_functionalized node.
Test Plan:
- existing tests
- tested locally by reading the output of TORCH_LOGS="post_grad_graphs"
- added assertExpectedInline tests for the post_grad_graphs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120829
Approved by: https://github.com/oulgen
This is basically done the obvious way. For better or worse, I jammed this into what used to be `_maybe_guard_eq` but now is `_maybe_guard_rel`. I was careful to test all the off by one conditions, and each permutation. Let me know if you think I missed anything. Importantly, this now works for unbacked SymInts.
While testing, I noticed we are silently duck sizing all symbolic variables in `test_dynamic_shapes.py`. This may or may not be covering up bugs.
Along the way, I had to fix a bug in export constraints, where we weren't checking that the final var_to_range was consistent with what the user requested at top level.
After I implemented all this, I realized that applying this to non-unbacked SymInts was duplicative with @ysiraichi's previous work on https://github.com/pytorch/pytorch/pull/97963 . The upside is I now understand what Yukio was trying to do in the original PR, and I think my new logic is simpler and less error prone. In Yukio's earlier diff, Yukio tried very hard to avoid changing what guards we actually issue (since this would cause tests to wobble). Thus, when he refined a range, he also saved the guard that actually caused the range to refine. In this PR, I don't bother saving these guards; instead I just tighten var_to_range directly and rely on generating guards on this to be correct. The key insight is that if I assert `x < y`, it's always safe to emit (potentially) more restrictive range guards, because this won't invalidate our guards, it will just make them a little too strong (but actually, I think we are precise along the way.) If these guards make it unnecessary to test `x < y`, because now the ranges for x and y are disjoint, this is fine, we've subsumed the x < y guard and can just not bother testing it. If I've gotten it right, TV will agree with me.
In fact, I had a bug in this PR which TV didn't catch, which is that when we have a recorded var_to_guards for a symbol, we unconditionally never generate the range guard for it, even if the var_to_guards is potentially inconsistent with var_to_range (because var_to_range was updated separately). With var_to_guards removed, I don't have to worry abou this inconsistency.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120800
Approved by: https://github.com/Skylion007, https://github.com/avikchaudhuri, https://github.com/ysiraichi
Summary: In non-strict mode of torch.export() we didn't set those `is_compiling()` to `True` which is needed by some models.
Test Plan: Unit tests and manual testing.
Differential Revision: D53624452
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119602
Approved by: https://github.com/suo
This adds support for backwards hooks that are *both*:
1) Interior to the graph; and
2) Dynamically generated (e.g. lambdas)
We do this by creating a BackwardState object that is used to register the hooks in the forward, then populated by dynamo *after* the forwards runs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120382
Approved by: https://github.com/xmfan
Fixes https://github.com/pytorch/pytorch/issues/117268; check this issue for background.
This PR does the following:
* Do not perform a replacement if the expression we're replacing the symbol with has a less refined value range than the original. There's a little bit of trickiness around the handling for values close to INT64_MAX; when checking if a range refines another, I *only* consider the range representable in 64-bit integers. This is enough to prevent us from doing a substitution like `i0 = 10 - i1`, but it appears to still let us do the other substitutions we like, such as `i0 = i1` or `i0 = 12 * i1`
* The test above is order dependent: if we assert an equality BEFORE we have refined a range, we might be willing to do the replacement because there isn't a meaningful range. This means that it's important to mark things as sizes, before you start doing other error checking. `split_with_sizes` is adjusted accordingly. It would be good to raise an error if you get the ordering wrong, but I leave this to future work.
* It turns out this is not enough to fix AOTAutograd, because we lose the size-ness of unbacked SymInts when AOTAutograd retraces the Dynamo graph. So update deferred runtime assert insertion to also insert size-ness and value ranges annotations. Note that, in principle, it shouldn't be necessary to explicitly do the latter; these should just show up as deferred runtime asserts. That's some extra refactoring for a later day.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117356
Approved by: https://github.com/lezcano
Fixes https://github.com/pytorch/pytorch/issues/117268; check this issue for background.
This PR does the following:
* Do not perform a replacement if the expression we're replacing the symbol with has a less refined value range than the original. There's a little bit of trickiness around the handling for values close to INT64_MAX; when checking if a range refines another, I *only* consider the range representable in 64-bit integers. This is enough to prevent us from doing a substitution like `i0 = 10 - i1`, but it appears to still let us do the other substitutions we like, such as `i0 = i1` or `i0 = 12 * i1`
* The test above is order dependent: if we assert an equality BEFORE we have refined a range, we might be willing to do the replacement because there isn't a meaningful range. This means that it's important to mark things as sizes, before you start doing other error checking. `split_with_sizes` is adjusted accordingly. It would be good to raise an error if you get the ordering wrong, but I leave this to future work.
* It turns out this is not enough to fix AOTAutograd, because we lose the size-ness of unbacked SymInts when AOTAutograd retraces the Dynamo graph. So update deferred runtime assert insertion to also insert size-ness and value ranges annotations. Note that, in principle, it shouldn't be necessary to explicitly do the latter; these should just show up as deferred runtime asserts. That's some extra refactoring for a later day.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117356
Approved by: https://github.com/lezcano
Fixes#119198.
This PR make dynamo inline `__iter__` of a user defined object instead of creating a graph break. Also added a new test, which shows:
1. the loop is unrolled
2. the length of the loop is guarded when inlining `__iter__`
```python
class Mod:
def __init__(self):
self.a = [torch.randn(2, 2), torch.randn(2, 2)]
def __iter__(self):
return iter(self.a)
def f(mod):
ret = []
for x in mod:
ret.append(x + 1)
return ret
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119243
Approved by: https://github.com/jansel
Attempt #2 for https://github.com/pytorch/pytorch/pull/117875 to fix https://github.com/pytorch/pytorch/issues/112090.
Summary of changes:
- ~Changed CacheEntry linked list into a doubly-linked list structure to support deletion.~ (done by C++ refactor)
- Added CacheEntry and ExtraState borrowed references to GuardFn so that GuardFn can tell ExtraState to delete CacheEntry when the GuardFn is invalidated.
- ~Added ExtraState raw reference to CacheEntry so that we can get ExtraState to correctly point to the first CacheEntry if it gets deleted.~ (done by C++ refactor)
- CacheEntry destructor needs to reset GuardFn refs to ExtraState/CacheEntry in order to prevent use-after-free.
- code_context values that are nn.GraphModules need to be weakrefs in order to prevent circular references.
- Added tests that check for memory leaks and cache deletion operations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119107
Approved by: https://github.com/jansel
Fixes https://github.com/pytorch/pytorch/issues/117361
The implementation here slightly diverges from what was proposed in the issue, so I will recap what this PR is doing here. Today, when doing computations involving size-like unbacked SymInts, we assume for all operations that the compile time range of the integer is `[2, inf]`, even though at runtime we also accept zero and one.
This PR removes the carte blanche assumption, and instead does the analysis in a much more limited and controlled fashion: only for guards which we have designated as "size oblivious" are we willing to do the analysis under the assumption that the range of all size-like unbacked SymInts is `[2, inf]`; otherwise, we will faithfully only do analysis with `[0, inf]` (or whatever the user provided) bounds.
The infra pieces of this PR are:
* Remove runtime_var_to_range from torch/fx/experimental/symbolic_shapes.py; modify `_constrain_range_for_size` to refine the range without clamping min to 2, and instead add the symbol to a `size_like` set in the ShapeEnv
* When evaluating an expression, if the expression is requested to be evaluated in a `size_oblivious` way, we attempt to statically compute the value of the expression with the assumption that all symbols in `size_like` are updated to assume that they are `>= 2`.
* Add Python and C++ APIs for guarding on a SymBool in a size-oblivious way. In C++, I also need to add some helpers for performing symbolic comparisons, since the stock comparisons immediately specialize in the "normal" way.
The rest of the changes of the PR are marking various spots in PyTorch framework code as size oblivious, based on what our current test suite exercises.
As you review the places where we have marked things as size oblivious, it may become clear why I ended up not opting for the "designate a branch as the default branch when it's not statically obvious which way to go": for some of the conditions, this answer is rather non-obvious. I think potentially there is another refinement on top of this PR, which is something like "I don't care if you can't figure it out with ValueRange analysis, go down this path anyway if there are unbacked sizes involved." But even if we add this API, I think we are obligated to attempt the ValueRange analysis first, since it can lead to better outcomes sometimes (e.g., we are able to figure out that something is contiguous no matter what the unbacked size is.)
When is it permissible to mark something as size oblivious? Heuristically, it is OK anywhere in framework code if it gets you past a guard on unbacked SymInt problem. It is somewhat difficult to provide a true semantic answer, however. In particular, these annotations don't have any observational equivalence guarantee; for example, if I have `torch.empty(u0, 1).squeeze()`, we will always produce a `[u0]` size tensor, even though if `u0 == 1` PyTorch will actually produce a `[]` size tensor. The argument that I gave to Lezcano is that we are in fact defining an alternate semantics for a "special" size = 0, 1, for which we have these alternate eager mode semantics. In particular, suppose that we have a constant `special1` which semantically denotes 1, but triggers alternate handling rules. We would define `torch.empty(special1, 1).squeeze()` to always produce a `[special1]` size tensor, making its semantics coincide with unbacked SymInt semantics. In this model, the decision to designate guards as size oblivious is simply a user API question: you put them where ever you need some handling for special1! As we conservatively error out whenever it is not obvious what `special1` semantics should be, it is always valid to expand these semantics to cover more cases (although you can always choose the wrong semantics!)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118579
Approved by: https://github.com/eellison, https://github.com/lezcano
For code like following:
```python
import torch
def f():
a = {"a": torch.randn(2, 2)}
a.clear()
return a
torch.compile(f, backend="eager", fullgraph=True)()
```
We have a graph break before the pr:
```
torch._dynamo.exc.Unsupported: call_method ConstDictVariable() clear [] {}
```
Test Plan:
Added new tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119197
Approved by: https://github.com/jansel, https://github.com/anijain2305
Make variables in dict lazy and remove DICT_KEYS guard.
We build the keys of a dict depth-first and we rely on the guards of
each element in the dict to create the correct guards. This allows us to
remove the rather buggy DICT_KEYS guard and make the guard lazy.
The guards are not completely lazy yet, as we instantiate them in
`_HashableTracker._eq_impl` but it should be possible to make them
truly lazy.
Also, adding new types to the supported types within keys should be less
error prone.
This is marginally less efficient when we graph break, but in turn we
should graph break much less. It also makes the dicts code easier to maintain
(removes `is_hashable_python_var`).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117625
Approved by: https://github.com/jansel, https://github.com/peterbell10, https://github.com/anijain2305
ghstack dependencies: #117982, #118098, #117983
Dynamo creates Tensors when tracing through numpy ufuncs like np.sin, np.minimum etc. When running, np functions generally return Tensors when run with `torch.compile`. However, we currently require when normalizing `out` arguments that the input is an ndarray. This creates assertion errors when running torch.compile on any numpy function with an out argument:
```
def test_numpy_ufunc_out(self):
@torch.compile(backend="eager")
def foo():
x = np.arange(5)
out = np.empty((x.shape[0], x.shape[0]))
res_out = np.sin(x, out=out)
assert res_out is out
foo()
```
Failure with stack trace: https://gist.github.com/jamesjwu/68e217638d735678b3de968584dba23f
Instead, we can wrap tensors in an ndarray in normalize_outarray to handle the case correctly. Fixing this resolves ~220 tests under dynamo_test_failures, but also exposes a followup bug.
In the presence of a graph break, ndarrays don't preserve their id, which can affect assertions and `is` checks between numpy arrays:
```
def test_x_and_out_broadcast(self, ufunc):
x = self.get_x(ufunc)
out = np.empty((x.shape[0], x.shape[0]))
x_b = np.broadcast_to(x, out.shape)
# ufunc is just np.sin here
res_out = ufunc(x, out=out)
res_bcast = ufunc(x_b)
# passes
assert res_out is out
graph_break()
# fails
assert res_out is out
```
Regular tensors preserve their id because Dynamo caches their example tensor values across a graph break. However, with ndarrays, we only store their converted tensor values, and construct new ndarrays around those values:
eebe7e1d37/torch/_dynamo/variables/builder.py (L1083)
Added a test with expected failure to showcase this — we can then fix that issue separately.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118248
Approved by: https://github.com/lezcano
Summary: When there were optionals with specified default values the code was improperly handling the number of parameters causing IndexError: tuple index out of range.
Test Plan: New tests.
Reviewed By: zou3519
Differential Revision: D53095812
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118331
Approved by: https://github.com/zou3519
Summary: When there were optionals with specified default values the code was improperly handling the number of parameters causing `IndexError: tuple index out of range`
Test Plan: new tests
Differential Revision: D52977644
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118035
Approved by: https://github.com/williamwen42
It looks like the inductor fallback previously worked with HOPs but no longer
does, so I fixed that:
- all HOPs are exposed under torch.ops.higher_order, so I changed how
inductor looks them up
- the inductor fallback assumed that an operator's signature was (*args,
**kwargs). This is true for all the OpOverloads but not HOPs. I
rewrote the code to not rely on this.
Test Plan:
- existing tests
- new test for auto_functionalized HOP.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117084
Approved by: https://github.com/williamwen42
This prepares the PR where we implement sets in terms of dicts.
To do so, rather than storing internally a dictionary that maps literals
to VariableTrackers, it stores (pretty much) a dictionary from VTs to VTs.
To do so, keys are wrapped in an opaque internal class _Hashable.
The Hashable class is opaque on purpose so that it fails hard if
if it inadvertently leaks back into user code.
We also found and fixed a number of latent bugs and inconsistencies
in the way dynamo checked what can be a dict key. More generally, we
make much clearer what are the things that need to be modified to add
a new supported key type to Dicts.
Fixes [#107595](https://www.internalfb.com/tasks?t=107595)
Fixes [#111603](https://www.internalfb.com/tasks?t=111603)
Re-PR of https://github.com/pytorch/pytorch/pull/111196 sadly due to reverts, we could not reuse @lezcano's original PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116785
Approved by: https://github.com/mlazos