`ProxyTorchDispatchMode` was added recently as part of `make_fx`, which was secretly causing the meta tensor calls used inside of functionalization to get baked into the graph. It also wasn't caught because the functionalization tests in core don't use `make_fx`, and the tests in functorch aren't as comprehensive.
Now that `make_fx` is in core, I also ported the functionalization test infra over to use it, which would have caught the regression. This also makes the tests cleaner, since mode-based tracing lets us pick up factory functions in the trace output.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80416
Approved by: https://github.com/ezyang, https://github.com/albanD
This should fix the last issue that @anijain2305 hit when running ResNet with TorchDynamo <> functionalization.
Today if you try to call an `OpOverloadPacket` from python with some arguments, we will use the types of those arguments to perform overload resolution. With some functional variants of ops, this can be ambiguous.
Today this affects just one op: `_fused_moving_avg_obs_fq_helper`, although it would potentially affect e.g. `native_batch_norm` in the future.
Example:
```
# There are technically two overloads:
# torch.ops.aten._fused_moving_avg_obs_fq_helper.default (returns 2 argument, mutates 4 of its inputs inplace)
# torch.ops.aten._fused_moving_avg_obs_fq_helper.functional (returns 6 argument, mutates none of its inputs)
# We pick the wrong one - no way to know that we should pick the functional one, just from the call site.
outs = torch.ops.aten._fused_moving_avg_obs_fq_helper(a, a, a, a, a, a, a, 1.0, 0, 1, 0)
# raises an error - tries to call the overload with only 2 returns
return _fused_moving_avg_obs_fq_helper_functional[5]
```
Specifically, functionalization will bake `_fused_moving_avg_obs_fq_helper.functional` into the graph, but when AOTAutograd tries to compile with TorchScript, it needs to remove the overload name (TS doesn't know how to parse overload names directly, so we need to remove the overload name and let it infer the right overload at runtime later- so it picks the wrong one).
The situation is pretty similar to inplace; `ops.aten.add` and `ops.aten.add_` represent two different `OverloadPacket` objects; they can't be overloads of the same op, because their schemas would be ambiguous - the alias annotations are different, but that isn't enough to disambiguate).
In this PR, I try to fix the situation in a pretty similar way to how we handle `inplace` in the data model: `inplace` ops get their own base operator name, but they are represented as a flag inside of `BaseOperatorName` in the data model.
Two other important changes that I made as part of this PR:
(1) Originally, there were ~100 different `*_functional` operators: e.g. we had operators named `resize.functional` and `zero.functional`. The `_functional` bit isn't actually necessary in most cases: it's only necessary for operators that **also** have a `SchemaKind.mutable` variant, where `_fused_moving_avg_obs_fq_helper` is the only op that fits that description today. So I removed the unnecessary notion of "functional" from those other ops. I also added a bunch of assertions to force this restriction.
I think that makes more sense in the long run, because it eliminates an unnecessary difference in the model. E.g. we don't have `add_.Tensor` and `add.Tensor_functional`. We just have `add_.Tensor` and `add.Tensor`.
(2) I noticed that we actually still weren't pairing up a bunch of `_foreach` operators correctly, because their input arguments were different (`self` vs. `tensors`). Since they're private API's, I went ahead and changed the argument names directly so they get matched up. Before this PR, we were generating a separate `_foreach_add` and `_foreach_add.functional` variant in a bunch of cases, that really did the same thing (but happened to have a different name for the first argument).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80556
Approved by: https://github.com/ezyang, https://github.com/albanD
Double-header bug fix:
- As reported by jansel, dtypes are still showing up as integers
when the schema is an optional dtype. This is simple enough to
fix and I added a test for it. But while I was at it...
- I noticed that the THPMemoryFormat_new idiom with "unused" name
doesn't actually work, the repr of the returned memory format
object is wrong and this shows up when we try to log the args/kwargs.
So I fixed memory format to do it properly along with everything
else.
Fixes https://github.com/pytorch/pytorch/issues/77135
Signed-off-by: Edward Z. Yang <ezyangfb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77543
Approved by: https://github.com/albanD, https://github.com/jansel
For the most part, PrimTorch refs have the same signature as their
ATen equivalents. I modify most PrimTorch refs to register themselves
as decompositions, using the prim name they wrap to find the aten name
(except for a few cases where the prim/aten names mismatch). There are
some exclusions, falling into one of two categories:
- The torch equivalent was already implemented as a CompositeImplicitAutograd
decomposition in C++
- The ref doesn't support enough features (e.g., the real deal has more
kwargs / overloads than are currently implemented)
PrimTorch refs are written as a single function that supports all
overloads, and this style is convenient for cases where we have a bundle
of overloads for what morally is a single overload with a Union type
on an argument (which we ought to have supported in
native_functions.yaml but blah); to support registering a single decomp
for all the overloads, we modify register_decomposition to register
to ALL overloads if you pass it an overload packet. This is technically
BC breaking but no tests started failing because of it.
Signed-off-by: Edward Z. Yang <ezyangfb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76835
Approved by: https://github.com/Chillee, https://github.com/mruberry
Summary:
I noticed after creating https://github.com/pytorch/pytorch/issues/71553 that the test ownership lint was not working properly.
This fixes my egregious mistake and fixes the broken lints.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71554
Reviewed By: malfet
Differential Revision: D33690732
Pulled By: janeyx99
fbshipit-source-id: ba4dfbcd98038e4afd63e326832ae40935d2501e
(cherry picked from commit 1bbc3d343ac143f10b3d4052496812fccfd9e853)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68395
At the time that I wrote the pass, I thought that `c10::TensorList` and `c10::List<Tensor>` were the same thing. But it looks like a `TensorList` is actually an `ArrayRef<Tensor>`. This led to a nasty bug when I tried to add conditional functionalization to `block_diag`, where in the boxed kernel, I would:
(1) unwrap the first `IValue` by calling `.toTensorList()` (this actually returns a `List<Tensor>`, not a `TensorList`).
(2) call `TensorList to_functional_tensor(List<Tensor>)` to get out a `TensorList` with the functionalized tensors
(3) wrap that back into an `IValue` and put in on the stack.
Somewhere in that sequence of operations, something bad happens and we segfault. Fixing up the signature of `to_functional_tensor` to be `List<Tensor> to_functional_tensor(List<Tensor>)` fixes the bug. I have a feeling that there's a latent TensorList-related bug in the boxing/unboxing logic that made this worse, but I'm okay to stick with my narrow fix for now.
Additionally tested by running `pytest test/test_ops.py test/test_vmap.py -v -k block_diag` on top of this PR: https://github.com/pytorch/functorch/pull/235
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D32448258
Pulled By: bdhirsh
fbshipit-source-id: 3b2b6c7cd5e4c29533d0502f24272d826bfe03c1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66101
Updated description:
This PR tests the functionalization pass in python in two ways. For each of the test programs that I have in `test_functionalization.py`, it:
- runs the program with and without functionalization, and asserts the outputs and (potentially mutated) inputs are equal in both cases
- runs the program with `LoggingTensor`, and uses expecttests on the resulting graph. I manually confirm that the graphs look reasonable and only contain functional ops.
Mechanically, the changes include:
- factoring out `LoggingTensor` into a testing util so it can be re-used in multiple tests
- adding some private python api's in the `torch` namespace as hooks that I can use during testing
In the original version of this PR, I also added some fixes to the `_make_subclass()` function in python: allowing you to pass in strides and storage_offset. I kept them in mainly because the changes were already there.
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D31942095
Pulled By: bdhirsh
fbshipit-source-id: 90ff4c88d461089704922e779571eee09c21d707