This PR fixes a bug where user defined triton kernels hidden behind `triton_op` do not register source code changes. If a user *only* changes a triton kernel source_code, because triton kernels are hidden under the custom op, dynamo hasn't traced into them yet.
This means at AOTAutograd time, we don't know the list of triton kernels that are defined by custom ops. This is an initial fix for the issue by parsing the AST of the custom op looking for triton kernels. This won't catch more degenerate cases if the custom op calls other custom ops/functions that then call triton kernels, and then the toplevel compiled graph doesn't know about it. To handle that, we'd have to trace through the custom op at dynamo time.
This should handle 99% of cases, though. I added an expectedFailure test to show the limitation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160120
Approved by: https://github.com/zou3519
This PR is part of the work to deprecate torch::deploy in OSS. Effectively it does 3 things to get started.
1. Remove test_deploy_interaction as we no longer need to worry about this
2. Remove all torch._running_with_deploy checks and use the False path always (surfaced 1)
3. Remove `USE_DEPLOY` and switch to the default path always
Note: MyPy does fail on a bunch of things here as a bunch of older files are touched. It may be better to fix these things on a separate PR
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158288
Approved by: https://github.com/albanD
This PR is part of the work to deprecate torch::deploy in OSS. Effectively it does 3 things to get started.
1. Remove test_deploy_interaction as we no longer need to worry about this
2. Remove all torch._running_with_deploy checks and use the False path always (surfaced 1)
3. Remove `USE_DEPLOY` and switch to the default path always
Note: MyPy does fail on a bunch of things here as a bunch of older files are touched. It may be better to fix these things on a separate PR
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158288
Approved by: https://github.com/albanD
This PR is part of the work to deprecate torch::deploy in OSS. Effectively it does 3 things to get started.
1. Remove test_deploy_interaction as we no longer need to worry about this
2. Remove all torch._running_with_deploy checks and use the False path always (surfaced 1)
3. Remove `USE_DEPLOY` and switch to the default path always
Note: MyPy does fail on a bunch of things here as a bunch of older files are touched. It may be better to fix these things on a separate PR
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158288
Approved by: https://github.com/albanD
We don't need to do a loop over all the args, kwargs in the
AdInplaceOrView key; we just need to bump the version on the args,
kwargs that are mutable.
On the benchmark mentioned in
https://github.com/pytorch/pytorch/issues/139494
this made the time go from
```
mutate2 = 61.72943878173828
no_mutate2 = 36.89440155029297
mutate = 236.3092498779297
no_mutate = 59.31964874267578
```
to
```
mutate2 = 47.976478576660156
no_mutate2 = 38.37468719482422
mutate = 71.21315002441406
no_mutate = 59.7432975769043
```
Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139513
Approved by: https://github.com/bdhirsh
ghstack dependencies: #139509
When we see a custom op:
- check that its mutation annotations are correct
- check that its aliasing constraints matches our constraints for custom
ops.
Otherwise, there may be undefined behavior.
Test Plan:
- new tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139212
Approved by: https://github.com/angelayi
We promise the user that these custom ops (and their kernels) are black
boxes w.r.t. torch.compile. Unfortunately Dynamo can turn itself back
on in the implementation of the custom operator, so we force it off by
disabling Dynamo
Test Plan:
- new tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133125
Approved by: https://github.com/ezyang
Made the following changes:
- mutates_args is now keyword-only and mandatory. This is to align with
torch.library.custom_op (which makes it mandatory because it's easy to
miss)
- op_name is now keyword-only. This helps the readability of the API
- updated all usages of infer_schema
This change is not BC-breaking because we introduced
torch.library.infer_schema a couple of days ago.
Test Plan:
- tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130705
Approved by: https://github.com/yushangdi
ghstack dependencies: #131777
Fixes#130284Fixes#130653
- Add `torch.library.register_vmap` to custom ops
- Add `register_vmap` for operators in ops in custom_op_db.
- Make `torch.autograd.Function` support kwarg-only kwargs for vmap
- test operators in op_db with `tests/test_vmap`.
- change `test_vmap` to allow custom `out_dim` and allow "None" in `out_dim` when testing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130589
Approved by: https://github.com/zou3519
Fixes#130284Fixes#130653
- Add `torch.library.register_vmap` to custom ops
- Add `register_vmap` for operators in ops in custom_op_db.
- Make `torch.autograd.Function` support kwarg-only kwargs for vmap
- test operators in op_db with `tests/test_vmap`.
- change `test_vmap` to allow custom `out_dim` and allow "None" in `out_dim` when testing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130589
Approved by: https://github.com/zou3519
Made the following changes:
- mutates_args is now keyword-only and mandatory. This is to align with
torch.library.custom_op (which makes it mandatory because it's easy to
miss)
- op_name is now keyword-only. This helps the readability of the API
- updated all usages of infer_schema
This change is not BC-breaking because we introduced
torch.library.infer_schema a couple of days ago.
Test Plan:
- tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130705
Approved by: https://github.com/yushangdi
Fixes#129389
If a user registers a device-specific implementation for an operator that accepts no Tensors, then we require the operator to have a "device: torch.device argument"
We switch on the device argument to select the correct backend to dispatch to.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129978
Approved by: https://github.com/zou3519
This reverts commit dd64ca2a02434944ecbc8f3e186d44ba81e3cb26.
There's a silent incorrectness bug with needs_fixed_stride_order=True and
mutable custom ops, so it's better to flip the default back to avoid
silent incorrectness.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127923
Approved by: https://github.com/williamwen42
Previously, the default was that Inductor did not respect strides for
all (builtin and custom) ops unless the op has a
"needs_fixed_stride_order" tag on it. This PR changes it so that:
- inductor doesn't respect strides for builtin ops. To change the
behavior, one can add the "needs_fixed_stride_order" tag
- inductor does respect strides for custom ops. To change the behavior,
one can add the "does_not_need_fixed_stride_order" tag
Test Plan:
- new tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126986
Approved by: https://github.com/ezyang, https://github.com/albanD