17 Commits

Author SHA1 Message Date
a43c4c3972 [5/N] Apply ruff UP035 rule (#164423)
Continued code migration to enable ruff `UP035`. Most changes are about moving `Callable` from `typing` to `from collections.abc`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164423
Approved by: https://github.com/ezyang
2025-10-02 07:31:11 +00:00
9708fcf92d Account for triton kernel source code hidden in custom ops properly in AOTAutogradCache (#160120)
This PR fixes a bug where user defined triton kernels hidden behind `triton_op` do not register source code changes. If a user *only* changes a triton kernel source_code, because triton kernels are hidden under the custom op, dynamo hasn't traced into them yet.

This means at AOTAutograd time, we don't know the list of triton kernels that are defined by custom ops. This is an initial fix for the issue by parsing the AST of the custom op looking for triton kernels. This won't catch more degenerate cases if the custom op calls other custom ops/functions that then call triton kernels, and then the toplevel compiled graph doesn't know about it. To handle that, we'd have to trace through the custom op at dynamo time.

This should handle 99% of cases, though. I added an expectedFailure test to show the limitation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160120
Approved by: https://github.com/zou3519
2025-08-12 14:11:06 +00:00
10bc36fe84 Get tensor subclasses and torch.library.triton_op to dispatch correctly (#160341)
Short-term fix for https://github.com/pytorch/pytorch/issues/160333

The problem is:
1) `triton_op` adds a decomposition for FunctionalTensorMode for this operation
2) Tensor Subclasses rely on FunctionalTensorMode's `__torch_dispatch__` returning NotImplemented.
3) `triton_op`'s FunctionalTensorMode decomposition takes precedence over FunctionalTensorMode's decomposition.

The easy fix is to copy-paste the FunctionalTensorMode's NotImplemented
return logic into the decomposition.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160341
Approved by: https://github.com/drisspg
2025-08-12 04:09:37 +00:00
bb9c426024 Typo Errors fixed in multiple files (#148262)
# Fix typo errors across PyTorch codebase

This PR fixes various spelling errors throughout the PyTorch codebase to improve documentation quality and code readability.

## Changes Made

### Documentation Fixes
- Changed "seperate" to "separate" in multiple files:
  - `setup.py`: Build system documentation
  - `torch/_library/triton.py`: AOT compilation comments
  - `torch/csrc/dynamo/compiled_autograd.h`: Node compilation documentation
  - `torch/export/_unlift.py`: Pass population comments
  - `torch/export/exported_program.py`: Decomposition table notes

### Code Comments and Error Messages
- Changed "occured" to "occurred" in:
  - `test/mobile/test_lite_script_module.py`: Exception handling comments
  - `torch/export/_draft_export.py`: Error message text
  - `aten/src/ATen/native/cuda/linalg/BatchLinearAlgebra.cpp`: MAGMA bug comment
  - `torch/csrc/utils/python_numbers.h`: Overflow handling comment
  - `torch/csrc/jit/OVERVIEW.md`: Graph compilation documentation
  - `torch/_dynamo/symbolic_convert.py`: Error explanation

### API Documentation
- Changed "fullfill" to "fulfill" in `torch/distributed/checkpoint/state_dict_loader.py`
- Changed "accross" to "across" in:
  - `torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp`
  - `torch/distributed/distributed_c10d.py`

## Motivation
These changes improve code readability and maintain consistent spelling throughout the codebase. No functional changes were made; this is purely a documentation and comment improvement PR.

## Test Plan
No testing required as these changes only affect comments and documentation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148262
Approved by: https://github.com/janeyx99

Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>
2025-03-09 12:21:40 +00:00
e5a13410cd Fix the tiny doc descriptions (#147319)
As the title stated
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147319
Approved by: https://github.com/zou3519
2025-02-25 17:10:16 +00:00
5b5766665d PEP585 update - torch/_C torch/_decomp torch/_lazy torch/_library torch/_numpy torch/_prims torch/_refs torch/_strobelight (#145102)
See #145101 for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145102
Approved by: https://github.com/bobrenjc93
ghstack dependencies: #145105
2025-01-18 20:47:12 +00:00
567552b98b fix typo in doc and import for torch._library.triton (#144882)
Previously, the doc's suggested `from torch._library.triton import wrap_triton, triton_op` doesn't work because wrap_triton is not imported in torch/_library/__init__.py but `from torch.library import wrap_triton` works. This PR imports wrap_triton and fix the doc.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144882
Approved by: https://github.com/zou3519
2025-01-17 17:32:12 +00:00
c7dbee5106 [reland][export] don't decompose custom triton op when exporting (#144284)
Summary:
A reland of https://github.com/pytorch/pytorch/pull/142426.

Copying the description over here:

For torch.export (strict and non-strict), we don't do functional decomposition. Instead, we preserve the custom triton ops as custom ops. This is because we want the exported program to be high-level and serializable.

The alternative:
If we decompose the custom op to a functional hop and make it a node in exported program, we need to figure out ways of serializing the hop and its arguments, which can be triton.jited python functions and triton dtypes. This is undesireble because:

it can be tedious to maintain layer that serialize the jited function (e.g. with a string) and dtypes.
changes to triton or the serialization logic for triton arguments can be BC breaking
exported program will expose the implementation detail (i.e. triton source code) for a specific backend (GPU) to users, which mixes levels of abstraction.

Future plans:
After this PR, in the short term, we expect users to have a seperate aot_compile stage that compiles the exported program into a Cubin file on the same machine that users call export, which does autotuning and removes triton dependency and serve the model with Cubin. This guarantees that triton changes won't break BC.

In the long term, we may export multiple cubins for the triton op directly.

Test Plan: see new tests.

Differential Revision: D67879685

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144284
Approved by: https://github.com/zou3519
2025-01-11 01:34:35 +00:00
45ef3309e3 [BE] typing for decorators (#144161)
Summary:
Untyped decorators strip annotations from the decorated items.

- _compile
- _inductor/fx_passes/post_grad
- _inductor/lowering
- _library/custom_ops
- _meta_registrations
- _ops
- _refs/nn/functional
- ao/quantization/quantizer/xnnpack_quantizer_utils
- distributed/_composable/contract
- fx/experimental/graph_gradual_typechecker
- fx/experimental/migrate_gradual_types/constraint_generator
- optim/optimizer
- signal/windows/windows
- testing/_internal/common_device_type
- torch/_inductor/decomposition
- utils/flop_counter

Test Plan: unit tests

Differential Revision: D62302684

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144161
Approved by: https://github.com/Skylion007, https://github.com/albanD
2025-01-04 16:40:09 +00:00
e9bd74d763 Revert "[export] don't decompose custom triton op when exporting (#142426)"
This reverts commit 10b9c5944e8d6ff0685e1ef25277a1d3c4c9c5aa.

Reverted https://github.com/pytorch/pytorch/pull/142426 on behalf of https://github.com/huydhn due to This fails one internal MTIA test, checking with the author that we need to revert and reland this ([comment](https://github.com/pytorch/pytorch/pull/142426#issuecomment-2555793496))
2024-12-19 21:21:38 +00:00
10b9c5944e [export] don't decompose custom triton op when exporting (#142426)
For torch.export (strict and non-strict), we don't do functional decomposition. Instead, we preserve the custom triton ops as custom ops. This is because we want the exported program to be high-level and serializable.

#### The alternative:
If we decompose the custom op to a functional hop and make it a node in exported program, we need to figure out ways of serializing the hop and its arguments, which can be triton.jited python functions and triton dtypes. This is undesireble because:
- it can be tedious to maintain layer that serialize the jited function (e.g. with a string) and dtypes.
- changes to triton or the serialization logic for triton arguments can be BC breaking
- exported program will expose the implementation detail (i.e. triton source code) for a specific backend (GPU) to users, which mixes levels of abstraction.

#### Future plans:
After this PR, in the short term, we expect users to have a seperate aot_compile stage that compiles the exported program into a Cubin file **on the same machine that users call export**, which does autotuning and removes triton dependency and serve the model with Cubin. This guarantees that triton changes won't break BC.

In the long term, we may export multiple cubins for the triton op directly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142426
Approved by: https://github.com/zou3519
ghstack dependencies: #142425
2024-12-18 21:36:28 +00:00
827c322290 Make torch.library.triton_op public (#141880)
We've been using it privately for half a year and everything's been
good. This PR:
1. Makes torch.library.triton_op public
2. Renames capture_triton -> wrap_triton. We got feedback that no one
   knew what "capture triton" does.
3. Makes torch.library.wrap_triton public.

triton_op is used to construct a Python custom operator that may call 1+
triton kernels. Each of those triton kernels must be annotated with
wrap_triton.

Test Plan:
- existing tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141880
Approved by: https://github.com/albanD
ghstack dependencies: #141894
2024-12-03 16:28:56 +00:00
ac600fdce6 Type exposed_in decorator (#141894)
Test Plan:
- lintrunner
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141894
Approved by: https://github.com/albanD
2024-12-03 16:28:17 +00:00
4ee5547b37 [triton_op] Skip HOP dispatch when possible (#132822)
The capture_triton decorator returns a function that goes through the
triton kernel wrapper HOP. This is useful for make_fx tracing and
non-strict export. However, the HOP dispatch is slow (~1ms) and not
necessary in certain situations.

This PR skips going through the HOP dispatch for any
capture_triton-wrapped triton kernels that are registered as
implementations to a `@triton_op` custom operator. We do this by
creating a new thread-local flag that controls if the
capture_trition-wrapped triton kernel goes through HOP dispatch or not.

Test Plan:
- new test and existing tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132822
Approved by: https://github.com/SherlockNoMad
2024-08-08 15:56:40 +00:00
d3c17fea90 Revert "[BE] typing for decorators - _library/custom_ops (#131578)"
This reverts commit c65b197b85aeee61ed4c09527a8f6eecf8c20e27.

Reverted https://github.com/pytorch/pytorch/pull/131578 on behalf of https://github.com/clee2000 due to breaking lint internally D60265575 ([comment](https://github.com/pytorch/pytorch/pull/131572#issuecomment-2254328359))
2024-07-28 03:29:32 +00:00
c65b197b85 [BE] typing for decorators - _library/custom_ops (#131578)
See #131429
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131578
Approved by: https://github.com/oulgen, https://github.com/zou3519
ghstack dependencies: #131568, #131569, #131570, #131571, #131572, #131573, #131574, #131575, #131576, #131577
2024-07-25 22:24:19 +00:00
ee039c0614 [custom_op] triton_op API V0 (#130637)
This is the initial version of an API to create custom operators whose
implementations are backed by triton kernels. While user-defined triton
kernels work out-of-the-box with triton kernels, you may wish to
construct a custom operator if you need to compose with other PyTorch
subsystems, like Tensor subclasses or vmap.

I'm hoping to get design feedback on this and ship it so that we can
begin experimenting with customers.

Test Plan:
- new tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130637
Approved by: https://github.com/albanD
2024-07-15 13:00:54 +00:00