pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 21:14:14 +08:00

Author	SHA1	Message	Date
Yuanyuan Chen	a43c4c3972	[5/N] Apply ruff UP035 rule (#164423 ) Continued code migration to enable ruff `UP035`. Most changes move the `Callable` import from `typing` to `collections.abc`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164423 Approved by: https://github.com/ezyang	2025-10-02 07:31:11 +00:00
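As a minimal illustration of the kind of change the `UP035` migration above describes, the sketch below imports `Callable` from `collections.abc` instead of `typing`. The function `apply_twice` is a hypothetical example and is not taken from the PR.

```python
# Before (the import style that ruff's UP035 rule flags as deprecated):
#     from typing import Callable
# After:
from collections.abc import Callable


def apply_twice(fn: Callable[[int], int], value: int) -> int:
    """Apply `fn` to `value` two times (hypothetical helper)."""
    return fn(fn(value))


result = apply_twice(lambda x: x + 3, 1)
```

Only the import location changes; annotations such as `Callable[[int], int]` are written the same way with either import.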
James Wu	9708fcf92d	Account for triton kernel source code hidden in custom ops properly in AOTAutogradCache (#160120 ) This PR fixes a bug where user-defined triton kernels hidden behind `triton_op` do not register source code changes. If a user changes only a triton kernel's source code, the change is missed: because triton kernels are hidden under the custom op, dynamo hasn't traced into them yet. This means at AOTAutograd time, we don't know the list of triton kernels that are defined by custom ops. This is an initial fix for the issue by parsing the AST of the custom op looking for triton kernels. This won't catch more degenerate cases where the custom op calls other custom ops/functions that then call triton kernels, so that the toplevel compiled graph doesn't know about them. To handle that, we'd have to trace through the custom op at dynamo time. This should handle 99% of cases, though. I added an expectedFailure test to show the limitation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/160120 Approved by: https://github.com/zou3519	2025-08-12 14:11:06 +00:00
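The AST-based approach mentioned in the commit above can be sketched with the standard `ast` module. This is not the PR's implementation, only an assumed simplification: every name here (`CUSTOM_OP_SOURCE`, `find_referenced_names`, `add_kernel`, `mul_kernel`, `helper`) is hypothetical, and the source string is parsed, never executed.

```python
import ast

# Hypothetical source of a custom op body. In practice it could be obtained
# with inspect.getsource(fn). The code is only parsed, never run.
CUSTOM_OP_SOURCE = '''
def my_add(x, y):
    out = empty_like(x)
    wrap_triton(add_kernel)[(4,)](x, y, out, 16)
    helper(x)
    return out
'''


def find_referenced_names(source: str, candidates: set[str]) -> set[str]:
    """Return the candidate names that appear as identifiers in `source`."""
    tree = ast.parse(source)
    # Collect every bare identifier used anywhere in the parsed function.
    seen = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return seen & candidates


kernels = find_referenced_names(CUSTOM_OP_SOURCE, {"add_kernel", "mul_kernel"})
```

The sketch shares the limitation the commit message names: a kernel that is reached only through `helper` would not be found, because only the custom op's own source is inspected.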
rzou	10bc36fe84	Get tensor subclasses and torch.library.triton_op to dispatch correctly (#160341 ) Short-term fix for https://github.com/pytorch/pytorch/issues/160333 The problem is: 1) `triton_op` adds a decomposition for FunctionalTensorMode for this operation 2) Tensor Subclasses rely on FunctionalTensorMode's `__torch_dispatch__` returning NotImplemented. 3) `triton_op`'s FunctionalTensorMode decomposition takes precedence over FunctionalTensorMode's decomposition. The easy fix is to copy-paste the FunctionalTensorMode's NotImplemented return logic into the decomposition. Pull Request resolved: https://github.com/pytorch/pytorch/pull/160341 Approved by: https://github.com/drisspg	2025-08-12 04:09:37 +00:00
Aditya Tiwari	bb9c426024	Typo Errors fixed in multiple files (#148262 ) # Fix typo errors across PyTorch codebase This PR fixes various spelling errors throughout the PyTorch codebase to improve documentation quality and code readability. ## Changes Made ### Documentation Fixes - Changed "seperate" to "separate" in multiple files: - `setup.py`: Build system documentation - `torch/_library/triton.py`: AOT compilation comments - `torch/csrc/dynamo/compiled_autograd.h`: Node compilation documentation - `torch/export/_unlift.py`: Pass population comments - `torch/export/exported_program.py`: Decomposition table notes ### Code Comments and Error Messages - Changed "occured" to "occurred" in: - `test/mobile/test_lite_script_module.py`: Exception handling comments - `torch/export/_draft_export.py`: Error message text - `aten/src/ATen/native/cuda/linalg/BatchLinearAlgebra.cpp`: MAGMA bug comment - `torch/csrc/utils/python_numbers.h`: Overflow handling comment - `torch/csrc/jit/OVERVIEW.md`: Graph compilation documentation - `torch/_dynamo/symbolic_convert.py`: Error explanation ### API Documentation - Changed "fullfill" to "fulfill" in `torch/distributed/checkpoint/state_dict_loader.py` - Changed "accross" to "across" in: - `torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp` - `torch/distributed/distributed_c10d.py` ## Motivation These changes improve code readability and maintain consistent spelling throughout the codebase. No functional changes were made; this is purely a documentation and comment improvement PR. ## Test Plan No testing required as these changes only affect comments and documentation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/148262 Approved by: https://github.com/janeyx99 Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>	2025-03-09 12:21:40 +00:00
FFFrog	e5a13410cd	Fix the tiny doc descriptions (#147319 ) As the title stated Pull Request resolved: https://github.com/pytorch/pytorch/pull/147319 Approved by: https://github.com/zou3519	2025-02-25 17:10:16 +00:00
Aaron Orenstein	5b5766665d	PEP585 update - torch/_C torch/_decomp torch/_lazy torch/_library torch/_numpy torch/_prims torch/_refs torch/_strobelight (#145102 ) See #145101 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145102 Approved by: https://github.com/bobrenjc93 ghstack dependencies: #145105	2025-01-18 20:47:12 +00:00
Yidi Wu	567552b98b	fix typo in doc and import for torch._library.triton (#144882 ) Previously, the import suggested in the doc, `from torch._library.triton import wrap_triton, triton_op`, did not work because wrap_triton was not imported in torch/_library/__init__.py, although `from torch.library import wrap_triton` worked. This PR imports wrap_triton and fixes the doc. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144882 Approved by: https://github.com/zou3519	2025-01-17 17:32:12 +00:00
Yidi Wu	c7dbee5106	[reland][export] don't decompose custom triton op when exporting (#144284 ) Summary: A reland of https://github.com/pytorch/pytorch/pull/142426. Copying the description over here: For torch.export (strict and non-strict), we don't do functional decomposition. Instead, we preserve the custom triton ops as custom ops. This is because we want the exported program to be high-level and serializable. The alternative: If we decompose the custom op to a functional hop and make it a node in the exported program, we need to figure out ways of serializing the hop and its arguments, which can be triton.jit-ed Python functions and triton dtypes. This is undesirable because: (1) it can be tedious to maintain a layer that serializes the jitted function (e.g. with a string) and dtypes; (2) changes to triton or the serialization logic for triton arguments can be BC breaking; (3) the exported program will expose the implementation detail (i.e. triton source code) for a specific backend (GPU) to users, which mixes levels of abstraction. Future plans: After this PR, in the short term, we expect users to have a separate aot_compile stage that compiles the exported program into a Cubin file on the same machine where they call export; this stage does autotuning and removes the triton dependency, and the model is then served with the Cubin. This guarantees that triton changes won't break BC. In the long term, we may export multiple cubins for the triton op directly. Test Plan: see new tests. Differential Revision: D67879685 Pull Request resolved: https://github.com/pytorch/pytorch/pull/144284 Approved by: https://github.com/zou3519	2025-01-11 01:34:35 +00:00
Aaron Orenstein	45ef3309e3	[BE] typing for decorators (#144161 ) Summary: Untyped decorators strip annotations from the decorated items. - _compile - _inductor/fx_passes/post_grad - _inductor/lowering - _library/custom_ops - _meta_registrations - _ops - _refs/nn/functional - ao/quantization/quantizer/xnnpack_quantizer_utils - distributed/_composable/contract - fx/experimental/graph_gradual_typechecker - fx/experimental/migrate_gradual_types/constraint_generator - optim/optimizer - signal/windows/windows - testing/_internal/common_device_type - torch/_inductor/decomposition - utils/flop_counter Test Plan: unit tests Differential Revision: D62302684 Pull Request resolved: https://github.com/pytorch/pytorch/pull/144161 Approved by: https://github.com/Skylion007, https://github.com/albanD	2025-01-04 16:40:09 +00:00
PyTorch MergeBot	e9bd74d763	Revert "[export] don't decompose custom triton op when exporting (#142426 )" This reverts commit 10b9c5944e8d6ff0685e1ef25277a1d3c4c9c5aa. Reverted https://github.com/pytorch/pytorch/pull/142426 on behalf of https://github.com/huydhn due to This fails one internal MTIA test, checking with the author that we need to revert and reland this ([comment](https://github.com/pytorch/pytorch/pull/142426#issuecomment-2555793496))	2024-12-19 21:21:38 +00:00
Yidi Wu	10b9c5944e	[export] don't decompose custom triton op when exporting (#142426 ) For torch.export (strict and non-strict), we don't do functional decomposition. Instead, we preserve the custom triton ops as custom ops. This is because we want the exported program to be high-level and serializable. #### The alternative: If we decompose the custom op to a functional hop and make it a node in the exported program, we need to figure out ways of serializing the hop and its arguments, which can be triton.jit-ed Python functions and triton dtypes. This is undesirable because: - it can be tedious to maintain a layer that serializes the jitted function (e.g. with a string) and dtypes. - changes to triton or the serialization logic for triton arguments can be BC breaking - the exported program will expose the implementation detail (i.e. triton source code) for a specific backend (GPU) to users, which mixes levels of abstraction. #### Future plans: After this PR, in the short term, we expect users to have a separate aot_compile stage that compiles the exported program into a Cubin file on the same machine where they call export; this stage does autotuning and removes the triton dependency, and the model is then served with the Cubin. This guarantees that triton changes won't break BC. In the long term, we may export multiple cubins for the triton op directly. Pull Request resolved: https://github.com/pytorch/pytorch/pull/142426 Approved by: https://github.com/zou3519 ghstack dependencies: #142425	2024-12-18 21:36:28 +00:00
rzou	827c322290	Make torch.library.triton_op public (#141880 ) We've been using it privately for half a year and everything's been good. This PR: 1. Makes torch.library.triton_op public 2. Renames capture_triton -> wrap_triton. We got feedback that no one knew what "capture triton" does. 3. Makes torch.library.wrap_triton public. triton_op is used to construct a Python custom operator that may call 1+ triton kernels. Each of those triton kernels must be annotated with wrap_triton. Test Plan: - existing tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/141880 Approved by: https://github.com/albanD ghstack dependencies: #141894	2024-12-03 16:28:56 +00:00
rzou	ac600fdce6	Type exposed_in decorator (#141894 ) Test Plan: - lintrunner Pull Request resolved: https://github.com/pytorch/pytorch/pull/141894 Approved by: https://github.com/albanD	2024-12-03 16:28:17 +00:00
rzou	4ee5547b37	[triton_op] Skip HOP dispatch when possible (#132822 ) The capture_triton decorator returns a function that goes through the triton kernel wrapper HOP. This is useful for make_fx tracing and non-strict export. However, the HOP dispatch is slow (~1ms) and not necessary in certain situations. This PR skips going through the HOP dispatch for any capture_triton-wrapped triton kernels that are registered as implementations to a `@triton_op` custom operator. We do this by creating a new thread-local flag that controls if the capture_triton-wrapped triton kernel goes through HOP dispatch or not. Test Plan: - new test and existing tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/132822 Approved by: https://github.com/SherlockNoMad	2024-08-08 15:56:40 +00:00
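The thread-local flag described in the commit above can be sketched in plain Python. This only illustrates the general pattern under assumed names (`_state`, `hop_dispatch_enabled`, `skip_hop_dispatch`, `run_kernel` are all hypothetical); it does not reproduce PyTorch's internals.

```python
import threading
from contextlib import contextmanager

_state = threading.local()  # hypothetical per-thread flag holder


def hop_dispatch_enabled() -> bool:
    """Return the flag for the current thread (enabled by default)."""
    return getattr(_state, "enabled", True)


@contextmanager
def skip_hop_dispatch():
    """Temporarily disable the slow dispatch path on this thread only."""
    previous = hop_dispatch_enabled()
    _state.enabled = False
    try:
        yield
    finally:
        _state.enabled = previous  # restore even if the body raises


def run_kernel(x: int) -> str:
    """Stand-in for a wrapped kernel call that checks the flag."""
    if hop_dispatch_enabled():
        return f"slow path via wrapper: {x}"
    return f"direct call: {x}"


outside = run_kernel(1)
with skip_hop_dispatch():
    inside = run_kernel(1)
after = run_kernel(1)
```

Because the flag lives in `threading.local()`, disabling the slow path inside one thread does not affect kernels launched concurrently from other threads.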
PyTorch MergeBot	d3c17fea90	Revert "[BE] typing for decorators - _library/custom_ops (#131578 )" This reverts commit c65b197b85aeee61ed4c09527a8f6eecf8c20e27. Reverted https://github.com/pytorch/pytorch/pull/131578 on behalf of https://github.com/clee2000 due to breaking lint internally D60265575 ([comment](https://github.com/pytorch/pytorch/pull/131572#issuecomment-2254328359))	2024-07-28 03:29:32 +00:00
Aaron Orenstein	c65b197b85	[BE] typing for decorators - _library/custom_ops (#131578 ) See #131429 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131578 Approved by: https://github.com/oulgen, https://github.com/zou3519 ghstack dependencies: #131568, #131569, #131570, #131571, #131572, #131573, #131574, #131575, #131576, #131577	2024-07-25 22:24:19 +00:00
rzou	ee039c0614	[custom_op] triton_op API V0 (#130637 ) This is the initial version of an API to create custom operators whose implementations are backed by triton kernels. While user-defined triton kernels work out-of-the-box with torch.compile, you may wish to construct a custom operator if you need to compose with other PyTorch subsystems, like Tensor subclasses or vmap. I'm hoping to get design feedback on this and ship it so that we can begin experimenting with customers. Test Plan: - new tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/130637 Approved by: https://github.com/albanD	2024-07-15 13:00:54 +00:00

17 Commits