pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 12:54:11 +08:00

Author	SHA1	Message	Date
Yuanyuan Chen	9fff8155c3	[2/N] Fix clang-tidy readability checks (#164652 ) This PR applies clang-tidy readability checks to jit sources and all headers in the code base. `readability-redundant-inline-specifier` is suppressed because it incurs too many changes. `readability-redundant-inline-specifier` is used to detect redundant inline specifiers on function and variable declarations. There are many in-class method definitions that are marked inline. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164652 Approved by: https://github.com/Skylion007	2025-10-06 01:06:01 +00:00
PyTorch MergeBot	2c5ed6e7c0	Revert "[2/N] Fix clang-tidy readability checks (#164652 )" This reverts commit 3c5ca685d6f5b6f3971c0cd20a054aa355610419. Reverted https://github.com/pytorch/pytorch/pull/164652 on behalf of https://github.com/izaitsevfb due to need to revert due to a conflict with revert of https://github.com/pytorch/pytorch/pull/162659 ([comment](https://github.com/pytorch/pytorch/pull/164652#issuecomment-3369346707))	2025-10-05 21:36:57 +00:00
Yuanyuan Chen	3c5ca685d6	[2/N] Fix clang-tidy readability checks (#164652 ) This PR applies clang-tidy readability checks to jit sources and all headers in the code base. `readability-redundant-inline-specifier` is suppressed because it incurs too many changes. `readability-redundant-inline-specifier` is used to detect redundant inline specifiers on function and variable declarations. There are many in-class method definitions that are marked inline. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164652 Approved by: https://github.com/Skylion007	2025-10-05 07:05:11 +00:00
Mu-Chu Lee	8f30a8dc47	[AOTInductor] Add grid information for Triton Kernels (#160131 ) Summary: Add grid information for Triton Kernels for profiling in Kineto. Test Plan: Before change: [screenshot] After change: [screenshot] Note: grid size and similar details can be extracted from the device-side trace, but this PR focuses specifically on the host side, mainly so that more host-side information needed for performance profiling can be added in the future. Pull Request resolved: https://github.com/pytorch/pytorch/pull/160131 Approved by: https://github.com/desertfire	2025-09-23 02:15:24 +00:00
Mu-Chu Lee	40311e2ec1	[AOTInductor] ABI-Compatibility for RecordFunction. (#159842 ) Summary: Previously, our implementation of RecordFunction injected ATen into codegen, which breaks the ABI contract for AOTInductor. C10::IValue is added to call the full record function. Extensions for more profiling info will come in later PRs. Test Plan: Included in commit. Differential Revision: [D79622071](https://our.internmc.facebook.com/intern/diff/D79622071) Pull Request resolved: https://github.com/pytorch/pytorch/pull/159842 Approved by: https://github.com/desertfire	2025-08-15 21:45:47 +00:00
Jane Xu	3ddfd46bd2	Cut a version of TORCH_ERROR_CODE_CHECK in headeronly from AOTI (#159604 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/159604 Approved by: https://github.com/albanD, https://github.com/desertfire	2025-08-06 00:29:56 +00:00
Mu-Chu Lee	70e7b76707	[AOTInductor] Add Python interface for user managed buffer. (#151141 ) Summary: Add pybind for user managed buffer in update_constants_buffer. Test Plan: Included in commit. ``` python test/inductor/test_aot_inductor.py -k user_managed ``` Differential Revision: D72892310 Pull Request resolved: https://github.com/pytorch/pytorch/pull/151141 Approved by: https://github.com/henrylhtsang, https://github.com/desertfire	2025-04-15 09:36:30 +00:00
Mu-Chu Lee	60a45eb862	[AOTInductor] Introduce MaybeOwningAtenTensorHandle for ConstantMap (#150275 ) Summary: We used RAIIAtenTensorHandle for ConstantMap, where RAIIAtenTensorHandle is a unique_ptr, indicating that all memory handling is done internally by AOTInductor. In this PR, we introduce ConstantAtenTensorHandle, which replaces RAIIAtenTensorHandle. This class holds a raw AtenTensorHandle, and also owns a RAIIAtenTensorHandle if the user decides to delegate memory management to AOTInductor. This is a prerequisite for user-managed buffers; this PR, however, only introduces the class and makes sure it works with the existing AOTInductor, with default behavior identical to using RAIIAtenTensorHandle. Test Plan: Existing tests. No change should be introduced by this PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/150275 Approved by: https://github.com/chenyang78, https://github.com/desertfire	2025-04-05 06:00:35 +00:00
Benjamin Glass	b160dda743	cpp_wrapper: reduce memory usage by removing unneeded temporaries (#147403 ) This PR contains a set of interrelated changes, listed below, with the upshot that compiled model memory usage in `cpp_wrapper` mode is now roughly equivalent to the default inductor mode. Changes: 1. Refactor `reinterpret_view` calls in `cpp_wrapper` to always return a temporary RAII tensor object, rather than saving off a "temporary" tensor handle that persisted through the end of the function. This matches the behavior of the base Python wrapper class, and is responsible for the majority of the memory usage reductions. 2. Eliminate nearly all other cases where a "temporary" tensor handle was saved off (with the exception of one or two places where the tensor would immediately be destroyed by going out-of-scope). This necessitated some ugly-looking code to handle `Optional[Tensor]` and `Optional[Sequence[Any]]`, since `Optional` is passed by pointer into the C-shim functions (making passing temporary objects difficult). This code is justified by the fact that it only appears in controlled circumstances that we auto-generate, so there are minimal user-facing footguns. 3. Delete the list containing the input tensors to the `cpp_wrapper` main function after casting them to `AtenTensorHandle` objects, which have an internal reference count keeping them alive. The [TorchInductor benchmark](https://hud.pytorch.org/benchmark/compilers?dashboard=torchinductor&startTime=Sat%2C%2015%20Feb%202025%2018%3A38%3A08%20GMT&stopTime=Sat%2C%2022%20Feb%202025%2018%3A38%3A08%20GMT&granularity=hour&mode=inference&dtype=bfloat16&deviceName=cuda%20(a100)&lBranch=gh/benjaminglass1/73/head&lCommit=4d5edaf67e80ca9ca36d301af1ded13967a04790&rBranch=main&rCommit=e1bf892d9004a4dba0748d0eda5c3b4eced0ea70) I ran shows the increased memory compression. Differential Revision: [D70648897](https://our.internmc.facebook.com/intern/diff/D70648897) Pull Request resolved: https://github.com/pytorch/pytorch/pull/147403 Approved by: https://github.com/desertfire	2025-03-06 16:08:16 +00:00
Benjamin Glass	54adbbf6b8	cpp_wrapper: Add support for MemoryFormat arguments (#141367 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/141367 Approved by: https://github.com/desertfire	2024-12-02 20:40:24 +00:00
Bin Bao	dcf22fa58c	[AOTI][refactor] Add sizes and strides util functions (#140449 ) Summary: Similar to https://github.com/pytorch/pytorch/pull/139895, add sizes and strides methods to RAIIAtenTensorHandle and ConstantHandle, to increase the code readability. Pull Request resolved: https://github.com/pytorch/pytorch/pull/140449 Approved by: https://github.com/chenyang78 ghstack dependencies: #140447, #140448	2024-11-14 16:48:43 +00:00
Bin Bao	80870f62f0	[AOTI][refactor] Switch remaining aoti_torch_get_data_ptr (#140448 ) Summary: https://github.com/pytorch/pytorch/pull/139895 added data_ptr(), but one remaining place in cpp_wrapper_gpu.py had not switched over. Also moved a few AtenTensorHandle-related utility functions from arrayref_tensor.h to utils.h. Pull Request resolved: https://github.com/pytorch/pytorch/pull/140448 Approved by: https://github.com/chenyang78 ghstack dependencies: #140447	2024-11-14 01:40:59 +00:00
Bin Bao	d0ffd6d142	[AOTI] Add data_ptr to RAIIAtenTensorHandle (#139895 ) Summary: To increase the readability of the generated code. This is not BC-breaking, because RAIIAtenTensorHandle is implemented as header-only. Differential Revision: [D65547216](https://our.internmc.facebook.com/intern/diff/D65547216) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139895 Approved by: https://github.com/chenyang78	2024-11-07 01:36:28 +00:00
cyy	ab912b7fef	[2/N] Fix clang-tidy warnings in inductor (#132040 ) Follows #131979 Pull Request resolved: https://github.com/pytorch/pytorch/pull/132040 Approved by: https://github.com/Skylion007	2024-07-29 18:41:24 +00:00
Bin Bao	945946e817	[AOTI] Fix another ABI-compatible CPU issue (#131798 ) Summary: This problem is seen on AOTI CPU dashboard runs, a cpp compilation error because ConstantHandle::get doesn't exist. This PR adds ConstantHandle::get so that the interface is consistent with RAIIAtenTensorHandle. Pull Request resolved: https://github.com/pytorch/pytorch/pull/131798 Approved by: https://github.com/zou3519, https://github.com/chenyang78 ghstack dependencies: #131791	2024-07-26 11:27:58 +00:00
Wu, Chunyuan	4a997de8b9	[AOTI] support freezing for MKLDNN (#124350 ) ## Description Fixes https://github.com/pytorch/pytorch/issues/114450. This PR builds upon the work from @imzhuhl done in https://github.com/pytorch/pytorch/pull/114451. This PR requires https://github.com/pytorch/pytorch/pull/122472 to land first. We leverage the serialization and deserialization API from oneDNN v3.4.1 to save the opaque MKLDNN tensor during the compilation and restore the opaque tensor when loading the compiled .so. ideep version is updated so that we won't break any pipeline even if third_party/ideep is not updated at the same time. ### Test plan: ```sh python -u test/inductor/test_aot_inductor.py -k AOTInductorTestNonABICompatibleCpu.test_freezing_non_abi_compatible_cpu python -u test/inductor/test_aot_inductor.py -k AOTInductorTestNonABICompatibleCpu.test_conv_freezing_non_abi_compatible_cpu python -u test/inductor/test_aot_inductor.py -k AOTInductorTestNonABICompatibleCpu.test_deconv_freezing_non_abi_compatible_cpu python -u test/inductor/test_aot_inductor.py -k AOTInductorTestNonABICompatibleCpu.test_linear_freezing_non_abi_compatible_cpu ``` ### TODOs in follow-up PRs 1. We found that using `AOTI_TORCH_CHECK` causes a performance drop on several models (`DistillGPT2`, `MBartForConditionalGeneration`, `T5ForConditionalGeneration`, `T5Small`) compared with JIT Inductor, which uses `TORCH_CHECK`. How to address this may need further discussion (`AOTI_TORCH_CHECK` is introduced in https://github.com/pytorch/pytorch/pull/119220). 2. Freezing in non-ABI compatible mode will work with the support in this PR. For ABI compatible mode, we first need to address this issue: `AssertionError: None, i.e. optional output is not supported`. `6c4f43f826/torch/_inductor/codegen/cpp_wrapper_cpu.py (L2023-L2024)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/124350 Approved by: https://github.com/jgong5, https://github.com/desertfire	2024-05-25 07:15:36 +00:00
PyTorch MergeBot	5ae9daa4a2	Revert "[AOTI] support freezing for MKLDNN (#124350 )" This reverts commit 654afb6f3ae3ddbd926a753f9af95a6f6e22131c. Reverted https://github.com/pytorch/pytorch/pull/124350 on behalf of https://github.com/clee2000 due to Seems to have broken inductor/test_aot_inductor.py::AOTInductorTestNonABICompatibleCpu::test_freezing_non_abi_compatible_cpu `654afb6f3a` https://github.com/pytorch/pytorch/actions/runs/9224838183/job/25382780192 ([comment](https://github.com/pytorch/pytorch/pull/124350#issuecomment-2129889809))	2024-05-24 16:03:07 +00:00
Wu, Chunyuan	654afb6f3a	[AOTI] support freezing for MKLDNN (#124350 ) ## Description Fixes https://github.com/pytorch/pytorch/issues/114450. This PR builds upon the work from @imzhuhl done in https://github.com/pytorch/pytorch/pull/114451. This PR requires https://github.com/pytorch/pytorch/pull/122472 to land first. We leverage the serialization and deserialization API from oneDNN v3.4.1 to save the opaque MKLDNN tensor during the compilation and restore the opaque tensor when loading the compiled .so. ideep version is updated so that we won't break any pipeline even if third_party/ideep is not updated at the same time. ### Test plan: ```sh python -u test/inductor/test_aot_inductor.py -k AOTInductorTestNonABICompatibleCpu.test_freezing_non_abi_compatible_cpu python -u test/inductor/test_aot_inductor.py -k AOTInductorTestNonABICompatibleCpu.test_conv_freezing_non_abi_compatible_cpu python -u test/inductor/test_aot_inductor.py -k AOTInductorTestNonABICompatibleCpu.test_deconv_freezing_non_abi_compatible_cpu python -u test/inductor/test_aot_inductor.py -k AOTInductorTestNonABICompatibleCpu.test_linear_freezing_non_abi_compatible_cpu ``` ### TODOs in follow-up PRs 1. We found that using `AOTI_TORCH_CHECK` causes a performance drop on several models (`DistillGPT2`, `MBartForConditionalGeneration`, `T5ForConditionalGeneration`, `T5Small`) compared with JIT Inductor, which uses `TORCH_CHECK`. How to address this may need further discussion (`AOTI_TORCH_CHECK` is introduced in https://github.com/pytorch/pytorch/pull/119220). 2. Freezing in non-ABI compatible mode will work with the support in this PR. For ABI compatible mode, we first need to address this issue: `AssertionError: None, i.e. optional output is not supported`. `6c4f43f826/torch/_inductor/codegen/cpp_wrapper_cpu.py (L2023-L2024)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/124350 Approved by: https://github.com/jgong5, https://github.com/desertfire	2024-05-24 13:34:04 +00:00
Bin Bao	40ec155e58	[AOTI][refactor] Split common aoti_runtime utils into a separate header (#119066 ) Summary: Split common utils from aoti_runtime/model.h into a separate header file, because when turning on ABI-compatible mode for JIT Inductor we won't need AOTInductorModel, but we do need some common utils, e.g. RAIIAtenTensorHandle. Differential Revision: [D53478809](https://our.internmc.facebook.com/intern/diff/D53478809) Pull Request resolved: https://github.com/pytorch/pytorch/pull/119066 Approved by: https://github.com/khabinov	2024-02-07 16:54:00 +00:00

19 Commits