pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-21 05:34:18 +08:00

Author	SHA1	Message	Date
karthickai	fa4d5e76ea	[Inductor] Fix ComboKernels failing due to missing helper functions (#162759 ) Fixes: #162756 Differential Revision: [D82257359](https://our.internmc.facebook.com/intern/diff/D82257359) Pull Request resolved: https://github.com/pytorch/pytorch/pull/162759 Approved by: https://github.com/eellison, https://github.com/mlazos	2025-09-12 20:01:06 +00:00
ghostspiders	af10f1f86c	Fix requires_cuda to requires_cuda_and_triton (#160222 ) Fixes ##159399 Pull Request resolved: https://github.com/pytorch/pytorch/pull/160222 Approved by: https://github.com/janeyx99	2025-08-10 07:05:52 +00:00
gaoyvfeng	50f23ff6f8	rename-HAS_CUDA-to-HAS_CUDA_AND_TRITON (#159883 ) Fixes #159399 "Modified torch.testing._internal.inductor_utils and test/inductor" Pull Request resolved: https://github.com/pytorch/pytorch/pull/159883 Approved by: https://github.com/janeyx99	2025-08-08 15:44:52 +00:00
Feng Shi	b8c2d4c259	add a corner test case of dynamic sizes for combo kernel (#156035 ) Summary: Added a unit test case for a corner case of combo kernel where all below are true: 1. more than 1 dimensions are dynamic size 2. no_x_dim presistent reduce op Test Plan: ``` buck2 test mode/opt caffe2/test/inductor:combo_kernels -- test_dynamic_shapes_persistent_reduction_no_x_dim_2 ``` Rollback Plan: Differential Revision: D76699002 Pull Request resolved: https://github.com/pytorch/pytorch/pull/156035 Approved by: https://github.com/mlazos	2025-06-18 22:57:09 +00:00
Aaron Orenstein	8c356ce3da	Fix lint errors in fbcode (#135614 ) Summary: Fixed a bunch of fbcode imports that happened to work but confused autodeps. After this autodeps still suggests "improvements" to TARGETS (which breaks our builds) but at least it can find all the imports. Test Plan: ``` fbpython fbcode/tools/build/buck/linters/lint_autoformat.py --linter=autodeps --default-exec-timeout=1800 -- fbcode/caffe2/TARGETS fbcode/caffe2/test/TARGETS ``` Before: ``` ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "test_export" (from caffe2/test/export/testing.py:229) when processing rule "test_export". Please make sure it's listed in the srcs parameter of another rule. See https://fbur$ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "testing" (from caffe2/test/export/test_export.py:87) when processing rule "test_export". Please make sure it's listed in the srcs parameter of another rule. See https://fburl$ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "test_export" (from caffe2/test/export/test_serdes.py:9) when processing rule "test_export". Please make sure it's listed in the srcs parameter of another rule. See https://fb$ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "testing" (from caffe2/test/export/test_serdes.py:10) when processing rule "test_export". Please make sure it's listed in the srcs parameter of another rule. See https://fburl$ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "testing" (from caffe2/test/export/test_retraceability.py:7) when processing rule "test_export". Please make sure it's listed in the srcs parameter of another rule. See https:$ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "test_export" (from caffe2/test/export/test_retraceability.py:6) when processing rule "test_export". Please make sure it's listed in the srcs parameter of another rule. See ht$ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "testing" (from caffe2/test/export/test_export_nonstrict.py:7) when processing rule "test_export". Please make sure it's listed in the srcs parameter of another rule. See http$ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "test_export" (from caffe2/test/export/test_export_nonstrict.py:6) when processing rule "test_export". Please make sure it's listed in the srcs parameter of another rule. See $ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "test_export" (from caffe2/test/export/test_export_training_ir_to_run_decomp.py:8) when processing rule "test_export". Please make sure it's listed in the srcs parameter of an$ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "testing" (from caffe2/test/export/test_export_training_ir_to_run_decomp.py:10) when processing rule "test_export". Please make sure it's listed in the srcs parameter of anoth$ ERROR while processing caffe2/test/TARGETS: Found "//python/typeshed_internal:typeshed_internal_library" owner for "cv2" but it is protected by visibility rules: [] (from caffe2/test/test_bundled_images.py:7) when processing rule "test_bundled_$ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "caffe2.test.profiler_test_cpp_thread_lib" (from caffe2/test/profiler/test_cpp_thread.py:29) when processing rule "profiler_test_cpp_thread". Please make sure it's listed in t$ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "torch._utils_internal.get_file_path_2" (from caffe2/test/test_custom_ops.py:23) when processing rule "custom_ops". Please make sure it's listed in the srcs parameter of anoth$ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "torch._utils_internal.get_file_path_2" (from caffe2/test/test_public_bindings.py:13) when processing rule "public_bindings". Please make sure it's listed in the srcs paramete$ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "torch._C._profiler.symbolize_tracebacks" (from caffe2/test/test_cuda.py:3348) when processing rule "test_cuda". Please make sure it's listed in the srcs parameter of another $ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "torch._C._profiler.gather_traceback" (from caffe2/test/test_cuda.py:3348) when processing rule "test_cuda". Please make sure it's listed in the srcs parameter of another rule$ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for include <torch/csrc/autograd/profiler_kineto.h> (from caffe2/test/profiler/test_cpp_thread.cpp:2) when processing profiler_test_cpp_thread_lib. Some things to try: ``` Differential Revision: D62049222 Pull Request resolved: https://github.com/pytorch/pytorch/pull/135614 Approved by: https://github.com/oulgen, https://github.com/laithsakka	2024-09-13 02:04:34 +00:00
Sam Larsen	4f9f1775d8	Fix flaky TestCudaWrapper.test_randint_cuda_cuda_wrapper (#135370 ) Summary: This test is flaky when run after `test_dynamic_shapes_persistent_reduction_mixed_x_dim_cuda_cuda_wrapper` because the TestCase sets config options globally in its setUp() that stick around for subsequent tests. For test isolation, we use a contextlib.ExitStack pattern in other tests to patch the config options and restore them in tearDown(). Update all TestCases in `test/inductor/test_combo_kernels.py` to use that pattern. Test Plan: ``` python test/inductor/test_combo_kernels.py python test/inductor/test_cuda_cpp_wrapper.py TestCudaWrapper.test_dynamic_shapes_persistent_reduction_mixed_x_dim_cuda_cuda_wrapper TestCudaWrapper.test_randint_cuda_cuda_wrapper ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/135370 Approved by: https://github.com/jansel	2024-09-10 18:43:14 +00:00
qchip	f6398eb0fa	dynamic shapes for combo_kenel/foreach_kernel (#134477 ) This PR add dynamic shapes support to foreach and combo kernels for horizontal fusion. A flag `combo_kernel_foreach_dynamic_shapes` (default False to avoid disturb production workflows) is added to _inductor/config.py. Setting it to True enables automatic dynamic shapes for foreach kernels. It is always enabled for combo kernels cases. Added unit cases. This PR also fixes a flaky test case for [T198833257](https://www.internalfb.com/intern/tasks/?t=198833257) Pull Request resolved: https://github.com/pytorch/pytorch/pull/134477 Approved by: https://github.com/mlazos	2024-08-30 19:58:20 +00:00
Feng Shi	19416bf38b	Reland "[2/2] PT2 Inductor ComboKernels - automatic horizontal fusing (#131675 )" (#133291 ) Reland by reverting commit 844103197d3e8cf6b4b59176e473365113f4f962. #131675 failed a few internal tests because it imported a diff version which wasn't rebased on the proper dependent diffs. Reland from OSS only to avoid the out-of-sync issue. Original description from #131675 Summary: A ComboKernel combines independent Inductor Triton kernels into a single one. This is part 2 pull request which 1) adds automatic horizontal fusion in the end of the inductor operator fusion process, 2) adds type annotation for trition_combo_kernel.py ComboKernel is used in two cases: 1) for existing foreach kernels, combo kernels are used as the backend kernel. the front-end kernel generation logic remains the same. 2) Added an extra optimization phase to the end of the scheduler to generate extra combo kernels if combo_kernels is True in config.py This is part 2 pull request which deals with the 2nd case above: The combo kernel generation in the added optimization phase is done in two steps: 1) in the front end inside the scheduler, it topologically sort the schedule nodes to find all the nodes with no data dependency and create a frond end schedule node for them. We currently limit the maximal number of sub-nodes for each combo kernel to 8 (but we still need to find what is the optimal number). 2) then, these sub-nodes are combined in the codegen phase to generate the combo kernel code for them based on a few rules. For example, 1d and 2d kernels are separated into different combo kernels, as mixing them is not supported yet. Note these algorithms we provide are very basic, and the users can register their customized combo kernel generation algorithms for both steps. Performance wise, combining small kernels is about always to see performance gain. however, combining very large kernels may not see any perf gain, sometimes even regression possibly due to improper block sizes. Thus, a benchmark function is implemented to avoid such perf regression, and it is recommended to turn it on by setting benchmark_combo_kernels to True whenever combo_kernels is True. Please refer to part 1 pull request https://github.com/pytorch/pytorch/pull/124969 for more details. Test Plan: buck2 test mode/dev-nosan caffe2/test/inductor:combo_kernels Pull Request resolved: https://github.com/pytorch/pytorch/pull/133291 Approved by: https://github.com/wdvr	2024-08-13 18:18:12 +00:00
PyTorch MergeBot	844103197d	Revert "[2/2] PT2 Inductor ComboKernels - automatic horizontal fusing (#131675 )" This reverts commit bb6eef8ed1de0eb48bde10a07da57b6acc82fb05. Reverted https://github.com/pytorch/pytorch/pull/131675 on behalf of https://github.com/fbgheith due to breaking internal tests ([comment](https://github.com/pytorch/pytorch/pull/131675#issuecomment-2285069508))	2024-08-12 23:31:16 +00:00
Feng Shi	bb6eef8ed1	[2/2] PT2 Inductor ComboKernels - automatic horizontal fusing (#131675 ) Summary: A ComboKernel combines independent Inductor Triton kernels into a single one. This is part 2 pull request which 1) adds automatic horizontal fusion in the end of the inductor operator fusion process, 2) adds type annotation for trition_combo_kernel.py ComboKernel is used in two cases: 1) for existing foreach kernels, combo kernels are used as the backend kernel. the front-end kernel generation logic remains the same. 2) Added an extra optimization phase to the end of the scheduler to generate extra combo kernels if combo_kernels is True in config.py This is part 2 pull request which deals with the 2nd case above: - The combo kernel generation in the added optimization phase is done in two steps: 1) in the front end inside the scheduler, it topologically sort the schedule nodes to find all the nodes with no data dependency and create a frond end schedule node for them. We currently limit the maximal number of sub-nodes for each combo kernel to 8 (but we still need to find what is the optimal number). 2) then, these sub-nodes are combined in the codegen phase to generate the combo kernel code for them based on a few rules. For example, 1d and 2d kernels are separated into different combo kernels, as mixing them is not supported yet. Note these algorithms we provide are very basic, and the users can register their customized combo kernel generation algorithms for both steps. - Performance wise, combining small kernels is about always to see performance gain. however, combining very large kernels may not see any perf gain, sometimes even regression possibly due to improper block sizes. Thus, a benchmark function is implemented to avoid such perf regression, and it is recommended to turn it on by setting benchmark_combo_kernels to True whenever combo_kernels is True. Please refer to part 1 pull request https://github.com/pytorch/pytorch/pull/124969 for more details. Test Plan: buck2 test mode/dev-nosan caffe2/test/inductor:combo_kernels Differential Revision: D60067757 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131675 Approved by: https://github.com/mlazos	2024-08-09 03:14:16 +00:00

10 Commits