pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 21:14:14 +08:00

Author	SHA1	Message	Date
xinan.lin	b11d5cd584	[Inductor UT][Windows][XPU] Fix Inductor UT on XPU Windows. (#146481 ) This PR fixes all the Inductor UT failures for the XPU backend on Windows that we found on a local machine. (Due to resource constraints, we have not yet set up a Windows CI pipeline online.) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146481 Approved by: https://github.com/jansel, https://github.com/EikanWang ghstack dependencies: #147347	2025-02-22 02:53:16 +00:00
Aleksei Nikiforov	a82bab6419	Run only listed tests on s390x (#140265 ) Skip tests that are failing This was previously part of https://github.com/pytorch/pytorch/pull/125401 Pull Request resolved: https://github.com/pytorch/pytorch/pull/140265 Approved by: https://github.com/malfet Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>	2024-11-20 22:53:09 +00:00
CaoE	8cfe28e4e3	[Inductor] Pick ISA for inductor based on ATEN_CPU_CAPABILITY (#123514 ) It is part of https://github.com/pytorch/pytorch/issues/123224. Pick ISA based on the environment ATEN_CPU_CAPABILITY to control CPU vec ISA level for Inductor like eager. Pull Request resolved: https://github.com/pytorch/pytorch/pull/123514 Approved by: https://github.com/jgong5, https://github.com/peterbell10	2024-10-17 09:06:57 +00:00
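The behaviour described in the commit above (#123514) can be pictured with a small sketch: an environment variable caps which vector ISA is chosen from the ones detected on the machine. Only the variable name `ATEN_CPU_CAPABILITY` comes from the commit message. The function `pick_isa`, the `ISA_ORDER` list and the detected-ISA input are hypothetical and are not Inductor's actual implementation.

```python
import os

# Assumed ordering from weakest to strongest; "default" stands for no vectorization.
ISA_ORDER = ["default", "avx2", "avx512"]


def pick_isa(detected, env=None):
    """Return the strongest detected ISA that does not exceed ATEN_CPU_CAPABILITY."""
    env = os.environ if env is None else env
    cap = env.get("ATEN_CPU_CAPABILITY", "").lower()
    # An unset or unrecognized value places no cap on the choice.
    limit = ISA_ORDER.index(cap) if cap in ISA_ORDER else len(ISA_ORDER) - 1
    allowed = [
        isa for isa in detected
        if isa in ISA_ORDER and ISA_ORDER.index(isa) <= limit
    ]
    if not allowed:
        return "default"
    return max(allowed, key=ISA_ORDER.index)
```

For example, `pick_isa(["avx2", "avx512"], {"ATEN_CPU_CAPABILITY": "avx2"})` selects `"avx2"` even though the machine reports AVX-512 support.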
Huy Do	0b7ef196cd	Use filelock to build extension_device backend one at a time (#137930 ) Fixes https://github.com/pytorch/pytorch/issues/136125 Fixes https://github.com/pytorch/pytorch/issues/137026 Fixes https://github.com/pytorch/pytorch/issues/137027 The compilation fails during `setUpClass`, so disabling the test doesn't do anything. The theory I have for this flaky issue is that `test_open_device_registration` from both `TritonExtensionBackendTests` and `ExtensionBackendTests` run in parallel, and one is cleaned up while the other is still in flight, causing flaky failures. Here is an example failure https://github.com/pytorch/pytorch/actions/runs/11331105492/job/31512603585#step:22:1710 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137930 Approved by: https://github.com/malfet	2024-10-15 17:46:28 +00:00
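A minimal sketch of the "build one at a time" idea from the commit above (#137930). The commit itself uses the `filelock` package; to stay dependency-free this sketch swaps in a standard-library lock file created with `os.O_EXCL`. The helpers `exclusive_lock`, `build_extension` and `run_two_builds` are hypothetical and do not appear in the PyTorch test.

```python
import os
import tempfile
import threading
import time
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path, poll_interval=0.01):
    """Wait until lock_path can be created exclusively; delete it on exit."""
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            time.sleep(poll_interval)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


def build_extension(owner, lock_path, log):
    # Hypothetical stand-in for compiling the extension during setUpClass.
    with exclusive_lock(lock_path):
        log.append((owner, "start"))
        time.sleep(0.02)
        log.append((owner, "end"))


def run_two_builds():
    """Run two builders in parallel threads and return the event log."""
    log = []
    with tempfile.TemporaryDirectory() as tmp:
        lock_path = os.path.join(tmp, "extension_device.lock")
        owners = ("ExtensionBackendTests", "TritonExtensionBackendTests")
        threads = [
            threading.Thread(target=build_extension, args=(owner, lock_path, log))
            for owner in owners
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return log
```

Because each build holds the lock from start to end, the two builds in the log never interleave, whichever thread acquires the lock first.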
Jez Ng	c254901bdb	Have Triton custom extension test use privateuseone device (#137611 ) The original PR #122396 used the CPU device since at that point in time there was no actual Triton CPU backend. After #133408, this is no longer the case, so we now have multiple backends getting registered for the CPU. The test still works in OSS but fails internally due to different test runners initializing the backends in a different order. This PR doesn't actually end up fixing the test internally because cpp_extension -- needed to implement the privateuseone device -- isn't supported there, so we simply skip it instead. However, it still makes the OSS test independent of initialization order, which is good. Differential Revision: [D63838169](https://our.internmc.facebook.com/intern/diff/D63838169/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137611 Approved by: https://github.com/henrylhtsang	2024-10-11 21:27:29 +00:00
Simon Fan	b86269fab5	Unify cpp_extension build directory removal (#136059 ) Keeps existing default directory clearing logic, even though it fails when TORCH_EXTENSIONS_DIR is set. To properly clear, we'd need to track all the folders we compiled the extensions to. Pull Request resolved: https://github.com/pytorch/pytorch/pull/136059 Approved by: https://github.com/ezyang, https://github.com/albanD	2024-10-03 06:22:11 +00:00
PyTorch MergeBot	c13c7e11c5	Revert "[Inductor] Pick ISA for inductor based on ATEN_CPU_CAPABILITY (#123514 )" This reverts commit 6931c1644afdba53e63ce5671455e4e1b7265dd9. Reverted https://github.com/pytorch/pytorch/pull/123514 on behalf of https://github.com/huydhn due to Sorry for reverting your change but its test_cpu_repro test is failing in trunk `6931c1644a` ([comment](https://github.com/pytorch/pytorch/pull/123514#issuecomment-2383563919))	2024-09-30 15:47:04 +00:00
CaoE	6931c1644a	[Inductor] Pick ISA for inductor based on ATEN_CPU_CAPABILITY (#123514 ) It is part of https://github.com/pytorch/pytorch/issues/123224. Pick ISA based on the environment ATEN_CPU_CAPABILITY to control CPU vec ISA level for Inductor like eager. Pull Request resolved: https://github.com/pytorch/pytorch/pull/123514 Approved by: https://github.com/jgong5, https://github.com/peterbell10	2024-09-30 00:53:18 +00:00
Aaron Orenstein	8c356ce3da	Fix lint errors in fbcode (#135614 ) Summary: Fixed a bunch of fbcode imports that happened to work but confused autodeps. After this autodeps still suggests "improvements" to TARGETS (which breaks our builds) but at least it can find all the imports. Test Plan: ``` fbpython fbcode/tools/build/buck/linters/lint_autoformat.py --linter=autodeps --default-exec-timeout=1800 -- fbcode/caffe2/TARGETS fbcode/caffe2/test/TARGETS ``` Before: ``` ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "test_export" (from caffe2/test/export/testing.py:229) when processing rule "test_export". Please make sure it's listed in the srcs parameter of another rule. See https://fbur$ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "testing" (from caffe2/test/export/test_export.py:87) when processing rule "test_export". Please make sure it's listed in the srcs parameter of another rule. See https://fburl$ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "test_export" (from caffe2/test/export/test_serdes.py:9) when processing rule "test_export". Please make sure it's listed in the srcs parameter of another rule. See https://fb$ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "testing" (from caffe2/test/export/test_serdes.py:10) when processing rule "test_export". Please make sure it's listed in the srcs parameter of another rule. See https://fburl$ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "testing" (from caffe2/test/export/test_retraceability.py:7) when processing rule "test_export". Please make sure it's listed in the srcs parameter of another rule. See https:$ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "test_export" (from caffe2/test/export/test_retraceability.py:6) when processing rule "test_export". Please make sure it's listed in the srcs parameter of another rule. 
See ht$ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "testing" (from caffe2/test/export/test_export_nonstrict.py:7) when processing rule "test_export". Please make sure it's listed in the srcs parameter of another rule. See http$ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "test_export" (from caffe2/test/export/test_export_nonstrict.py:6) when processing rule "test_export". Please make sure it's listed in the srcs parameter of another rule. See $ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "test_export" (from caffe2/test/export/test_export_training_ir_to_run_decomp.py:8) when processing rule "test_export". Please make sure it's listed in the srcs parameter of an$ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "testing" (from caffe2/test/export/test_export_training_ir_to_run_decomp.py:10) when processing rule "test_export". Please make sure it's listed in the srcs parameter of anoth$ ERROR while processing caffe2/test/TARGETS: Found "//python/typeshed_internal:typeshed_internal_library" owner for "cv2" but it is protected by visibility rules: [] (from caffe2/test/test_bundled_images.py:7) when processing rule "test_bundled_$ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "caffe2.test.profiler_test_cpp_thread_lib" (from caffe2/test/profiler/test_cpp_thread.py:29) when processing rule "profiler_test_cpp_thread". Please make sure it's listed in t$ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "torch._utils_internal.get_file_path_2" (from caffe2/test/test_custom_ops.py:23) when processing rule "custom_ops". Please make sure it's listed in the srcs parameter of anoth$ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "torch._utils_internal.get_file_path_2" (from caffe2/test/test_public_bindings.py:13) when processing rule "public_bindings". 
Please make sure it's listed in the srcs paramete$ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "torch._C._profiler.symbolize_tracebacks" (from caffe2/test/test_cuda.py:3348) when processing rule "test_cuda". Please make sure it's listed in the srcs parameter of another $ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "torch._C._profiler.gather_traceback" (from caffe2/test/test_cuda.py:3348) when processing rule "test_cuda". Please make sure it's listed in the srcs parameter of another rule$ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for include <torch/csrc/autograd/profiler_kineto.h> (from caffe2/test/profiler/test_cpp_thread.cpp:2) when processing profiler_test_cpp_thread_lib. Some things to try: ``` Differential Revision: D62049222 Pull Request resolved: https://github.com/pytorch/pytorch/pull/135614 Approved by: https://github.com/oulgen, https://github.com/laithsakka	2024-09-13 02:04:34 +00:00
Xuehai Pan	134bc4fc34	[BE][Easy][12/19] enforce style for empty lines in import segments in `test/i*/` (#129763 ) See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter. You can review these PRs via: ```bash git diff --ignore-all-space --ignore-blank-lines HEAD~1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129763 Approved by: https://github.com/jansel	2024-07-18 07:49:19 +00:00
PyTorch MergeBot	b732b52f1e	Revert "[BE][Easy][12/19] enforce style for empty lines in import segments in `test/i*/` (#129763 )" This reverts commit aecc746fccc4495313167e3a7f94210daf457e1d. Reverted https://github.com/pytorch/pytorch/pull/129763 on behalf of https://github.com/XuehaiPan due to need reland after rerunning lintrunner on main ([comment](https://github.com/pytorch/pytorch/pull/129763#issuecomment-2235736732))	2024-07-18 06:39:58 +00:00
Xuehai Pan	aecc746fcc	[BE][Easy][12/19] enforce style for empty lines in import segments in `test/i*/` (#129763 ) See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter. You can review these PRs via: ```bash git diff --ignore-all-space --ignore-blank-lines HEAD~1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129763 Approved by: https://github.com/jansel	2024-07-18 05:13:41 +00:00
Xu Han	76259ebfdd	[inductor] split cpu vec isa to dedicate file (keep git history) (#129789 ) This PR implements plan 1 of https://github.com/pytorch/pytorch/issues/124245#issuecomment-2197778902. Changes: 1. Duplicate `codecache.py` to `cpu_vec_isa.py` with its `git history`. <img width="745" alt="image" src="https://github.com/pytorch/pytorch/assets/8433590/106533da-ce80-4825-8271-35ffb3141f92"> 2. Make `cpu_vec_isa.py` the dedicated file for CPU vec ISA. This also makes it easier to extend to more architectures and vec ISAs. 3. Update the code for the above changes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/129789 Approved by: https://github.com/jgong5, https://github.com/jansel	2024-07-02 05:29:05 +00:00
PyTorch MergeBot	19e17216a2	Revert "[inductor] split cpu vec isa to dedicate file (keep git history) (#129789 )" This reverts commit 58f346c874a8a982679b4d4f3876602cc05d66d4. Reverted https://github.com/pytorch/pytorch/pull/129789 on behalf of https://github.com/jeanschmidt due to Need to revert in order to revert https://github.com/pytorch/pytorch/pull/129577 ([comment](https://github.com/pytorch/pytorch/pull/129789#issuecomment-2200545144))	2024-07-01 16:08:44 +00:00
Xu Han	58f346c874	[inductor] split cpu vec isa to dedicate file (keep git history) (#129789 ) This PR implements plan 1 of https://github.com/pytorch/pytorch/issues/124245#issuecomment-2197778902. Changes: 1. Duplicate `codecache.py` to `cpu_vec_isa.py` with its `git history`. <img width="745" alt="image" src="https://github.com/pytorch/pytorch/assets/8433590/106533da-ce80-4825-8271-35ffb3141f92"> 2. Make `cpu_vec_isa.py` the dedicated file for CPU vec ISA. This also makes it easier to extend to more architectures and vec ISAs. 3. Update the code for the above changes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/129789 Approved by: https://github.com/jgong5, https://github.com/jansel	2024-06-29 07:19:54 +00:00
PyTorch MergeBot	4bbadeee8a	Revert "Set simdlen based on ATEN_CPU_CAPABILITY (#123514 )" This reverts commit b66e3f0957b96b058c9b632ca60833d9717a9d8a. Reverted https://github.com/pytorch/pytorch/pull/123514 on behalf of https://github.com/clee2000 due to broke test/inductor/test_torchinductor.py::CpuTests::test_new_cpp_build_logical_cpu on periodic test on the no gpu tests `b66e3f0957` https://github.com/pytorch/pytorch/actions/runs/9453518547/job/26040077301 ([comment](https://github.com/pytorch/pytorch/pull/123514#issuecomment-2159433432))	2024-06-10 22:46:01 +00:00
CaoE	b66e3f0957	Set simdlen based on ATEN_CPU_CAPABILITY (#123514 ) It is part of https://github.com/pytorch/pytorch/issues/123224. Set simdlen based on the environment ATEN_CPU_CAPABILITY to control CPU vec ISA like eager. Pull Request resolved: https://github.com/pytorch/pytorch/pull/123514 Approved by: https://github.com/jgong5, https://github.com/peterbell10	2024-06-10 09:02:14 +00:00
Jiong Gong	68a1f787c8	[inductor][cpp] move some common cpp utils to cpp_utils.py (#125152 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/125152 Approved by: https://github.com/desertfire, https://github.com/jansel	2024-05-06 04:30:30 +00:00
Alexander Grund	9dded148d0	Fix test_extension_backend on non-AVX systems (#117272 ) The test checks for a substring "loadu" in generated code. On AVX systems that line is: > auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + static_cast<long>(i0)) however on non-AVX systems it is > auto tmp0 = in_ptr0[static_cast<long>(i0)]; the difference depends on `codecache.valid_vec_isa_list()` being non-empty. See torch/_inductor/codegen/cpp.py:2639 Modify the test to account for that. Pull Request resolved: https://github.com/pytorch/pytorch/pull/117272 Approved by: https://github.com/jgong5, https://github.com/jansel	2024-04-24 02:55:12 +00:00
sifengyang	46903d978b	fix maybe_initialize_device for custom device. (#121379 ) 1. fix maybe_initialize_device for custom device. @wanchaol @albanD @albanD I am very sorry that I have resubmitted a PR by new e-mail. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121379 Approved by: https://github.com/albanD	2024-04-09 16:58:52 +00:00
Wang, Eikan	f8eeae7aaa	Enable CPP wrapper codegen registration (#121296 ) Extend code gen registration for `CppWrapper`. With this PR, a new backend can register its specific `CppWrapper` at runtime. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121296 Approved by: https://github.com/jansel, https://github.com/desertfire	2024-03-26 06:51:03 +00:00
Matthew Haddock	50036ec781	[Inductor] Add a test for creating a cpu inductor-> triton backend (#122396 ) Summary: Currently there is a test for adding a backend in test/inductor/test_extension_backend.py for a cpp backend with a new device. However, there is no such test for the Triton backend; it should be possible for a user to create and register their own ExtensionWrapperCodegen and ExtensionScheduling for another non-CUDA device and be able to generate Triton code. For simplicity I have chosen to use a CPU device, as I think it's plausible someone might want to create a CPU Triton backend. Unfortunately the generation and running of the code are quite tightly coupled, so I've had to use a mocked function to extract the code before running. Suggestions are welcome for better ways to do this. This is a stepping off point for some additional PRs to make the Triton code path less CUDA specific, as currently there would be no way to test this avenue. Test plan: ``` frames [('total', 1), ('ok', 1)] stats [('calls_captured', 3), ('unique_graphs', 1)] inductor [('intermediate_hooks', 1)] aot_autograd [('total', 1), ('ok', 1)] . ---------------------------------------------------------------------- Ran 1 test in 0.394s ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/122396 Approved by: https://github.com/jansel	2024-03-23 01:14:57 +00:00
Sam Larsen	4cd503c1f3	Enable FX graph cache for a batch of inductor tests (#121696 ) Summary: Get more FX graph cache coverage by enabling it for these unit tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/121696 Approved by: https://github.com/eellison	2024-03-14 03:39:59 +00:00
PyTorch MergeBot	6bffde99b0	Revert "[inductor] Move things into torch/testing/_internal/inductor_utils.py (#113275 )" This reverts commit 66d09f82170c528698b5ec606ba7838268ae1f8a. Reverted https://github.com/pytorch/pytorch/pull/113275 on behalf of https://github.com/huydhn due to Sorry for reverting your stack, but it is failing to list test internally with buck2 ([comment](https://github.com/pytorch/pytorch/pull/113275#issuecomment-1811666004))	2023-11-15 01:44:26 +00:00
PyTorch MergeBot	1e60174891	Revert "[dynamo] Add run_inductor_tests entrypoint (#113278 )" This reverts commit b00311ce9e430cf1b98d2103e21ed2179450a424. Reverted https://github.com/pytorch/pytorch/pull/113278 on behalf of https://github.com/huydhn due to Sorry for reverting your stack, but it is failing to list test internally with buck2 ([comment](https://github.com/pytorch/pytorch/pull/113278#issuecomment-1811646325))	2023-11-15 01:19:48 +00:00
Jason Ansel	b00311ce9e	[dynamo] Add run_inductor_tests entrypoint (#113278 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/113278 Approved by: https://github.com/yanboliang	2023-11-11 08:54:43 +00:00
Jason Ansel	66d09f8217	[inductor] Move things into torch/testing/_internal/inductor_utils.py (#113275 ) This PR is just moving things around, so code shared by multiple tests files is in torch/testing/_internal/inductor_utils.py. Pull Request resolved: https://github.com/pytorch/pytorch/pull/113275 Approved by: https://github.com/yanboliang ghstack dependencies: #113242	2023-11-11 03:17:35 +00:00
PyTorch MergeBot	68bf0f1e7d	Revert "[inductor] Move things into torch/testing/_internal/inductor_utils.py (#113275 )" This reverts commit c967dc526a40f4b15003f9c99383acabe66367a6. Reverted https://github.com/pytorch/pytorch/pull/113275 on behalf of https://github.com/PaliC due to the diff this is stacked on top of appears to be causing inductor failures internally ([comment](https://github.com/pytorch/pytorch/pull/113275#issuecomment-1805131017))	2023-11-10 05:40:55 +00:00
Jason Ansel	c967dc526a	[inductor] Move things into torch/testing/_internal/inductor_utils.py (#113275 ) This PR is just moving things around, so code shared by multiple tests files is in torch/testing/_internal/inductor_utils.py. Pull Request resolved: https://github.com/pytorch/pytorch/pull/113275 Approved by: https://github.com/yanboliang	2023-11-10 00:11:09 +00:00
Jez Ng	ae85ba820f	[inductor] Memory planning (#112178 ) This was originally @jansel's PR: https://github.com/pytorch/pytorch/pull/102625, which I've built upon. This diff implements static memory planning. It's disabled by default while we examine its performance. We use a greedy-by-size approach. For dynamic shapes, the sizes of the example inputs are used as estimates when making planning decisions. We generate expressions to calculate the actual memory offsets and sizes at runtime when the values of the dynamic shapes are known. In order to simplify these calculations, we have organized the allocations into a tree that branches on space (address offsets) and time (live ranges). Finally, we need to align these offsets, so we have added an `align` sympy Expr to express these calculations. Some limitations: 1. It is only enabled during inference for now. Enabling it for training increases peak memory usage as we allocate all the memory needed for training upfront, before freeing the memory allocated during inference. We can probably address this by doing planning for both the inference and training passes together. 2. It doesn't work with PyTorch Distributed, because kernels like AllGatherIntoTensor codegen strings which do memory operations. We can fix this down the line by having them emit MemoryPlanningLines instead. Pull Request resolved: https://github.com/pytorch/pytorch/pull/112178 Approved by: https://github.com/desertfire, https://github.com/jansel	2023-11-02 07:39:13 +00:00
PyTorch MergeBot	74e6c877e9	Revert "[inductor] Memory planning (#112178 )" This reverts commit f64a97c6f88873363c5b3c4c33f231b5578085b2. Reverted https://github.com/pytorch/pytorch/pull/112178 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it seems that ROCm will need to be fixed for the new test too `f64a97c6f8` ([comment](https://github.com/pytorch/pytorch/pull/112178#issuecomment-1788195311))	2023-11-01 00:03:56 +00:00
Jez Ng	f64a97c6f8	[inductor] Memory planning (#112178 ) This was originally @jansel's PR: https://github.com/pytorch/pytorch/pull/102625, which I've built upon. This diff implements static memory planning. It's disabled by default while we examine its performance. We use a greedy-by-size approach. For dynamic shapes, the sizes of the example inputs are used as estimates when making planning decisions. We generate expressions to calculate the actual memory offsets and sizes at runtime when the values of the dynamic shapes are known. In order to simplify these calculations, we have organized the allocations into a tree that branches on space (address offsets) and time (live ranges). Finally, we need to align these offsets, so we have added an `align` sympy Expr to express these calculations. Some limitations: 1. It is only enabled during inference for now. Enabling it for training increases peak memory usage as we allocate all the memory needed for training upfront, before freeing the memory allocated during inference. We can probably address this by doing planning for both the inference and training passes together. 2. It doesn't work with PyTorch Distributed, because kernels like AllGatherIntoTensor codegen strings which do memory operations. We can fix this down the line by having them emit MemoryPlanningLines instead. Pull Request resolved: https://github.com/pytorch/pytorch/pull/112178 Approved by: https://github.com/desertfire, https://github.com/jansel	2023-10-31 20:02:30 +00:00
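The greedy-by-size approach named in the memory planning commits (#112178) can be sketched as follows, under simplifying assumptions: sizes are static integers, live ranges are inclusive step intervals, and alignment is a plain integer round-up instead of the `align` sympy expression the commit mentions. The names `Buffer`, `plan`, `align` and `ALIGNMENT` are hypothetical and do not reflect Inductor's implementation.

```python
from dataclasses import dataclass

ALIGNMENT = 64  # assumed alignment in bytes


def align(nbytes, alignment=ALIGNMENT):
    """Round nbytes up to the next multiple of alignment."""
    return (nbytes + alignment - 1) // alignment * alignment


@dataclass
class Buffer:
    name: str
    size: int   # bytes
    start: int  # first step at which the buffer is live
    end: int    # last step at which the buffer is live (inclusive)


def plan(buffers):
    """Assign pool offsets, placing the largest buffers first."""
    placed = []   # (offset, aligned size, buffer)
    offsets = {}
    for buf in sorted(buffers, key=lambda b: b.size, reverse=True):
        size = align(buf.size)
        # Address ranges held by buffers whose live range overlaps this one.
        conflicts = sorted(
            (off, off + sz)
            for off, sz, other in placed
            if buf.start <= other.end and other.start <= buf.end
        )
        offset = 0
        for lo, hi in conflicts:
            if offset + size <= lo:
                break  # the gap before this conflict is large enough
            offset = max(offset, hi)
        placed.append((offset, size, buf))
        offsets[buf.name] = offset
    total = max((off + sz for off, sz, _ in placed), default=0)
    return offsets, total
```

With buffers `a` (100 bytes, steps 0 to 1), `b` (60 bytes, steps 1 to 2) and `c` (100 bytes, steps 2 to 3), `a` and `c` never overlap in time and share offset 0, while `b` is placed after them at offset 128.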
David Berard	4ed4753ac3	[inductor][easy] skip test_extension_backend.py in fbcode (#111591 ) Summary: It's currently failing. We should skip it in fbcode because cpp extensions don't work right now. Differential Revision: D48852412 Pull Request resolved: https://github.com/pytorch/pytorch/pull/111591 Approved by: https://github.com/desertfire	2023-10-23 22:37:13 +00:00
Michael Lazos	2bb59a9ac6	Fix vscode test discovery (#107404 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/107404 Approved by: https://github.com/wconstab	2023-08-18 03:50:46 +00:00
Wang, Eikan	9921b48558	Extend Inductor to support the third-party backend (#106874 ) ## Summary This is a re-land of https://github.com/pytorch/pytorch/pull/100706 that addresses the compilation latency regression. ## Root Cause For the C++/OpenMP backend, `codecache.pick_vec_isa()`, which checks the vectorization ISA, is a time-consuming, one-shot operation. It makes importing the `codegen.cpp` package slower because the `LoopLevel` class in that package is decorated with `@dataclasses.dataclass`, and the decorator invokes `codecache.pick_vec_isa()` to initialize the `simd_nelements` of `LoopLevel`. `c14cf312c9/torch/_inductor/codegen/cpp.py (L2883C53-L2883C53)` The Triton backend does not need this check, but we'd prefer to unify the code. Therefore, the new design registers both `CppScheduling` for CPU and `TritonScheduling` for Triton regardless of whether the current backend is Triton, which brings additional overhead to the Triton backend. ```python def init_backend_registration(self): if get_scheduling_for_device("cpu") is None: from .codegen.cpp import CppScheduling register_backend_for_device("cpu", CppScheduling, WrapperCodeGen) if get_scheduling_for_device("cuda") is None: from .codegen.triton import TritonScheduling register_backend_for_device("cuda", TritonScheduling, WrapperCodeGen) ``` ## Solution To resolve the compilation latency regression for the Triton backend, we changed `LoopLevel` slightly ([new code changes](https://github.com/pytorch/pytorch/pull/106874/files#diff-5ab7b0235e2076a5fc6629ba0b109208940f5b94f5c13babc3e0f87cf4fcec82R2893-R2904)) by moving the initialization of `simd_nelements` to `__post_init__`, which restores the compilation performance. 
## Compilation Latency Performance Result We ran a single model benchmark and reproduced the compilation regression: - Run `python benchmarks/dynamo/torchbench.py -dcuda --training --performance --inductor --only hf_Bart` - W/ PR #100706, the compilation latency is about 57~58 ``` dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks cuda,hf_Bart,4,1.556712,109.676554,57.055242,0.936330,5.760698,6.152422,642,1,8,7 cuda,hf_Bart,4,1.646658,109.621747,57.909817,0.936330,5.760698,6.152422,642,1,8,7 ``` - W/O PR #100706, the compilation latency is about 46~47 ``` dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks cuda,hf_Bart,4,1.599065,108.702480,47.490346,0.936330,5.760698,6.152422,642,1,8,7 cuda,hf_Bart,4,1.588419,108.431411,46.983041,0.936330,5.760698,6.152422,642,1,8,7 ``` This PR fixed the compilation performance regression. - W/ this PR #106874, the compilation latency is about 47~48 ``` dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks cuda,hf_Bart,4,1.586261,108.149467,47.481058,0.936330,5.760698,6.152422,642,1,8,7 cuda,hf_Bart,4,1.758915,108.613899,47.925633,0.936330,5.760698,6.152422,642,1,8,7 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/106874 Approved by: https://github.com/jansel	2023-08-16 04:11:36 +00:00
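The root cause and fix described in the commit above (#106874) follow a general Python pattern, sketched here with stand-in names. A dataclass field default is evaluated once, when the class body runs at import time, whereas work placed in `__post_init__` is deferred until an instance is created. `pick_vec_isa_nelements`, `EagerLoopLevel`, `LazyLoopLevel` and the call counter are hypothetical; the real `LoopLevel` in `codegen/cpp.py` may differ in detail.

```python
import dataclasses
from typing import Optional

CALLS = {"count": 0}


def pick_vec_isa_nelements():
    # Hypothetical stand-in for the expensive, one-shot ISA probe.
    CALLS["count"] += 1
    return 16


@dataclasses.dataclass
class EagerLoopLevel:
    # This default expression runs when the class body executes (import time).
    simd_nelements: int = pick_vec_isa_nelements()


calls_after_eager_def = CALLS["count"]  # the probe has already run once


@dataclasses.dataclass
class LazyLoopLevel:
    simd_nelements: Optional[int] = None

    def __post_init__(self):
        # Deferred: the probe runs only when an instance is actually created.
        if self.simd_nelements is None:
            self.simd_nelements = pick_vec_isa_nelements()


calls_after_lazy_def = CALLS["count"]  # unchanged by defining the lazy class
```

Defining `LazyLoopLevel` costs nothing at import time, so a backend that never instantiates it never pays for the probe.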
Yanbo Liang	1819fe1324	Revert "Extend Inductor to support the third-party backend (#100706 )" (#106652 ) This reverts commit 05bd24bb3548105776cf73226927cbd0ed575c55. It caused compilation time regression on torchbench, huggingface and dynamic models. Pull Request resolved: https://github.com/pytorch/pytorch/pull/106652 Approved by: https://github.com/davidberard98, https://github.com/voznesenskym	2023-08-05 06:41:08 +00:00
Michael Lazos	83e36fe127	Fix vscode test discovery (#106490 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/106490 Approved by: https://github.com/eellison	2023-08-03 09:02:22 +00:00
Wang, Eikan	05bd24bb35	Extend Inductor to support the third-party backend (#100706 ) This PR intends to extend Inductor to support the third-party backend that only focuses on the code generation just like what C++/OpenMP and Triton backend have done. Currently, the generated code by Inductor contains two major parts. One is the kernel, and the other is the Python wrapper to glue the kernel. Therefore, the third-party backend needs to customize the two parts to generate its specific code. - Python wrapper code generation Inductor provides a `WrapperCodeGen` class to generate the Python wrapper code to glue the kernel. Therefore, it is straightforward for the third-party backend to generate the backend-specific Python wrapper code. It just needs to inherit the `WrapperCodeGen` class and purposely override the particular member functions. - Kernel code generation It is driven by different `Scheduling`. Hence, the third-party backend needs to provide a custom `Scheduling` for its specific kernel code generation. Currently, `CppScheduling` and `TritonScheduling` are for C++/OpenMP and Triton backend, respectively. But there is no common `Scheduling` class. Based on the scheduling invocation, this PR abstracts a common `Scheduling` class containing the following member functions. - [group_fn](`71c4becda7/torch/_inductor/scheduler.py (LL649C64-L649C64)`) - [flush](`71c4becda7/torch/_inductor/scheduler.py (L1150)`) - [can_fuse_vertical](`71c4becda7/torch/_inductor/scheduler.py (L1006)`) - [can_fuse_horizontal](`71c4becda7/torch/_inductor/scheduler.py (LL1008C45-L1008C64)`) - [codegen_template](`71c4becda7/torch/_inductor/scheduler.py (L1234)`) _This function is only available for triton. If the third-party backend behaves as a sub-class of `TritonScheduling`, it can override it or reuse it._ - [codegen_nodes](`71c4becda7/torch/_inductor/scheduler.py (L1234)`) - [codegen_sync](`71c4becda7/torch/_inductor/scheduler.py (LL1251C1-L1251C1)`). 
_This function is only used for Triton debugging purposes. But it might also be useful for other computation devices. Therefore, we'd prefer to keep this function._ The third-party backend needs to inherit from the `Scheduling` class and implement these functions. Other code generation classes, such as `CppKernel` and `TritonKernel`, are used by or are part of the logic of either `Scheduling` or `WrapperCodeGen`. Hence, this PR does not define an interface for them and leaves that flexibility to the third-party backend. The third-party backend can decide to implement these classes from scratch or reuse them by inheriting and overriding them. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100706 Approved by: https://github.com/jansel	2023-08-02 05:13:51 +00:00
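The extension mechanism described in the commit above (#100706) can be illustrated with a self-contained toy model. It does not import torch: the classes and the registry below are stand-ins that reuse names from the commit messages (`Scheduling`, `WrapperCodeGen`, `register_backend_for_device`, `get_scheduling_for_device`), and every method signature and body is an assumption made for illustration. `codegen_template` is omitted because the commit describes it as Triton-only.

```python
class Scheduling:
    """Toy base class mirroring the member functions listed in the commit."""

    def group_fn(self, sizes):
        raise NotImplementedError

    def can_fuse_vertical(self, node1, node2):
        raise NotImplementedError

    def can_fuse_horizontal(self, node1, node2):
        raise NotImplementedError

    def codegen_nodes(self, nodes):
        raise NotImplementedError

    def codegen_sync(self):
        raise NotImplementedError

    def flush(self):
        raise NotImplementedError


class WrapperCodeGen:
    """Toy wrapper generator that glues kernel sources together."""

    def generate(self, kernels):
        return "\n".join(kernels)


_BACKENDS = {}  # device name -> (scheduling class, wrapper class)


def register_backend_for_device(device, scheduling_cls, wrapper_cls):
    _BACKENDS[device] = (scheduling_cls, wrapper_cls)


def get_scheduling_for_device(device):
    entry = _BACKENDS.get(device)
    return entry[0] if entry else None


class ExtensionScheduling(Scheduling):
    """Hypothetical third-party scheduling that never fuses nodes."""

    def __init__(self):
        self.kernels = []

    def group_fn(self, sizes):
        return tuple(tuple(s) for s in sizes)

    def can_fuse_vertical(self, node1, node2):
        return False

    def can_fuse_horizontal(self, node1, node2):
        return False

    def codegen_nodes(self, nodes):
        self.kernels.append("kernel(" + ", ".join(nodes) + ")")

    def codegen_sync(self):
        pass

    def flush(self):
        # Hand back the pending kernels and reset the buffer.
        kernels, self.kernels = self.kernels, []
        return kernels


class ExtensionWrapperCodeGen(WrapperCodeGen):
    def generate(self, kernels):
        return "# extension wrapper\n" + super().generate(kernels)


register_backend_for_device(
    "extension_device", ExtensionScheduling, ExtensionWrapperCodeGen
)
```

Looking up `get_scheduling_for_device("extension_device")` then returns `ExtensionScheduling`, while an unregistered device name returns `None`, which is the check `init_backend_registration` relies on in the re-land commit (#106874).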

38 Commits