Mirror of https://github.com/pytorch/pytorch.git (synced 2025-11-06 00:54:56 +08:00)
Revert "[inductor] Expand use of generic benchmark function (#164938)"
This reverts commit 5c583e2573f29243742e00b9fa36b266c5c78bb3.
Reverted https://github.com/pytorch/pytorch/pull/164938 on behalf of https://github.com/clee2000 due to: I think this broke test/inductor/test_cuda_repro.py::CudaReproTests::test_epilogue_fusion_with_view [GH job link](https://github.com/pytorch/pytorch/actions/runs/18529735968/job/52813191763) [HUD commit link](f58f301313) on both rocm and the slow grad check for linux. It did run successfully on the cuda workflow on trunk; I wonder if this is a gpu capability thing? No clue though ([comment](https://github.com/pytorch/pytorch/pull/164938#issuecomment-3407600224))
@@ -2671,10 +2671,8 @@ class AlgorithmSelectorCache(PersistentCache):
-        # Templates selected with input_gen_fns require specific input data to avoid IMA
-        # Passing custom input gen fns to benchmark_fusion NYI, so skip deferred template selection
-        # TODO(jgong5): support multi-template on CPU C++ backend
-        if input_gen_fns is not None or (
-            layout.device.type == "cpu" and config.cpu_backend != "triton"
-        ):
+        # TODO(jgong5): support multi-template on CPU
+        if input_gen_fns is not None or layout.device.type == "cpu":
             return_multi_template = False

         # TODO - assert that we have not mutating kernels here

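For clarity, the guard this revert restores can be sketched in isolation. This is a minimal, hypothetical rendering (the function name `should_return_multi_template` and the bare `device_type` string parameter are illustrative, not PyTorch API): deferred multi-template selection is skipped whenever custom input generator functions are supplied or the layout's device is CPU, rather than the reverted PR's narrower condition that also checked `config.cpu_backend != "triton"`.

```python
def should_return_multi_template(input_gen_fns, device_type):
    """Sketch of the restored condition in AlgorithmSelectorCache.

    input_gen_fns: optional mapping of custom input generators, or None.
    device_type:   device type string of the output layout, e.g. "cuda" or "cpu".
    """
    # TODO(jgong5): support multi-template on CPU
    if input_gen_fns is not None or device_type == "cpu":
        # Skip deferred (multi-template) selection: custom input gen fns
        # are not supported by benchmark_fusion, and CPU lacks support.
        return False
    return True


print(should_return_multi_template(None, "cuda"))  # True
print(should_return_multi_template({}, "cuda"))    # False: custom gen fns present
print(should_return_multi_template(None, "cpu"))   # False: CPU not supported
```

Under the reverted change, the CPU branch would additionally have allowed multi-template when `config.cpu_backend == "triton"`; the revert returns to disabling it for all CPU layouts.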