[Triton] [Inductor] Restrict subprocess autotuning to just Triton (#162688)

Summary:
Restricts subprocess benchmarking to only `TritonTemplateCaller`, which is what the underlying `target` method expects. Without this restriction, large K shapes triggered a bug because the decompose-k choice is a `SubgraphChoiceCaller`.

Test Plan:
mm autotuning with a large K and `TORCHINDUCTOR_AUTOTUNE_IN_SUBPROC=1`

Rollback Plan:

Differential Revision: D82181924

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162688
Approved by: https://github.com/PaulZhang12, https://github.com/eellison, https://github.com/mlazos
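
The change in how choices are split can be illustrated with a minimal, self-contained sketch. The classes below are empty stand-ins that only share names with the Inductor classes; they are assumptions for illustration and do not reproduce the real class hierarchy or the `target` method. `partition_old` and `partition_new` are hypothetical helper names that mirror the removed and added lines of the diff.

```python
# Stand-in classes (hypothetical): they only mimic the names used in the diff.
class ChoiceCaller:
    pass


class ExternKernelCaller(ChoiceCaller):
    pass


class TritonTemplateCaller(ChoiceCaller):
    pass


class SubgraphChoiceCaller(ChoiceCaller):
    pass


def partition_old(choices):
    """Old split: everything that is not an extern kernel goes to the subprocess."""
    extern = [c for c in choices if isinstance(c, ExternKernelCaller)]
    triton = [c for c in choices if not isinstance(c, ExternKernelCaller)]
    return extern, triton


def partition_new(choices):
    """New split: only Triton template callers go to the subprocess."""
    extern = []
    triton = []
    for c in choices:
        if isinstance(c, TritonTemplateCaller):
            triton.append(c)
        else:
            extern.append(c)
    return extern, triton


choices = [ExternKernelCaller(), TritonTemplateCaller(), SubgraphChoiceCaller()]

# Old behavior: the subgraph choice ends up in the subprocess ("triton") bucket.
_, old_triton = partition_old(choices)
old_triton_types = [type(c).__name__ for c in old_triton]

# New behavior: the subgraph choice is benchmarked in the current process.
new_extern, new_triton = partition_new(choices)
new_extern_types = [type(c).__name__ for c in new_extern]
new_triton_types = [type(c).__name__ for c in new_triton]
```

In this sketch the old split places `SubgraphChoiceCaller` in the subprocess bucket, while the new split keeps it with the choices benchmarked in the current process.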
2025-10-20 21:14:14 +08:00 · 2025-09-11 22:17:57 +00:00
parent 468c1f9e9d
commit 082d3dd9d5
1 changed files with 7 additions and 2 deletions
--- a/torch/_inductor/select_algorithm.py
+++ b/torch/_inductor/select_algorithm.py
@@ -3032,8 +3032,13 @@ class AlgorithmSelectorCache(PersistentCache):

        # only benchmark triton kernel in sub process for now.
        # ATen/Extern kernel are still benchmarked in the current process.
-        extern = [c for c in choices if isinstance(c, ExternKernelCaller)]
-        triton = [c for c in choices if not isinstance(c, ExternKernelCaller)]
+        extern = []
+        triton = []
+        for c in choices:
+            if isinstance(c, TritonTemplateCaller):
+                triton.append(c)
+            else:
+                extern.append(c)

        timings = cls.benchmark_in_current_process(
            extern, input_nodes, layout, input_gen_fns, hint_override=hint_override