[ROCm] Add conditions for channels last logic (#107812)

Although enforcing NHWC convolutions as inductor's fallback can yield performance benefits on some hardware, this is not universally true. Currently on ROCm we are seeing slowdowns on gcnArchs that do not have optimal NHWC implementations, and we would like to introduce some control over this behavior in PyTorch. On ROCm MI200-series GPUs we will default to the enforced channels-last behavior, aligned with the rest of PyTorch, but on non-MI200 series we will disable the forced layout.

For now we use torch.cuda.get_device_name(0) for this control, but we will replace it with gcnArchName once https://github.com/pytorch/pytorch/pull/107477 lands.
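The gating described above boils down to a regex match on the reported device name. The sketch below illustrates that check in isolation; the helper name and the sample device-name strings are assumptions for illustration, while the regex is the one used in the diff:

```python
import re

def should_force_channels_last(gpu_name: str) -> bool:
    # MI200-series parts (e.g. MI210, MI250, MI250X) report names containing
    # "MI2" followed by two digits, so a match keeps the enforced
    # channels-last layout; any other gcnArch opts out of the forced layout.
    return re.search(r"MI2\d\d", gpu_name) is not None

# Illustrative device-name strings (assumed, not taken from the PR):
print(should_force_channels_last("AMD Instinct MI250X"))   # True
print(should_force_channels_last("AMD Radeon Pro W6800"))  # False
```

In the actual change this check runs only when torch.version.hip is set and CUDA (HIP) devices are available, so non-ROCm builds are unaffected.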

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107812
Approved by: https://github.com/jataylo, https://github.com/eellison
Author: Prachi Gupta
Date: 2023-08-24 19:39:49 +00:00
Committed by: PyTorch MergeBot
Parent: 40cbda274b
Commit: 1ef4bd169d


@@ -243,6 +243,15 @@ class GraphLowering(torch.fx.Interpreter):
        if nconv == 0:
            return False
        # Currently on ROCm we are seeing some slowdowns on gcnArchs that do
        # not have optimal NHWC implementations. On ROCm MI200 series we will
        # default to the enforced channels-last behavior, but on non-MI200
        # series we will disable the forced layout.
        if torch.version.hip and torch.cuda.is_available():
            gpu_name = torch.cuda.get_device_name(0)
            if not re.search(r"MI2\d\d", gpu_name):
                return False
        # For cpu backend and mkldnn enabled, we always using channels_last for a better performance.
        if (
            all(