[ROCm] Add conditions for channels last logic (#107812)

Although enforcing NHWC convolutions as inductor's fallback can yield performance benefits on some hardware, this is not universally true. Currently on ROCm we are seeing slowdowns on gcnArchs that do not have optimal NHWC implementations, and we would like to introduce some control over this behavior in PyTorch. On ROCm MI200-series GPUs we will default to the enforced channels-last behavior, aligned with the rest of PyTorch, but on non-MI200 series we will disable the forced layout.

For now we use torch.cuda.get_device_name(0) for this control, but we will replace it with gcnArchName once https://github.com/pytorch/pytorch/pull/107477 lands.
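The gating described above boils down to a regex match on the reported device name. The sketch below illustrates that check in isolation; the helper name and the sample device-name strings are assumptions for illustration, while the regex is the one used in the diff:

```python
import re

def should_force_channels_last(gpu_name: str) -> bool:
    # MI200-series parts (e.g. MI210, MI250, MI250X) report names containing
    # "MI2" followed by two digits, so a match keeps the enforced
    # channels-last layout; any other gcnArch opts out of the forced layout.
    return re.search(r"MI2\d\d", gpu_name) is not None

# Illustrative device-name strings (assumed, not taken from the PR):
print(should_force_channels_last("AMD Instinct MI250X"))   # True
print(should_force_channels_last("AMD Radeon Pro W6800"))  # False
```

In the actual change this check runs only when torch.version.hip is set and CUDA (HIP) devices are available, so non-ROCm builds are unaffected.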

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107812
Approved by: https://github.com/jataylo, https://github.com/eellison
Author: Prachi Gupta
Date: 2023-08-24 19:39:49 +00:00
Committed by: PyTorch MergeBot
Parent: 40cbda274b
Commit: 1ef4bd169d


@@ -243,6 +243,15 @@ class GraphLowering(torch.fx.Interpreter):
        if nconv == 0:
            return False
        # Currently on ROCm we are seeing some slowdowns on gcnArchs that do
        # not have optimal NHWC implementations. On ROCm MI200 series we will
        # default to the enforced channels-last behavior, but on non-MI200
        # series we will disable the forced layout.
        if torch.version.hip and torch.cuda.is_available():
            gpu_name = torch.cuda.get_device_name(0)
            if not re.search(r"MI2\d\d", gpu_name):
                return False
        # For cpu backend and mkldnn enabled, we always using channels_last for a better performance.
        if (
            all(