[ROCm] Add conditions for channels last logic (#107812)
Enforcing NHWC convolutions as inductor's fallback layout yields performance benefits on some hardware, but not on all of it. On ROCm we are currently seeing slowdowns on gcnArchs that lack optimal NHWC implementations, so we would like some control over this behavior in PyTorch. On the ROCm MI200 series we default to the enforced channels-last behavior, in line with the rest of PyTorch, but on non-MI200 series we disable the forced layout. For now we use torch.cuda.get_device_name(0) for this check; we will switch to gcnArchName once https://github.com/pytorch/pytorch/pull/107477 lands.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107812
Approved by: https://github.com/jataylo, https://github.com/eellison
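A minimal sketch of the gating logic described above, assuming a ROCm build of PyTorch. The helper name is_mi200_series is ours for illustration; the torch.cuda.get_device_name(0) call and the MI2\d\d pattern are taken from the diff below, and the sample device name in the comment is an assumption, not confirmed output.

    import re

    import torch

    def is_mi200_series() -> bool:
        # The check is only meaningful on ROCm (HIP) builds with a visible GPU.
        if not (torch.version.hip and torch.cuda.is_available()):
            return False
        # MI200-series device names are expected to contain "MI2xx",
        # e.g. something like "AMD Instinct MI250X" (assumed example).
        gpu_name = torch.cuda.get_device_name(0)
        return re.search(r"MI2\d\d", gpu_name) is not None

    # Channels-last (NHWC) is only forced when the device is MI200-series;
    # other gcnArchs keep the default layout until they have fast NHWC kernels.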
committed by PyTorch MergeBot
parent 40cbda274b
commit 1ef4bd169d
@@ -243,6 +243,15 @@ class GraphLowering(torch.fx.Interpreter):
         if nconv == 0:
             return False
 
+        # Currently on ROCm we are seeing some slow downs in gcnArch that do not
+        # have optimal NHWC implementations. On ROCm MI200 series we will
+        # default to the enforced last channels behavior, but on non-MI200 series
+        # we will disable the forced layout.
+        if torch.version.hip and torch.cuda.is_available():
+            gpu_name = torch.cuda.get_device_name(0)
+            if not re.search(r"MI2\d\d", gpu_name):
+                return False
+
         # For cpu backend and mkldnn enabled, we always using channels_last for a better performance.
         if (
             all(
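For reference, a quick check of how the MI2\d\d pattern classifies a few device-name strings. The sample names below are illustrative assumptions; real names come from torch.cuda.get_device_name(0).

    import re

    # Illustrative names only; not an exhaustive list of ROCm devices.
    for name in ["AMD Instinct MI250X", "AMD Instinct MI210", "AMD Radeon VII"]:
        forced = re.search(r"MI2\d\d", name) is not None
        print(f"{name!r}: force channels_last = {forced}")

The first two names match the pattern and would keep the forced channels-last layout; the last does not, so the fallback layout stays disabled for it.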