[CI] Do not constrain memory for ROCm testing in CI (#156115)

Fixes ROCm OOMs introduced by https://github.com/pytorch/pytorch/pull/155631

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156115
Approved by: https://github.com/jeffdaily
2025-10-21 05:34:18 +08:00 · 2025-06-17 15:30:36 +00:00
parent 7fcad0231c
commit 0079c80b35
1 changed file with 3 additions and 1 deletion
--- a/test/run_test.py
+++ b/test/run_test.py
@@ -1855,7 +1855,9 @@ def run_tests(
            ):
                raise RuntimeError(failure.message + keep_going_message)

-        os.environ["NUM_PARALLEL_PROCS"] = str(NUM_PROCS)
+        # This is used later to constrain memory per proc on the GPU. On ROCm
+        # the number of procs is the number of GPUs, so we don't need to do this
+        os.environ["NUM_PARALLEL_PROCS"] = str(1 if torch.version.hip else NUM_PROCS)

        # See Note [ROCm parallel CI testing]
        pool = get_context("spawn").Pool(
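
The effect of the patched line can be sketched without PyTorch, as below. This is a minimal illustration, not the code in the repository: the helper names are hypothetical, `hip_version` stands in for `torch.version.hip` (assumed to be `None` on non-ROCm builds), and the even-split formula is an assumption about how a consumer might use `NUM_PARALLEL_PROCS`, since that consumer is not shown in this diff.

```python
from typing import Optional


def parallel_procs_env_value(num_procs: int, hip_version: Optional[str]) -> str:
    """Mirror the patched expression: report 1 proc on ROCm, NUM_PROCS otherwise.

    `hip_version` is a stand-in for `torch.version.hip` (assumption: a
    non-empty version string on ROCm builds, None elsewhere).
    """
    return str(1 if hip_version else num_procs)


def per_process_memory_fraction(num_parallel_procs: int) -> float:
    """Hypothetical consumer: split one GPU's memory evenly across procs.

    The real memory-limiting logic may subtract overhead or round; this
    even split is only an illustrative assumption.
    """
    if num_parallel_procs < 1:
        raise ValueError("num_parallel_procs must be >= 1")
    return 1.0 / num_parallel_procs


if __name__ == "__main__":
    # Non-ROCm build with 3 parallel procs: each proc is capped at a share.
    cuda_value = parallel_procs_env_value(3, None)
    print(cuda_value, per_process_memory_fraction(int(cuda_value)))

    # ROCm build: the value is forced to 1, so no per-proc cap is applied.
    rocm_value = parallel_procs_env_value(3, "6.2.0")
    print(rocm_value, per_process_memory_fraction(int(rocm_value)))
```

Under these assumptions, forcing the value to 1 on ROCm leaves each process free to use the whole device, which matches the comment's reasoning that each process already has its own GPU.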