Releases multicast object before releasing mapped buffers in CUDASymmetricMemory (#163750)

Fixes: https://github.com/pytorch/pytorch/issues/162429

On B200, cuMulticastUnbind can fail if the mapped buffers are freed before the multicast object is freed, so this change releases the multicast object first. The only documentation I could find is here: e11d7f77c1/src/transport/nvls.cc (L113).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163750
Approved by: https://github.com/ngimel, https://github.com/Skylion007, https://github.com/kwen2501, https://github.com/nWEIdia, https://github.com/cyyever
ghstack dependencies: #163575
2025-10-21 13:44:15 +08:00 · 2025-09-30 17:57:10 -07:00
parent 4dab208d97
commit ed90040d33
2 changed files with 6 additions and 5 deletions
--- a/test/distributed/test_symmetric_memory.py
+++ b/test/distributed/test_symmetric_memory.py
@@ -52,10 +52,6 @@ from torch.testing._internal.common_utils import (

 test_contexts = [nullcontext, _test_mode]

-# Set environment variable to disable multicast for all tests in this module
-# Workaround https://github.com/pytorch/pytorch/issues/162429
-os.environ["TORCH_SYMM_MEM_DISABLE_MULTICAST"] = "1"
-
 # So that tests are written in device-agnostic way
 device_type = "cuda"
 device_module = torch.get_device_module(device_type)