[C10D] Document barrier interaction with device_id (#159389)

Addresses #159262

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159389
Approved by: https://github.com/malfet, https://github.com/H-Huang, https://github.com/kwen2501, https://github.com/fduwjj
Will Constable
2025-07-31 15:53:45 -07:00
committed by PyTorch MergeBot
parent c0e0126399
commit dd22ba09b4

@@ -4780,6 +4780,11 @@ def barrier(
None, if not async_op or if not part of the group
.. note:: `ProcessGroupNCCL` now blocks the cpu thread till the completion of the barrier collective.
.. note:: `ProcessGroupNCCL` implements barrier as an all_reduce of a 1-element tensor. A device must be chosen
for allocating this tensor. The device choice is made by checking in this order (1) the first device passed to
`device_ids` arg of barrier if not None, (2) the device passed to init_process_group if not None, (3) the device
that was first used with this process group, if another collective with tensor inputs has been performed, (4)
the device index indicated by the global rank mod local device count.
"""
group = group or _get_default_group()
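
Reviewer note: the four-step device choice described in the new docstring note can be modeled as a small pure-Python function. This is only a sketch of the documented order, not the actual `ProcessGroupNCCL` code. The function name `pick_barrier_device_index` and all of its parameters are hypothetical, and treating an empty `device_ids` list the same as `None` is an assumption of this sketch.

```python
from typing import Optional, Sequence


def pick_barrier_device_index(
    device_ids: Optional[Sequence[int]],
    init_device_index: Optional[int],
    first_used_device_index: Optional[int],
    global_rank: int,
    local_device_count: int,
) -> int:
    """Hypothetical model of the documented barrier device choice.

    All names here are illustrative; none of them is a PyTorch API.
    """
    # (1) The first device passed to barrier's `device_ids` argument.
    #     Assumption: an empty list is treated like None.
    if device_ids:
        return device_ids[0]
    # (2) The device passed to init_process_group, if any.
    if init_device_index is not None:
        return init_device_index
    # (3) The device first used with this process group, if an earlier
    #     collective with tensor inputs has already run.
    if first_used_device_index is not None:
        return first_used_device_index
    # (4) Fallback: global rank modulo the local device count.
    return global_rank % local_device_count


if __name__ == "__main__":
    # Rank 5 on a host with 4 local devices and no other hints.
    print(pick_barrier_device_index(None, None, None, 5, 4))
```

The fallback in step (4) only picks the intended device when ranks are laid out across local devices in order. Passing `device_ids` to `barrier` or `device_id` to `init_process_group` makes the choice explicit.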