This updates the gloo submodule to a version that supports the new ibverbs backend, which can now be used with PyTorch.
Test plan:
```
sudo dnf install rdma-core-devel
USE_GLOO_IBVERBS=ON python setup.py develop
torchrun --nproc_per_node 2 ~/scripts/gloo_ibverbs_test.py
```
```py
"""
run with:
torchrun --nproc_per_node 2 ~/scripts/gloo_ibverbs_test.py
"""
import os

# Ask gloo to use the ibverbs transport.
os.environ["GLOO_DEVICE_TRANSPORT"] = "IBVERBS"

import torch
import torch.distributed as dist
dist.init_process_group("gloo")
rank = dist.get_rank()
# Rank 0 uses CPU tensors; every other rank uses CUDA tensors.
if rank == 0:
    device = "cpu"
else:
    device = "cuda"
print(device)
t = torch.full((10, 100), fill_value=(rank+1), device=device)
target = torch.full((10, 100), fill_value=3, device=device)
dist.all_reduce(t)
torch.testing.assert_close(t, target)
t = torch.full((10, 100), fill_value=(rank+1), device=device)
# Point-to-point: rank 0 sends its tensor (all ones) to rank 1.
if rank == 0:
    dist.send(t, dst=1)
else:
    dist.recv(t, src=0)
torch.testing.assert_close(t, torch.full_like(t, 1))
```
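The hard-coded `target` of 3 in the script only holds for `--nproc_per_node 2`, since each rank contributes `rank + 1` to the all-reduce. A minimal sketch of how the expected value generalizes to other world sizes follows; `expected_all_reduce_sum` is a hypothetical helper for illustration and is not part of this PR.

```py
def expected_all_reduce_sum(world_size: int) -> int:
    """Sum of (rank + 1) over all ranks, i.e. 1 + 2 + ... + world_size."""
    if world_size < 1:
        raise ValueError("world_size must be at least 1")
    return world_size * (world_size + 1) // 2


# With two ranks this matches the hard-coded target in the script.
assert expected_all_reduce_sum(2) == 3
```

In the script, the fill value for `target` could then be computed from `dist.get_world_size()` instead of being fixed at 3.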
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153425
Approved by: https://github.com/fduwjj