Bruce Chang fa0db212e7 shrink_group implementation to expose ncclCommShrink API (#164518)
Closes #164529

This PR exposes the new [ncclCommShrink](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/api/comms.html#ncclcommshrink) API in PyTorch.

This is useful when certain GPUs or nodes must be excluded from a collective operation, for example in fault-tolerance scenarios or when dynamically adjusting resource utilization.

For more info, see [Shrinking a communicator](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/usage/communicators.html#shrinking-a-communicator).
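
A minimal usage sketch follows. The entry point name `shrink_group` comes from this PR's title, but its exact module path and signature (here, taking the ranks to exclude and returning a smaller process group) are assumptions for illustration, not the confirmed API:

```python
import torch
import torch.distributed as dist

def shrink_after_failure(failed_ranks: list[int]):
    """Rebuild a smaller communicator after some ranks are deemed dead.

    Assumed signature: `shrink_group` takes the ranks to exclude and
    returns a new, smaller process group backed by ncclCommShrink.
    """
    if dist.get_rank() in failed_ranks:
        # Excluded ranks must not participate in the shrink call.
        return None

    # Surviving ranks collectively call into ncclCommShrink via the new
    # binding, avoiding a full destroy-and-reinit of the process group.
    new_group = dist.shrink_group(failed_ranks)  # assumed entry point

    # Subsequent collectives run on the shrunken group only.
    t = torch.ones(1, device="cuda")
    dist.all_reduce(t, group=new_group)
    return new_group
```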

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164518
Approved by: https://github.com/kwen2501
2025-10-19 18:00:08 +00:00