c10d/Store: add clone feature (#150966)

This adds a new `clone()` method to `Store` that returns a new `Store` instance, which can then be used from a different thread.

This is intended to better support using stores from multiple threads, for example when ProcessGroupNCCL needs a store for error propagation.
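
As a rough illustration of the intended usage pattern, here is a minimal sketch in which each thread gets its own cloned handle instead of sharing one store object. `ToyStore`, `worker`, and `run_demo` are hypothetical names invented for this example and are not part of c10d. The sketch assumes clones refer to the same underlying key/value state; how a real backend such as TCPStore implements `clone()` is not described in this message.

```python
import threading


class ToyStore:
    """Hypothetical in-memory stand-in for a Store (not the PyTorch API)."""

    def __init__(self, data=None, lock=None):
        # Assumption: clones share the same backing key/value state.
        # A real backend may manage connections differently.
        self._data = {} if data is None else data
        self._lock = threading.Lock() if lock is None else lock

    def set(self, key, value):
        with self._lock:
            self._data[key] = value

    def get(self, key):
        with self._lock:
            return self._data[key]

    def clone(self):
        # Return a new handle object bound to the same backing state.
        return ToyStore(self._data, self._lock)


def worker(store, key, value):
    # Each worker writes through its own cloned handle.
    store.set(key, value)


def run_demo(num_threads=4):
    main_store = ToyStore()
    threads = [
        threading.Thread(target=worker, args=(main_store.clone(), f"rank{i}", b"ok"))
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Writes made through the clones are visible through the original handle.
    return [main_store.get(f"rank{i}") for i in range(num_threads)]
```

Calling `run_demo()` starts four threads, each writing one key through its own clone, and then reads the values back through the original handle.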

Related issue: https://github.com/pytorch/pytorch/issues/150943

Test plan:

```
pytest test/distributed/test_store.py -k PythonStore
pytest test/distributed/test_store.py -k clone
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150966
Approved by: https://github.com/fduwjj
Tristan Rice
2025-04-10 01:41:47 +00:00
committed by PyTorch MergeBot
parent 061832bc7a
commit 205881ea4a
11 changed files with 69 additions and 0 deletions

@@ -77,6 +77,8 @@ class TORCH_API TCPStore : public Store {
~TCPStore() override;
c10::intrusive_ptr<Store> clone() override;
void set(const std::string& key, const std::vector<uint8_t>& value) override;
std::vector<uint8_t> compareSet(