c10d/Store: add nonblocking mode to queue_pop (#151485)

This adds a non-blocking mode to `queue_pop`, so workers can poll for ready work without blocking the main loop. This is useful when items arrive on the queue only periodically and you want to keep the GPU at maximum utilization in the meantime.

We also expose `torch.distributed.QueueEmptyError` so users can catch the error and handle it accordingly.

Test plan:

```
pytest test/distributed/test_store.py -k queue -v -s -x
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151485
Approved by: https://github.com/fduwjj, https://github.com/tianfengfrank
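
The polling pattern described above could look roughly like the sketch below. It does not use the real store: `InMemoryStore` and the local `QueueEmptyError` are hypothetical single-process stand-ins written for this illustration, and the `block=False` keyword is assumed to mirror the `bool block` parameter in the C++ signature. `run_worker` is also an illustrative name, not part of the change.

```python
import collections
import threading


class QueueEmptyError(RuntimeError):
    """Local stand-in for torch.distributed.QueueEmptyError (assumption)."""


class InMemoryStore:
    """Hypothetical single-process stand-in for a c10d Store with queues."""

    def __init__(self):
        self._queues = collections.defaultdict(collections.deque)
        self._cond = threading.Condition()

    def queue_push(self, key, value):
        with self._cond:
            self._queues[key].append(value)
            self._cond.notify_all()

    def queue_pop(self, key, block=True):
        with self._cond:
            # Non-blocking mode: fail fast instead of waiting for a producer.
            if not block and not self._queues[key]:
                raise QueueEmptyError(f"queue {key!r} is empty")
            # Blocking mode: wait until some producer pushes an item.
            while not self._queues[key]:
                self._cond.wait()
            return self._queues[key].popleft()


def run_worker(store, key, steps):
    """Run `steps` loop iterations, picking up queued items when present."""
    received = []
    local_steps = 0
    for _ in range(steps):
        try:
            received.append(store.queue_pop(key, block=False))
        except QueueEmptyError:
            pass  # nothing queued; keep the main loop moving
        local_steps += 1  # placeholder for the real per-iteration work
    return received, local_steps


store = InMemoryStore()
store.queue_push("work", b"item-0")
received, local_steps = run_worker(store, "work", steps=3)
print(received, local_steps)
```

The worker completes all three iterations even though only one item was queued; with a blocking pop it would stall on the second iteration until another item arrived.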
2025-10-20 21:14:14 +08:00 · 2025-04-18 02:14:47 +00:00
parent 3ed5f1fb77
commit 98c892749b
16 changed files with 64 additions and 23 deletions
--- a/torch/csrc/distributed/c10d/TCPStore.hpp
+++ b/torch/csrc/distributed/c10d/TCPStore.hpp
@@ -117,7 +117,7 @@ class TORCH_API TCPStore : public Store {
  void queuePush(const std::string& key, const std::vector<uint8_t>& value)
      override;

-  std::vector<uint8_t> queuePop(const std::string& key) override;
+  std::vector<uint8_t> queuePop(const std::string& key, bool block) override;

  int64_t queueLen(const std::string& key) override;