[rref] Handle exceptions returned via remote() calls (#35331)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35331

When the function called by remote() throws, it seems sensible to
surface that exception when rref.to_here() is called.

Doing this only involves simple modifications:
 - we need the OwnerRRef to keep around an optional<string>
   for the error
 - add an OwnerRRef setError() method that's parallel to setValue(),
   and plumb through the logic
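
The two modifications above can be sketched in isolation. This is a minimal
illustration, not the code in this commit: SketchOwnerRRef is a hypothetical
stand-in class, std::string replaces IValue, std::optional replaces
c10::optional, and the future plumbing is left out. It only shows how a
setError() parallel to setValue() lets a blocked getValue() wake up and throw.

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for OwnerRRef (assumption: not the real
// class). std::string is used in place of IValue to keep it self-contained.
class SketchOwnerRRef {
 public:
  // Blocks until a value or an error has been set.
  // Throws std::runtime_error if an error was set instead of a value.
  const std::string& getValue() const {
    std::unique_lock<std::mutex> lock(mutex_);
    valueCV_.wait(lock, [this] {
      return value_.has_value() || error_.has_value();
    });
    if (error_) {
      throw std::runtime_error(*error_);
    }
    return *value_;
  }

  // Stores the value and wakes up any thread blocked in getValue().
  void setValue(std::string&& value) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      value_ = std::move(value);
    }
    valueCV_.notify_all();
  }

  // Parallel to setValue(): stores the error message and wakes up waiters.
  void setError(const std::string& err) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      error_ = err;
    }
    valueCV_.notify_all();
  }

  // True once either a value or an error has been set.
  bool hasValue() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return value_.has_value() || error_.has_value();
  }

 private:
  std::optional<std::string> value_;
  std::optional<std::string> error_;
  mutable std::mutex mutex_;
  mutable std::condition_variable valueCV_;
};
```

In this sketch, hasValue() reports true after either setter runs, so a
caller that polls it still has to call getValue() to learn whether the
result is a value or an error.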

We add rpc_tests to verify that the exception is propagated properly.
ghstack-source-id: 101136900

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed/rpc:rpc_spawn
buck test mode/dev-nosan caffe2/test/distributed/rpc/jit:rpc_spawn

Differential Revision: D20634078

fbshipit-source-id: b5b13fdb85cdf6a43f42347d82eabae1635368ec
Jeremy Lilley
2020-03-31 10:04:20 -07:00
committed by Facebook GitHub Bot
parent b4c4342747
commit f182b43760
9 changed files with 131 additions and 44 deletions

@@ -338,22 +338,25 @@ class TORCH_API OwnerRRef final : public RRef {
   // Get a constant reference of the real value. This method will block if the
   // value is not ready. This method does not need GIL as it does not create
-  // any new py::object.
+  // any new py::object. It will throw if there is an error.
   const IValue& getValue() const;
   // Set the value of this ``OwnerRRef``. This method does not need GIL as it
   // does not create any new py::object.
   void setValue(IValue&& value);
+  // Sets the value of this ``OwnerRRef`` to contain an exception.
+  void setError(const std::string& err);
-  // Has a value been set?
+  // Has a value or error been set?
   bool hasValue() const;
-  // Gets a future that is satisfied when the value is set.
+  // Gets a future that is satisfied when the value or error is set.
   std::shared_ptr<FutureMessage> getFuture();
  private:
   friend class RRefContext;
   c10::optional<IValue> value_;
+  c10::optional<std::string> error_;
   mutable std::mutex mutex_;
   mutable std::condition_variable valueCV_;
   std::shared_ptr<FutureMessage> future_;