pytorch/torch/csrc/distributed/rpc/python_rpc_handler.h
Shen Li 2486b0ba82 Add Python RRef as args and return value (#25499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25499

See #23110 for model parallel design details, and #26759 for the RRef
protocol. This commit adds support for using RRefs as Python UDF arguments
and return values. RRefs can now be shared from owner to user, from user to
owner, or from user to user.

Limitations:
1. No implicit type conversion yet. (#27099)
2. No failure handling and retry. (#26116)
3. UDF is not yet blocked until all RRefs are confirmed. (#27098)
4. Internal RRef control messages are not idempotent yet. (#26116)
5. Cannot delete RRefs correctly when there are circular dependencies. (#27096)

Main changes:

1. Added `SCRIPT_REMOTE_CALL` and `PYTHON_REMOTE_CALL` to `Message.h` to represent `dist.remote` invocations.
2. Added `SCRIPT_RREF_FETCH_CALL`, `PYTHON_RREF_FETCH_CALL`, `RREF_USER_ACCEPT`, `RREF_USER_DELETE`, `RREF_CHILD_ACCEPT`, and `RREF_FORK_REQUEST` to `Message.h` as internal RRef control messages.
3. Added new message request handling code to `functions.cpp`, and the message formats to `script_remote_call.h`, `python_remote_call.h`, and `rref_proto.h`.
4. Added a `PyRRef` type in `py_rref.h` and `py_rref.cpp`, which holds a shared pointer to the C++ `RRef` type. `PyRRef` wraps the C++ API and also implements RRef pickling and unpickling. RRef fork-related control messages are sent during RRef pickling and unpickling.
5. Updated `RRef.h` and `RRef.cpp` accordingly to support `py::object` RRefs.
6. RRef context (reference counts, etc.) is tracked in `rref_context.h` and `rref_context.cpp`.
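
As a rough illustration of items 1 and 2, the sketch below groups the new message types into an enum and separates the `dist.remote` invocations from the internal RRef control messages. Only the enumerator names come from this commit message. The numeric values, the `isRRefControlMessage` helper, and the `rpc_sketch` namespace are hypothetical and are not meant to mirror the actual contents of `Message.h`.

```cpp
#include <cstdint>

namespace rpc_sketch {

// Enumerator names are taken from the commit message; the underlying
// values are placeholders, not the values used in Message.h.
enum class MessageType : std::uint8_t {
  // dist.remote invocations (item 1)
  SCRIPT_REMOTE_CALL,
  PYTHON_REMOTE_CALL,
  // internal RRef control messages (item 2)
  SCRIPT_RREF_FETCH_CALL,
  PYTHON_RREF_FETCH_CALL,
  RREF_USER_ACCEPT,
  RREF_USER_DELETE,
  RREF_CHILD_ACCEPT,
  RREF_FORK_REQUEST
};

// Hypothetical helper: returns true for the message types that item 2
// describes as internal RRef control messages.
inline bool isRRefControlMessage(MessageType type) {
  switch (type) {
    case MessageType::SCRIPT_RREF_FETCH_CALL:
    case MessageType::PYTHON_RREF_FETCH_CALL:
    case MessageType::RREF_USER_ACCEPT:
    case MessageType::RREF_USER_DELETE:
    case MessageType::RREF_CHILD_ACCEPT:
    case MessageType::RREF_FORK_REQUEST:
      return true;
    case MessageType::SCRIPT_REMOTE_CALL:
    case MessageType::PYTHON_REMOTE_CALL:
      return false;
  }
  return false;
}

} // namespace rpc_sketch
```

A dispatcher could use such a predicate to route control messages to the RRef context before user-facing requests are handled; whether the real code does so is not stated here.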

Test Plan:
Imported from OSS

buck test mode/dev-nosan //caffe2/test:rpc_fork

Differential Revision: D17184146

Pulled By: mrshenli

fbshipit-source-id: a3a268efc087ac1ef489136ab957080382629265
2019-10-03 17:47:12 -07:00


#pragma once
#include <torch/csrc/distributed/rpc/message.h>
#include <torch/csrc/distributed/rpc/types.h>
#include <torch/csrc/utils/pybind.h>
namespace torch {
namespace distributed {
namespace rpc {
// Singleton class that provides an interface for executing Python UDF remote
// calls and for deserializing the returned results by running Python
// functions defined in torch/distributed/internal_rpc_utils.py.
// The singleton is first constructed when the RPC agent is constructed, so
// the Python functions in that module are imported only once.
class PYBIND11_EXPORT PythonRpcHandler {
 public:
  static PythonRpcHandler& getInstance();

  // Execute a Python UDF; the result is pickled to a binary string.
  std::vector<char> generatePythonUDFResult(
      const std::vector<char>& pickledPayload,
      const std::vector<torch::Tensor>& requestTensorTable,
      std::vector<torch::Tensor>& responseTensorTable);

  // The returned Python UDF result is a pickled binary string, so run a
  // Python function to unpickle it and return a py::object to the user.
  py::object loadPythonUDFResult(
      const std::vector<char>& pickledPayload,
      const std::vector<torch::Tensor>& tensorTable);

  // Run a pickled Python UDF and return the result as a py::object.
  py::object runPythonUDF(const SerializedPyObj& serializedObj);

  // Serialize a py::object into a SerializedPyObj.
  SerializedPyObj serialize(const py::object& obj);

  // Deserialize a SerializedPyObj into a py::object.
  py::object deserialize(const SerializedPyObj& serializedObj);

 private:
  PythonRpcHandler();
  ~PythonRpcHandler() = default;

  PythonRpcHandler(const PythonRpcHandler&) = delete;
  PythonRpcHandler& operator=(const PythonRpcHandler&) = delete;
  PythonRpcHandler(PythonRpcHandler&&) = delete;
  PythonRpcHandler& operator=(PythonRpcHandler&&) = delete;

  py::object runUDFFunction_;
  py::object loadResultFunction_;
  py::object serializeFunction_;
};
} // namespace rpc
} // namespace distributed
} // namespace torch
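
The singleton shape declared in this header can be sketched without pybind11. What follows is a minimal sketch under stated assumptions, not the implementation in the corresponding `.cpp` file: `HandlerSketch`, its `std::function` member, `run`, and `constructionCounter` are hypothetical stand-ins for `PythonRpcHandler`, its cached `py::object` members, and `runPythonUDF`. It assumes a function-local static instance, which is one common way to guarantee that one-time setup (the Python imports in the real class) happens only once.

```cpp
#include <functional>
#include <string>

namespace rpc_sketch {

class HandlerSketch {
 public:
  static HandlerSketch& getInstance() {
    // Function-local static: initialized on first use, exactly once.
    static HandlerSketch handler;
    return handler;
  }

  // Hypothetical stand-in for runPythonUDF: invokes the callable that was
  // cached when the singleton was constructed.
  std::string run(const std::string& payload) const {
    return runFunction_(payload);
  }

  // Counts constructor invocations so the "constructed once" property can
  // be observed. Not part of the real class.
  static int& constructionCounter() {
    static int count = 0;
    return count;
  }

 private:
  // Stand-in for importing the Python helpers once at construction time.
  HandlerSketch()
      : runFunction_([](const std::string& p) { return "ran:" + p; }) {
    ++constructionCounter();
  }
  ~HandlerSketch() = default;

  // Copying or moving would create a second instance, so both are deleted.
  HandlerSketch(const HandlerSketch&) = delete;
  HandlerSketch& operator=(const HandlerSketch&) = delete;
  HandlerSketch(HandlerSketch&&) = delete;
  HandlerSketch& operator=(HandlerSketch&&) = delete;

  std::function<std::string(const std::string&)> runFunction_;
};

} // namespace rpc_sketch
```

Repeated calls to `rpc_sketch::HandlerSketch::getInstance()` return a reference to the same object, and the constructor body runs a single time regardless of how many callers request the instance.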