Mirror of https://github.com/pytorch/pytorch.git (synced 2025-10-20 21:14:14 +08:00)
[RPC] Support timeout for RRef proxy functions (#50499)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50499

Adds a timeout API to the following functions:

```
rref.rpc_sync()
rref.rpc_async()
rref.remote()
```

so that RPCs initiated by these proxy calls can be timed out in the same way as the regular RPC APIs. Timeouts are supported in the following cases:

1. `rpc.remote()` completes successfully and in time, but the function run by `rref.rpc_async()` is slow. A timeout error is raised.
2. The function run by `rref.rpc_async()` is fast, but `rpc.remote()` is slow or hanging. When `rref.rpc_async()` is called, it still times out with the passed-in timeout instead of blocking until `rpc.remote()` succeeds (the current behavior). However, the timeout occurs while the future is being created, not during the wait, because the call invokes `rref._get_type`, which blocks. This could be made non-blocking by changing `rref._get_type` to return a future, although that is likely a larger change.

Test Plan: Added unit tests.

Reviewed By: wanchaol

Differential Revision: D25897495

fbshipit-source-id: f9ad5b8f75121f50537677056a5ab16cf262847e
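The two timeout cases can be modeled with plain C++ futures. This is a hypothetical sketch, not PyTorch's implementation: `RpcTimeout`, `getWithin`, `simulateRemote`, and `proxyRpcSync` are illustrative names, threads that sleep stand in for remote workers, and giving each stage the full timeout is an assumption rather than a statement of how PyTorch spends the budget.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <stdexcept>
#include <string>
#include <thread>

using namespace std::chrono_literals;

// Hypothetical error type standing in for an RPC timeout error.
struct RpcTimeout : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Hypothetical helper: wait at most `timeout` for `fut`, then return its
// value or throw RpcTimeout. A std::async future still joins its worker
// thread when destroyed, so this sketch does not abandon the slow task.
template <typename T>
T getWithin(std::future<T>& fut, std::chrono::milliseconds timeout,
            const std::string& what) {
  assert(fut.valid());  // precondition: the future has a shared state
  if (fut.wait_for(timeout) != std::future_status::ready) {
    throw RpcTimeout(what + " timed out");
  }
  return fut.get();
}

// Simulates a remote step that takes `delay` and then yields `value`.
std::future<int> simulateRemote(std::chrono::milliseconds delay, int value) {
  return std::async(std::launch::async, [delay, value] {
    std::this_thread::sleep_for(delay);
    return value;
  });
}

// Hypothetical model of a proxy call with a timeout: the timeout bounds the
// wait for the RRef's owner-side creation (analogous to the blocking
// rref._get_type() call) and then the wait for the proxied function.
int proxyRpcSync(std::chrono::milliseconds creationDelay,
                 std::chrono::milliseconds functionDelay,
                 std::chrono::milliseconds timeout) {
  auto created = simulateRemote(creationDelay, 0);
  getWithin(created, timeout, "RRef creation");  // case 2: slow remote()
  auto result = simulateRemote(functionDelay, 42);
  return getWithin(result, timeout, "proxied function");  // case 1: slow function
}
```

Because the creation wait runs before the proxied call is issued, a slow `rpc.remote()` surfaces as soon as the proxy call starts, not when its result is awaited, which mirrors the caveat in case 2.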
Committed by Facebook GitHub Bot
parent ab1ba8f433
commit d64184ef4c
@@ -228,20 +228,22 @@ std::string PyRRef::str() const {
   }
 }
 
-py::object PyRRef::createRRefProxy(const RRefProxyType& type) const {
+py::object PyRRef::createRRefProxy(
+    const RRefProxyType& type,
+    float timeoutSeconds) const {
   auto& pythonRpcHandler = PythonRpcHandler::getInstance();
   pybind11::gil_scoped_acquire ag;
   auto& functions = pythonRpcHandler.getRRefProxyFunctions();
   auto& ctor = functions.rrefProxyCtor_;
   switch (type) {
     case RRefProxyType::RPC_SYNC: {
-      return ctor(*this, functions.rpcSync_);
+      return ctor(*this, functions.rpcSync_, timeoutSeconds);
     }
     case RRefProxyType::RPC_ASYNC: {
-      return ctor(*this, functions.rpcAsync_);
+      return ctor(*this, functions.rpcAsync_, timeoutSeconds);
     }
     case RRefProxyType::REMOTE: {
-      return ctor(*this, functions.remote_);
+      return ctor(*this, functions.remote_, timeoutSeconds);
     }
     default: {
       TORCH_INTERNAL_ASSERT(false, "Unrecognized RRefProxy type ", type);
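The change threads one extra argument, `timeoutSeconds`, through every branch of the proxy dispatch. The sketch below reproduces that shape without pybind11. `RRefProxy`, `RRefProxyFunctions`, and the string call paths are hypothetical stand-ins for the Python objects returned by `PythonRpcHandler::getRRefProxyFunctions()`, and a `std::function` plays the role of `rrefProxyCtor_`.

```cpp
#include <functional>
#include <stdexcept>
#include <string>

// Mirrors the three proxy kinds dispatched in the diff.
enum class RRefProxyType { RPC_SYNC, RPC_ASYNC, REMOTE };

// Hypothetical proxy: records the RRef, the call path it uses, and its timeout.
struct RRefProxy {
  std::string rrefName;
  std::string callPath;
  float timeoutSeconds;
};

// Hypothetical stand-in for the handler's proxy functions.
struct RRefProxyFunctions {
  std::string rpcSync = "rpc_sync";
  std::string rpcAsync = "rpc_async";
  std::string remote = "remote";
  // Plays the role of rrefProxyCtor_: builds a proxy bound to (rref, fn, timeout).
  std::function<RRefProxy(const std::string&, const std::string&, float)> ctor =
      [](const std::string& rref, const std::string& fn, float timeout) {
        return RRefProxy{rref, fn, timeout};
      };
};

// Same shape as the patched createRRefProxy: every branch forwards timeoutSeconds.
RRefProxy createRRefProxy(const std::string& rref, RRefProxyType type,
                          float timeoutSeconds,
                          const RRefProxyFunctions& functions = {}) {
  switch (type) {
    case RRefProxyType::RPC_SYNC:
      return functions.ctor(rref, functions.rpcSync, timeoutSeconds);
    case RRefProxyType::RPC_ASYNC:
      return functions.ctor(rref, functions.rpcAsync, timeoutSeconds);
    case RRefProxyType::REMOTE:
      return functions.ctor(rref, functions.remote, timeoutSeconds);
  }
  // Counterpart of the TORCH_INTERNAL_ASSERT in the default branch.
  throw std::invalid_argument("Unrecognized RRefProxy type");
}
```

In this sketch the timeout is captured when the proxy is built, so every call later made through that proxy object carries the same value.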