9 Commits

Author SHA1 Message Date
f903bc475c [BE] add noqa for flake8 rule B036: found except BaseException without re-raising (#159043)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159043
Approved by: https://github.com/Skylion007
2025-07-25 02:56:34 +00:00
3b798df853 [BE][Easy] enable UFMT for torch/distributed/{fsdp,optim,rpc}/ (#128869)
Part of #123062

- #123062

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128869
Approved by: https://github.com/fegin
ghstack dependencies: #128868
2024-06-18 21:49:08 +00:00
7c12cc7ce4 Flip default value for mypy disallow_untyped_defs [6/11] (#127843)
See #127836 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127843
Approved by: https://github.com/oulgen
ghstack dependencies: #127842
2024-06-08 18:49:29 +00:00
b7b2178204 [BE]: Remove useless lambdas (#113602)
Applies PLW0108, which removes useless lambda calls in Python. The rule is in preview, so it is not ready to be enabled by default just yet. These are the autofixes from the rule.
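
As a small illustration of the kind of code the rule flags, here is a lambda that only forwards its argument to another callable, next to the direct form. The helper name `scale` and the sample data are hypothetical and are not taken from the PR.

```python
def scale(x):
    """Hypothetical helper used only for this illustration."""
    return 2 * x


# Flagged by PLW0108: the lambda does nothing except forward its argument.
doubled_with_lambda = list(map(lambda x: scale(x), [1, 2, 3]))

# Equivalent form after the autofix: pass the callable directly.
doubled_direct = list(map(scale, [1, 2, 3]))
```

Both lists hold the same values; the second form simply drops one level of indirection.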

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113602
Approved by: https://github.com/albanD
2023-11-14 20:06:48 +00:00
ef4bc3fa2f [distributed] Make rref_proxy._invoke_rpc truly async when needed. (#70206)
Summary:
From https://github.com/pytorch/pytorch/issues/67626: RRefProxy (rref.rpc_async, rref.rpc_sync, rref.remote) currently uses a blocking RPC call to the owner.

This change makes the call non-blocking by chaining async calls. In the sync case we wait on the
resulting Future.
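
The chaining idea can be sketched with the standard library alone. This is a simplified model built on `concurrent.futures.Future`, not the PyTorch implementation; the names `then` and `invoke` and the string payloads are hypothetical.

```python
from concurrent.futures import Future


def then(fut, fn):
    """Return a new Future that will hold fn(fut's result), without blocking."""
    chained = Future()

    def _on_done(done):
        try:
            chained.set_result(fn(done.result()))
        except Exception as exc:
            # Propagate a failure of either step to the chained future.
            chained.set_exception(exc)

    fut.add_done_callback(_on_done)
    return chained


def invoke(type_future, func_name, sync):
    """Hypothetical proxy call that needs the remote type before it can run."""
    result_future = then(type_future, lambda t: f"{t}.{func_name} invoked")
    # Sync callers wait on the chained future; async callers get it back as is.
    return result_future.result() if sync else result_future


# Async case: nothing blocks while the first step is still pending.
type_future = Future()
pending = invoke(type_future, "forward", sync=False)
was_pending = not pending.done()
type_future.set_result("MyModule")
async_result = pending.result(timeout=1)
```

In this model only the sync path ever calls `result()`, which mirrors the description above: the async path returns a Future immediately and the wait happens only when the caller asks for it.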

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70206

Test Plan:
I ran rpc_tests using tensorpipe_rpc_agent_test_fixture.py and had to
adjust test_rref_proxy_timeout to the new behavior.

I ran into test_tensorpipe_set_default_timeout failing due to the
timeout being too small. Doesn't look related to this change.
mrshenli
Fixes https://github.com/pytorch/pytorch/issues/67626

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Reviewed By: pritamdamania87

Differential Revision: D33243348

Pulled By: kumpera

fbshipit-source-id: e1e8c34bb3d170407c0a793e2e585357f905d3c6
(cherry picked from commit 1ad5a7ceea17d00872e593650ef50d85bb232cda)
2022-01-19 23:37:15 +00:00
d64184ef4c [RPC] Support timeout for RRef proxy functions (#50499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50499

Adds a timeout API to the following functions:
```
rref.rpc_sync()
rref.rpc_async()
rref.remote()
```
so that RPCs initiated by these proxy calls can be appropriately timed out similar to the regular RPC APIs. Timeouts are supported in the following use cases:

1. rpc.remote() finishes in time and successfully, but the function run by rref.rpc_async() is slow and times out. A timeout error is raised.
2. The rref.rpc_async() function is fast, but rpc.remote() is slow or hanging. When rref.rpc_async() is called, it still times out with the passed-in timeout (and does not block waiting for rpc.remote() to succeed, which is what happens currently). However, the timeout occurs during the future creation itself (not during the wait), since it calls `rref._get_type`, which blocks. We can consider making this nonblocking by modifying rref._get_type to return a future, although that is likely a larger change.
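
The first case above (the call is issued promptly, but the function it runs is slow) can be modeled with the standard library. This sketch assumes a thread pool as a stand-in for the remote worker and does not use the RPC API; `slow_call` and the chosen durations are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def slow_call(seconds):
    """Hypothetical slow function standing in for work run on the owner."""
    time.sleep(seconds)
    return "done"


with ThreadPoolExecutor(max_workers=1) as pool:
    # The future is created immediately; only the wait is bounded.
    fut = pool.submit(slow_call, 0.5)
    try:
        fut.result(timeout=0.05)
        outcome = "finished"
    except TimeoutError:
        outcome = "timed out"
```

The second case differs in that the blocking step happens while the future is being created, so in that situation the timeout would surface before any future is returned to the caller.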

Test Plan: Added UT

Reviewed By: wanchaol

Differential Revision: D25897495

fbshipit-source-id: f9ad5b8f75121f50537677056a5ab16cf262847e
2021-01-15 13:23:23 -08:00
a5fb12d168 RRef proxy support for ScriptModule methods (#48339)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48339

Closes https://github.com/pytorch/pytorch/issues/48294
https://github.com/pytorch/pytorch/pull/48293 added creation and transfer of ScriptModule over RPC in python, but it did not work with ScriptModule.

This PR makes the above work with ScriptModule as per a discussion with mrshenli:
1) We remove the `hasattr()` check and just let Python throw the exception as it would when accessing the py function with `getattr`
2) We condition on `issubclass(type, ScriptModule)` when checking if it is wrapped with async_function, because `ScriptModule` does not have getattr implemented (the ScriptModule forward/function is not a Python function; it is a TorchScript-specific function):
```
torch/jit/_script.py", line 229, in __get__
    return self.__getattr__("forward")  # type: ignore
AttributeError: '_CachedForward' object has no attribute '__getattr__'
```
ghstack-source-id: 117631795

Test Plan: Modified ut

Reviewed By: wanchaol

Differential Revision: D25134423

fbshipit-source-id: 918ca88891c7b0531325f046b61f28947575cff0
2020-12-04 11:33:16 -08:00
257c6d0fde Make async_execution compatible with RRef helpers (#44666)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44666

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D23691989

Pulled By: mrshenli

fbshipit-source-id: b36f4b1c9d7782797a0220434a8272610a23e83e
2020-09-16 12:01:05 -07:00
5c2b273089 Add RRef Python Helper to launch function on the referenced object (#36619)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36619

With this PR, applications no longer need to create dedicated helpers
to run functions on the object referenced by an RRef. Instead,
`rref.rpc_sync().some_func()` will use `rpc_sync` to run `some_func`
on the owner of the RRef using the object referenced by the RRef.
Similar helpers for `rref.rpc_async().some_func()` and
`rref.remote().some_func()` are also added.
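
The helper pattern can be sketched in plain Python as a proxy whose attribute access returns a callable bound to the referenced object. This is a single-process model under stated assumptions, not the code added by the PR: `LocalRef`, `Proxy`, and `call_on_owner` are hypothetical names, and the "RPC" is an ordinary local call.

```python
from functools import partial


class LocalRef:
    """Hypothetical stand-in for an RRef; it simply holds a local object."""

    def __init__(self, value):
        self._value = value

    def local_value(self):
        return self._value

    def rpc_sync(self):
        # Return a proxy that forwards method calls to the referenced object.
        return Proxy(self, call_on_owner)


def call_on_owner(ref, func_name, *args, **kwargs):
    """Stand-in for an RPC to the owner: look the method up and run it."""
    return getattr(ref.local_value(), func_name)(*args, **kwargs)


class Proxy:
    def __init__(self, ref, rpc_api):
        self._ref = ref
        self._rpc_api = rpc_api

    def __getattr__(self, func_name):
        # Only reached for names not set in __init__, i.e. the forwarded methods.
        return partial(self._rpc_api, self._ref, func_name)


ref = LocalRef([3, 1, 2])
position = ref.rpc_sync().index(1)  # runs list.index on the referenced list
ref.rpc_sync().sort()               # mutates the referenced list in place
```

With this shape, callers write `ref.rpc_sync().some_func(...)` and never define a dedicated helper per method, which is the convenience the paragraph above describes.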

An alternative design is to expose PyRRef as RRefBase and then
implement everything in a new Python RRef class. However, the RRef
class cannot directly inherit from PyRRef/RRefBase, otherwise we
would need to let pyRemote* C++ functions load RRef from Python
and return an RRef instance. It is possible to let RRef hold an
instance of PyRRef instead of inheriting from it, but this does not
look like an elegant design, as we would have RRef holding PyRRef and
PyRRef holding the C++ RRef. Another alternative is to use dynamic
method loading, by installing member methods on PyRRef instances.
However, this would require different solutions to handle
RRef(data) and rpc.remote(...). Based on the above reasoning, we
decided to go with the current implementation for simplicity, and so
that we can keep all RRef-related APIs in one place.

Test Plan: Imported from OSS

Differential Revision: D21028333

Pulled By: mrshenli

fbshipit-source-id: fe90f56ef7183d18874e357900093755e1601eb4
2020-04-21 19:29:54 -07:00