18 Commits

Author SHA1 Message Date
e8cf5ff564 Fix the Problems About Defining Static Variable in Inline Function (#147095)
Refer to https://github.com/pytorch/pytorch/issues/125465 for more information.

- Remove unused header files
- Move common functionality to separate files to reduce dependencies between picklers and unpicklers
- Move the inline function that defines the static variable to .cc
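Issue #125465 is not reproduced here. The following is a minimal sketch, assuming the usual failure mode: a function-local `static` inside an `inline` function defined in a header is meant to be a single object, but when that header is compiled into several shared libraries (for example Windows DLLs, or libraries built with hidden symbol visibility), each library can end up with its own copy. The function name `knownTypeTags` and the header/source split are hypothetical. In a real multi-library build the declaration would also carry the library's export macro (in PyTorch, typically `TORCH_API`), which is omitted here.

```cpp
#include <string>
#include <unordered_set>

// Header part (hypothetical helpers.h). Before the fix it held something like:
//
//   inline std::unordered_set<std::string>& knownTypeTags() {
//     static std::unordered_set<std::string> tags;  // may be duplicated per shared library
//     return tags;
//   }
//
// After the fix the header only declares the function.
std::unordered_set<std::string>& knownTypeTags();

// Source part (hypothetical helpers.cc): the single out-of-line definition.
// Only the library that compiles this file contains the static object, and
// other code reaches it through this one exported symbol.
std::unordered_set<std::string>& knownTypeTags() {
  // Function-local static: constructed on the first call; its initialization
  // is thread-safe since C++11.
  static std::unordered_set<std::string> tags;
  return tags;
}
```

Every call now returns a reference to the same set, so a tag registered by one caller is visible to all others.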

Differential Revision: [D76266755](https://our.internmc.facebook.com/intern/diff/D76266755)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147095
Approved by: https://github.com/cyyever, https://github.com/albanD

Co-authored-by: Edward Yang <ezyang@meta.com>
2025-06-25 01:59:10 +00:00
d4ab8e74f3 Revert "Fix the Problems About Defining Static Variable in Inline Function (#147095)"
This reverts commit c6fc11af760d4ad1f01cc699a3c6488ab5f41770.

Reverted https://github.com/pytorch/pytorch/pull/147095 on behalf of https://github.com/izaitsevfb because it still fails to link internally at Meta ([comment](https://github.com/pytorch/pytorch/pull/147095#issuecomment-2917221575))
2025-05-28 18:22:39 +00:00
c6fc11af76 Fix the Problems About Defining Static Variable in Inline Function (#147095)
Refer to https://github.com/pytorch/pytorch/issues/125465 for more information.

- Remove unused header files
- Move the inline function that defines the static variable to .cc

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147095
Approved by: https://github.com/cyyever, https://github.com/albanD
2025-05-28 02:47:16 +00:00
4926bd6004 Revert "Fix the Problems About Defining Static Variable in Inline Function (#147095)"
This reverts commit 3da14d38bd396f5bbe8494872d1509efa1a6f048.

Reverted https://github.com/pytorch/pytorch/pull/147095 on behalf of https://github.com/atalman because it breaks internally ([comment](https://github.com/pytorch/pytorch/pull/147095#issuecomment-2787129770))
2025-04-08 17:10:36 +00:00
3da14d38bd Fix the Problems About Defining Static Variable in Inline Function (#147095)
Refer to https://github.com/pytorch/pytorch/issues/125465 for more information.

- Remove unused header files
- Move the inline function that defines the static variable to .cc

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147095
Approved by: https://github.com/cyyever, https://github.com/albanD
2025-04-08 10:23:02 +00:00
cyy
6aa6bd4ca5 [Distributed] [12/N] Fix clang-tidy warnings in torch/csrc/distributed/ (#136528)
Follows #136439. A dangling reference to qualifiedName was found and fixed.
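The message does not show the offending code. As a generic illustration only (the class `TypeInfo`, the function `lookupType`, and the type name string are hypothetical), this is a common way a reference to a `qualifiedName` string ends up dangling, together with the usual fix.

```cpp
#include <string>
#include <utility>

// Hypothetical type whose accessor returns a reference to its own member.
class TypeInfo {
 public:
  explicit TypeInfo(std::string qualifiedName)
      : qualifiedName_(std::move(qualifiedName)) {}
  const std::string& qualifiedName() const { return qualifiedName_; }

 private:
  std::string qualifiedName_;
};

// Returns by value, so callers receive a temporary.
TypeInfo lookupType() { return TypeInfo("__torch__.MyModule"); }

std::string describe() {
  // Buggy version:
  //   const std::string& name = lookupType().qualifiedName();
  // The TypeInfo temporary is destroyed at the end of the full expression, so
  // `name` would refer to freed memory. Lifetime extension does not help: the
  // reference binds to the result of a member function call, not to the
  // temporary itself.
  //
  // Fixed version: copy the string while the temporary is still alive.
  std::string name = lookupType().qualifiedName();
  return "type " + name;
}
```

clang-tidy checks such as `bugprone-*` and compiler warnings like `-Wdangling-reference` (GCC 13+) can flag some, but not all, instances of this pattern.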

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136528
Approved by: https://github.com/kwen2501
2024-09-25 20:12:08 +00:00
cyy
f048569c24 [Distributed] [11/N] Fix clang-tidy warnings in torch/csrc/distributed/ (#136439)
Follows #131671

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136439
Approved by: https://github.com/kwen2501
2024-09-24 13:05:15 +00:00
2973994259 fix typo in comments under torch/csrc/distributed (#96062)
This PR fixes typos in comments and messages of `.cpp` and `.hpp` files under `torch/csrc/distributed` directory

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96062
Approved by: https://github.com/ngimel
2023-03-07 02:56:41 +00:00
b07d68e24c [reland] Always use intrusive_ptr for Message (2 out of 2) (#59206)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59206

Reland of https://github.com/pytorch/pytorch/pull/58423

This is part 2 of the previous PR. Here we address the remaining occurrences of "raw" Message, namely the ones within toMessageImpl. Since they are the last ones, we also make the constructor of Message private to prevent new usages from emerging.
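Making the constructor private while exposing a factory is a standard way to force every instance to be owned by a smart pointer. The sketch below assumes a simplified, hypothetical `Message` with a string payload. In the real code the pointer type is `c10::intrusive_ptr`, whose reference count is stored in the object itself; `std::shared_ptr` is used here only to keep the example dependency-free.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for Message; std::shared_ptr stands in for c10::intrusive_ptr.
class Message {
 public:
  // The only way to obtain a Message: every instance is owned by a smart pointer.
  static std::shared_ptr<Message> create(std::string payload) {
    // std::make_shared cannot reach a private constructor, so `new` is called
    // here, inside a member function where the constructor is accessible.
    return std::shared_ptr<Message>(new Message(std::move(payload)));
  }

  const std::string& payload() const { return payload_; }

 private:
  explicit Message(std::string payload) : payload_(std::move(payload)) {}
  std::string payload_;
};

// Message m("x");                 // no longer compiles: the constructor is private
// auto m = Message::create("x");  // OK: ownership is explicit from the start
```

Any new code that tries to build a "raw" `Message` on the stack fails at compile time, which is the point of the change.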
ghstack-source-id: 130202848

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28623892

fbshipit-source-id: f815cf6b93e488c118e5d2298473e6e9d9f4c132
2021-06-02 05:45:55 -07:00
a6b9268f31 Revert D28474879: Always use intrusive_ptr for Message (2 out of 2)
Test Plan: revert-hammer

Differential Revision:
D28474879 (ebf55a7d13)

Original commit changeset: 498652a8b80a

fbshipit-source-id: 4d81e9769699356bf2a2ffc14b26f480bfeef9a1
2021-05-21 19:24:20 -07:00
ebf55a7d13 Always use intrusive_ptr for Message (2 out of 2) (#58423)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58423

This is part 2 of the previous PR. Here we address the remaining occurrences of "raw" Message, namely the ones within toMessageImpl. Since they are the last ones, we also make the constructor of Message private to prevent new usages from emerging.
ghstack-source-id: 129567049

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28474879

fbshipit-source-id: 498652a8b80a953396cd5d4b275c0b2e869c9ecf
2021-05-21 13:15:25 -07:00
3fb1e73a4e Add rpc.async_execution support for rpc.remote on script functions (#39758)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39758

Test Plan: Imported from OSS

Differential Revision: D21963789

Pulled By: mrshenli

fbshipit-source-id: f16f464ba01401b160cc4d3daf036e4bc806d7ea
2020-06-10 13:17:07 -07:00
a4afac6076 enforce rref JIT pickling to be in the scope of rpc calls (#34689)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34689

RRef JIT pickling is only allowed inside RPC calls. This is enforced by adding a thread-local variable `isInRpcCall`, which is set to true when converting RPC requests or responses to a message, before calling `JIT::pickle()`. Inside `JIT::pickle()`, RRefs may be pickled only when `isInRpcCall` is true.
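A minimal sketch of the thread-local guard idea, assuming simplified stand-ins: `RRefHandle`, `pickleRRef` (standing in for the RRef branch of `JIT::pickle()`), and `toMessage` are hypothetical. The RAII guard is a design choice of this sketch, not necessarily how the commit sets the flag; it has the advantage of restoring the flag even if pickling throws.

```cpp
#include <stdexcept>
#include <string>

namespace sketch {

// One flag per thread, so concurrent RPC threads do not interfere.
thread_local bool isInRpcCall = false;

// Sets the flag for the current thread and restores the previous value on
// scope exit, including when an exception propagates.
class RpcCallGuard {
 public:
  RpcCallGuard() : prev_(isInRpcCall) { isInRpcCall = true; }
  ~RpcCallGuard() { isInRpcCall = prev_; }
  RpcCallGuard(const RpcCallGuard&) = delete;
  RpcCallGuard& operator=(const RpcCallGuard&) = delete;

 private:
  bool prev_;
};

struct RRefHandle {
  std::string id;
};

// Stand-in for the pickler's RRef branch: refuses to run outside an RPC call.
std::string pickleRRef(const RRefHandle& rref) {
  if (!isInRpcCall) {
    throw std::runtime_error("RRef pickling is only allowed inside RPC calls");
  }
  return "RREF:" + rref.id;
}

// Stand-in for converting a request/response to a message: set the flag, then pickle.
std::string toMessage(const RRefHandle& rref) {
  RpcCallGuard guard;
  return pickleRRef(rref);
}

}  // namespace sketch
```

Calling `pickleRRef` directly (for example from `torch.jit.save`-style serialization) throws, while the same call made through `toMessage` succeeds.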
ghstack-source-id: 100481001

Test Plan: unit tests

Differential Revision: D20429826

fbshipit-source-id: dbc04612ed15de5d6c7d75a4732041ccd4ef3f8c
2020-03-19 18:07:39 -07:00
dbe850af5b [jit] do the code reorg (#33851)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33851

Rationale and context described in #33828.

Script to reproduce the move:
https://gist.github.com/suo/16cbefaaeb67ca5a7c6caffd49b7f6e9
ghstack-source-id: 99079645

Test Plan: Make sure CI passes

Reviewed By: jamesr66a

Differential Revision: D20133869

fbshipit-source-id: 390e9241a9c85366d9005c492ac31f10aa96488e
2020-02-27 13:02:51 -08:00
6ad9e5c70d Support TorchScript call over remote API (RRef) (#32466)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32466

It is a follow-up to https://github.com/pytorch/pytorch/pull/32197.

In https://github.com/pytorch/pytorch/pull/32197, `rpc.rpc_sync(..)` and `rpc.rpc_async(..)` gained support for taking a TorchScript-annotated Python function as the user function for RPC.

This PR extends that work by making `rpc.remote(..)` accept a TorchScript-annotated Python function as well.

ghstack-source-id: 97211168

Test Plan:
# Unit tests

```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork -- test_script_function_exception

buck build mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork

buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par -r test_script_function_exception
```

```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:dist_autograd_fork -- test_backward_simple_script_call

buck build mode/dev-nosan //caffe2/test/distributed/rpc:dist_autograd_fork

buck-out/gen/caffe2/test/distributed/rpc/dist_autograd_fork\#binary.par -r test_backward_simple_script_call
```

Differential Revision: D19440633

fbshipit-source-id: d37f6dcdc0b80d35ac7bcba46ad6f9b831c3779b
2020-01-25 02:18:27 -08:00
2486b0ba82 Add Python RRef as args and return value (#25499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25499

See #23110 for model parallel design details, and #26759 for the RRef
protocol. This commit adds support for using RRefs as Python UDF arguments
and return values. RRefs can now be shared from owner to user, from user to
owner, or from user to user.

Limitations:
1. No implicit type conversion yet. (#27099)
2. No failure handling and retry. (#26116)
3. UDF is not yet blocked until all RRefs are confirmed. (#27098)
4. Internal RRef control messages are not idempotent yet. (#26116)
5. Cannot delete RRefs correctly when there are circular dependencies. (#27096)

Main changes:

1. Added `SCRIPT_REMOTE_CALL` and `PYTHON_REMOTE_CALL` to `Message.h` to represent `dist.remote` invocations.
2. Added `SCRIPT_RREF_FETCH_CALL`, `PYTHON_RREF_FETCH_CALL`, `RREF_USER_ACCEPT`, `RREF_USER_DELETE`, `RREF_CHILD_ACCEPT`, and `RREF_FORK_REQUEST` to `Message.h` as internal RRef control messages.
3. New message request handling code is added to `functions.cpp`, and message formats are defined in `script_remote_call.h`, `python_remote_call.h`, and `rref_proto.h`.
4. Added a `PyRRef` type in `py_rref.h` and `py_rref.cpp` which holds a shared pointer to the C++ `RRef` type. `PyRRef` wraps the C++ API and also implements RRef pickling and unpickling. RRef fork-related control messages are sent during the RRef pickling/unpickling procedure.
5. Updated `RRef.h` and `RRef.cpp` accordingly to support `py::object` RRefs.
6. RRef context (reference counts, etc.) is tracked in `rref_context.h` and `rref_context.cpp`.

Test Plan:
Imported from OSS

buck test mode/dev-nosan //caffe2/test:rpc_fork

Differential Revision: D17184146

Pulled By: mrshenli

fbshipit-source-id: a3a268efc087ac1ef489136ab957080382629265
2019-10-03 17:47:12 -07:00
fe4170bda8 Add send and recv backward functions for builtin operators RPC. (#25527)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25527

Master GH issue: https://github.com/pytorch/pytorch/issues/23110.

This change builds upon https://github.com/pytorch/pytorch/pull/24876 and
provides all the autograd hooks needed for a forward pass with distributed RPC
for builtin operators. This change does not cover distributed RPC for Python
UDFs; that will be addressed in follow-up PRs.

Summary of changes:
1. Attach send autograd functions when a request is sent from the client and
a response is sent from the server.
2. Attach receive autograd functions when a request is received on the server
and a response is received on the client.
3. Generate a globally unique autograd_message_id for each send/recv autograd
function pair to uniquely identify them.
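Point 3 needs IDs that never collide across workers without any coordination. One common scheme, assumed here rather than taken from this commit, is to put the worker id in the high bits and a per-worker atomic counter in the low bits. The class name `MessageIdGenerator` and the 16/48 bit split are hypothetical.

```cpp
#include <atomic>
#include <cstdint>
#include <stdexcept>

// Hypothetical generator: high 16 bits = worker id, low 48 bits = local counter.
class MessageIdGenerator {
 public:
  static constexpr unsigned kCounterBits = 48;
  static constexpr std::uint64_t kCounterMask = (std::uint64_t{1} << kCounterBits) - 1;

  explicit MessageIdGenerator(std::uint16_t workerId) : workerId_(workerId) {}

  // Thread-safe: fetch_add hands out each counter value exactly once.
  std::uint64_t next() {
    std::uint64_t counter = counter_.fetch_add(1, std::memory_order_relaxed);
    if (counter > kCounterMask) {
      throw std::overflow_error("autograd message id counter exhausted");
    }
    return (std::uint64_t{workerId_} << kCounterBits) | counter;
  }

  // Recovers which worker generated an id.
  static std::uint16_t workerOf(std::uint64_t id) {
    return static_cast<std::uint16_t>(id >> kCounterBits);
  }

 private:
  std::uint16_t workerId_;
  std::atomic<std::uint64_t> counter_{0};
};
```

Because the worker id is baked into every ID, two workers can each generate IDs locally and the send/recv pair can still be matched unambiguously on the other side.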
ghstack-source-id: 91240466

Test Plan: unit tests.

Differential Revision: D17148077

fbshipit-source-id: 192d8a3f552ed7cc939f55dcca332965c9bd3233
2019-10-03 01:18:46 -07:00
197fd4f707 Adding RRef as return value for builtin operators (#25169)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25169

See #23110 for RRef design details. This commit only implements
RRef as a return value for builtin operators, and the RRef communicates
between a user and the owner. More specifically, an RRef is first
created on the `dist.remote` caller, which is a user of the RRef.
The RRef user then sends a notification to the owner to report
the fork, and the owner uses a shared_ptr to keep the RRef alive.
When the user RRef is destructed on the caller, another notification
is sent to the owner, which can then drop its RRef as well.
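A minimal sketch of the owner-side bookkeeping described above, assuming hypothetical names (`OwnerRRefTable`, `onForkNotification`, `onDeleteNotification`) and integer RRef/user ids. It keeps a `shared_ptr` to the value while at least one user fork is known, and releases it when the last user reports deletion.

```cpp
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <unordered_set>
#include <utility>

// Hypothetical owner-side table; not the actual RRefContext API.
template <typename T>
class OwnerRRefTable {
 public:
  // The owner stores the value produced by the remote call.
  void addValue(std::int64_t rrefId, std::shared_ptr<T> value) {
    values_[rrefId] = std::move(value);
  }

  // A user reported that it holds a fork of this RRef.
  void onForkNotification(std::int64_t rrefId, int userId) {
    users_[rrefId].insert(userId);
  }

  // A user reported that its RRef was destructed. Once no user remains, the
  // owner drops its shared_ptr, freeing the value if nothing else holds it.
  void onDeleteNotification(std::int64_t rrefId, int userId) {
    auto it = users_.find(rrefId);
    if (it == users_.end()) return;  // unknown or already released
    it->second.erase(userId);
    if (it->second.empty()) {
      users_.erase(it);
      values_.erase(rrefId);
    }
  }

  bool isAlive(std::int64_t rrefId) const { return values_.count(rrefId) != 0; }

 private:
  std::unordered_map<std::int64_t, std::shared_ptr<T>> values_;
  std::unordered_map<std::int64_t, std::unordered_set<int>> users_;
};
```

The sketch is single-threaded and ignores message reordering (a delete that arrives before its fork is simply dropped); a real implementation has to lock the table and handle those races.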

Test Plan: Imported from OSS

Differential Revision: D17048343

Pulled By: mrshenli

fbshipit-source-id: 9dd3b3d0e4fd214c76fecdbed746a6d3029b3efd
2019-09-05 15:14:17 -07:00