29 Commits

Author SHA1 Message Date
e8cf5ff564 Fix the Problems About Defining Static Variable in Inline Function (#147095)
Refer to https://github.com/pytorch/pytorch/issues/125465 for more informations

- Remove unused header files
- Move common functionality to separate files to reduce dependencies between picklers and unpicklers
- Move the inline function that defines the static variable to .cc

Differential Revision: [D76266755](https://our.internmc.facebook.com/intern/diff/D76266755)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147095
Approved by: https://github.com/cyyever, https://github.com/albanD

Co-authored-by: Edward Yang <ezyang@meta.com>
2025-06-25 01:59:10 +00:00
d4ab8e74f3 Revert "Fix the Problems About Defining Static Variable in Inline Function (#147095)"
This reverts commit c6fc11af760d4ad1f01cc699a3c6488ab5f41770.

Reverted https://github.com/pytorch/pytorch/pull/147095 on behalf of https://github.com/izaitsevfb due to still fails to link internally at meta ([comment](https://github.com/pytorch/pytorch/pull/147095#issuecomment-2917221575))
2025-05-28 18:22:39 +00:00
c6fc11af76 Fix the Problems About Defining Static Variable in Inline Function (#147095)
Refer to https://github.com/pytorch/pytorch/issues/125465 for more informations

- Remove unused header files
- Move the inline function that defines the static variable to .cc

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147095
Approved by: https://github.com/cyyever, https://github.com/albanD
2025-05-28 02:47:16 +00:00
4926bd6004 Revert "Fix the Problems About Defining Static Variable in Inline Function (#147095)"
This reverts commit 3da14d38bd396f5bbe8494872d1509efa1a6f048.

Reverted https://github.com/pytorch/pytorch/pull/147095 on behalf of https://github.com/atalman due to breaks internally ([comment](https://github.com/pytorch/pytorch/pull/147095#issuecomment-2787129770))
2025-04-08 17:10:36 +00:00
3da14d38bd Fix the Problems About Defining Static Variable in Inline Function (#147095)
Refer to https://github.com/pytorch/pytorch/issues/125465 for more informations

- Remove unused header files
- Move the inline function that defines the static variable to .cc

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147095
Approved by: https://github.com/cyyever, https://github.com/albanD
2025-04-08 10:23:02 +00:00
cyy
70206499f1 [3/N] Fix extra warnings brought by clang-tidy-17 (#137552)
Follows #137459

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137552
Approved by: https://github.com/ezyang
2024-10-15 02:33:44 +00:00
cyy
6aa6bd4ca5 [Distributed] [12/N] Fix clang-tidy warnings in torch/csrc/distributed/ (#136528)
Follows #136439. A dangling reference to qualifiedName was found and fixed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136528
Approved by: https://github.com/kwen2501
2024-09-25 20:12:08 +00:00
cyy
f4dcf2ae93 [1/N] Change #include <c10/util/Optional.h> to #include <optional> (#128301)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128301
Approved by: https://github.com/ezyang, https://github.com/r-barnes
2024-07-08 07:03:53 +00:00
846bb30e13 Revert "[1/N] Change #include <c10/util/Optional.h> to #include <optional> (#128301)"
This reverts commit bd72e28314d8d63bb347becb8309f5ac7761c6b5.

Reverted https://github.com/pytorch/pytorch/pull/128301 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it fails XLA build bd72e28314. Please rebase your PR before relanding because I think the failure is hidden by an unrelated broken trunk XLA failure from your current base commit ([comment](https://github.com/pytorch/pytorch/pull/128301#issuecomment-2169035822))
2024-06-15 01:58:20 +00:00
cyy
bd72e28314 [1/N] Change #include <c10/util/Optional.h> to #include <optional> (#128301)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128301
Approved by: https://github.com/ezyang
2024-06-14 23:21:01 +00:00
ed327876f5 [codemod] c10:optional -> std::optional (#126135)
Generated by running the following from PyTorch root:
```
find . -regex ".*\.\(cpp\|h\|cu\|hpp\|cc\|cxx\)$" | grep -v "build/" | xargs -n 50 -P 4 perl -pi -e 's/c10::optional/std::optional/'
```

`c10::optional` is just an alias for `std::optional`. This removes usages of that alias in preparation for eliminating it entirely.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126135
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/albanD, https://github.com/aaronenyeshi
2024-05-14 19:35:51 +00:00
ee44d73e59 Modernize override (#61744)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61744

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29717320

fbshipit-source-id: 6eea4295ee2e5572ab337620be412376fcc2f3cc
2021-07-23 23:04:46 -07:00
b07d68e24c [reland] Always use intrusive_ptr for Message (2 out of 2) (#59206)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59206

Reland of https://github.com/pytorch/pytorch/pull/58423

This is part 2 of the previous PR. Here we address the remaining occurrences of "raw" Message, namely the ones within toMessageImpl. And since they're the last ones, we make the constructor of Message private, to prevent new usages from emerging.
ghstack-source-id: 130202848

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28623892

fbshipit-source-id: f815cf6b93e488c118e5d2298473e6e9d9f4c132
2021-06-02 05:45:55 -07:00
a6b9268f31 Revert D28474879: Always use intrusive_ptr for Message (2 out of 2)
Test Plan: revert-hammer

Differential Revision:
D28474879 (ebf55a7d13)

Original commit changeset: 498652a8b80a

fbshipit-source-id: 4d81e9769699356bf2a2ffc14b26f480bfeef9a1
2021-05-21 19:24:20 -07:00
ebf55a7d13 Always use intrusive_ptr for Message (2 out of 2) (#58423)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58423

This is part 2 of the previous PR. Here we address the remaining occurrences of "raw" Message, namely the ones within toMessageImpl. And since they're the last ones, we make the constructor of Message private, to prevent new usages from emerging.
ghstack-source-id: 129567049

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28474879

fbshipit-source-id: 498652a8b80a953396cd5d4b275c0b2e869c9ecf
2021-05-21 13:15:25 -07:00
bfdc279134 Unify invoking JIT functions (#57851)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57851

The same as the previous PR, but for JIT functions.
ghstack-source-id: 129567069

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28253841

fbshipit-source-id: 2b8affde16c106f5c76efa8be49af070213708bf
2021-05-21 13:15:02 -07:00
eac02f85cf Fix more clang-tidy errors (#57235)
Summary:
In my last PR I've missed CUDA and distributed folders, fixing this now
This change is autogenerated by `python tool/clang_tidy.py -s`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57235

Reviewed By: janeyx99

Differential Revision: D28084444

Pulled By: malfet

fbshipit-source-id: bf222f69ee90c7872c3cb0931e8cdb84f0cb3cda
2021-04-28 23:29:10 -07:00
67cea74dd3 Add rpc.async_function decorator for TorchScript functions (#39267)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39267

When combined with `torch.jit.script`, the order of decorators matter.
`rpc.functions.async_execution` must be the outmost one. The
`async_execution` decorator will store the TorchScript function in
attribute `_wrapped_async_rpc_function` on the wrapper function, and
pass this wrapped TorchScript function (i.e., an instance of
`torch.jit.ScriptFunction`) to RPC. The caller will mark the ScriptCall
with `isAsyncExecution=true`, and the callee will extract the returned
`Future` in C++ and install subsequent processing as a callback to
that `Future`.

Test Plan: Imported from OSS

Differential Revision: D21792688

fbshipit-source-id: de095eb148d21e9114a478e9e6047c707d34fd07
2020-06-03 22:27:15 -07:00
a4afac6076 enforce rref JIT pickling to be in the scope of rpc calls (#34689)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34689

rref JIT pickling is only allowed inside rpc calls. enforcing this by adding a thread local variable isInRpcCall and set it as True when converting rpc requests or responses to message, before calling JIT::pickle(). Inside JIT::pickle(), it allowes to pickle RRef only when the isInRpcCall is true.
ghstack-source-id: 100481001

Test Plan: unit tests

Differential Revision: D20429826

fbshipit-source-id: dbc04612ed15de5d6c7d75a4732041ccd4ef3f8c
2020-03-19 18:07:39 -07:00
dbe850af5b [jit] do the code reorg (#33851)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33851

Rationale and context described in #33828.

Script to reproduce the move:
https://gist.github.com/suo/16cbefaaeb67ca5a7c6caffd49b7f6e9
ghstack-source-id: 99079645

Test Plan: Make sure CI passes

Reviewed By: jamesr66a

Differential Revision: D20133869

fbshipit-source-id: 390e9241a9c85366d9005c492ac31f10aa96488e
2020-02-27 13:02:51 -08:00
6ad9e5c70d Support TorchScript call over remote API (RRef) (#32466)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32466

It's a follow-up work of https://github.com/pytorch/pytorch/pull/32197.

In https://github.com/pytorch/pytorch/pull/32197, `rpc.sync_rpc(..) `and `rpc.rpc_async(..)` support taking a TorchScript annotated Python function as the user function for RPC.

This PR extend along this direction by making `rpc.remote(..)` support taking a TorchScript annotated Python function as well.

ghstack-source-id: 97211168

Test Plan:
# Unit tests

```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork -- test_script_function_exception

buck build mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork

buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par -r test_script_function_exception
```

```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:dist_autograd_fork -- test_backward_simple_script_call

buck build mode/dev-nosan //caffe2/test/distributed/rpc:dist_autograd_fork

buck-out/gen/caffe2/test/distributed/rpc/dist_autograd_fork\#binary.par -r test_backward_simple_script_call
```

Differential Revision: D19440633

fbshipit-source-id: d37f6dcdc0b80d35ac7bcba46ad6f9b831c3779b
2020-01-25 02:18:27 -08:00
58234c0254 support torch script call over rpc (#32197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32197

This is to reland https://github.com/pytorch/pytorch/pull/30063, the main change is to match a general exception and grep "pickle" error word in "test_script_functions_not_supported" unit test, as Python 3.5 and Python 3.6 throw different types of errors with different error message for the rpc call in the unit test.
[test all]This diff makes following changes:
1. Providing a new set of python rpc privated APIs, they can accept an annotated TorchScript call and this call can be serialized, deserialized and executed in C++ without GIL. These privated APIs will be binded to JIT in the future, and they are different from public APIs as future JIT binded private APIs will be able to accept qualified_name, not callables. These private APIs are subject to be deprecated once JIT supports torch script function to be a JIT type.

Also, these APIs require torch script function to be defined and annotated by users in python land, it can not be script class/module constructor or class/module methods.

2. This diff also allows public rpc APIs to accept an annotated TorchScript call and execute code path that above private APIs ran on. Therefore if users invoke an annotated TorchScript call over RPC, this call can be serialized, deserialized and executed in C++ without GIL as well.

3. The above private APIs call a newly defined C++ function to make rpc torch script call to be serialized, deserialized and executed in C++ land. This C++ function returns an ivalue::Future. so that in follow up diff this C++ function can be called when these privated APIs are binded to JIT.

4. script_call.cpp/.h and request_callback_impl.cpp files are refactored accordingly so that torch script call and builtin call can share same message type and codes.

5. refactored deserializeResponse() and added a new utility to deserizalize response to IValue

ghstack-source-id: 96879167
ghstack-source-id: 96879167

Test Plan: unit test

Differential Revision: D19402374

fbshipit-source-id: 04efcc7c167d08a6503f29efe55e76f2be4b2c5e
2020-01-18 09:24:17 -08:00
51a34545e9 Revert D18482934: support torch script call over rpc
Test Plan: revert-hammer

Differential Revision:
D18482934

Original commit changeset: bd82a0d820c4

fbshipit-source-id: ca5e50fb0a883ee311aeb310198d84ad28062158
2020-01-14 13:30:56 -08:00
dbd737158b support torch script call over rpc (#30063)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30063

This diff makes following changes:
1. Providing a new set of python rpc privated APIs, they can accept an annotated TorchScript call and this call can be serialized, deserialized and executed in C++ without GIL. These privated APIs will be binded to JIT in the future, and they are different from public APIs as future JIT binded private APIs will be able to accept qualified_name, not callables. These private APIs are subject to be deprecated once JIT supports torch script function to be a JIT type.

Also, these APIs require torch script function to be defined and annotated by users in python land, it can not be script class/module constructor or class/module methods.

2. This diff also allows public rpc APIs to accept an annotated TorchScript call and execute code path that above private APIs ran on. Therefore if users invoke an annotated TorchScript call over RPC, this call can be serialized, deserialized and executed in C++ without GIL as well.

3. The above private APIs call a newly defined C++ function to make rpc torch script call to be serialized, deserialized and executed in C++ land. This C++ function returns an ivalue::Future. so that in follow up diff this C++ function can be called when these privated APIs are binded to JIT.

4. script_call.cpp/.h and request_callback_impl.cpp files are refactored accordingly so that torch script call and builtin call can share same message type and codes.

5. refactored deserializeResponse() and added a new utility to deserizalize response to IValue

ghstack-source-id: 96638829

Test Plan: unit test

Differential Revision: D18482934

fbshipit-source-id: bd82a0d820c47a8e45b2e7c616eca06573f7d7ea
2020-01-14 09:27:04 -08:00
fe4170bda8 Add send and recv backward functions for builtin operators RPC. (#25527)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25527

Master GH issue: https://github.com/pytorch/pytorch/issues/23110.

This change builds upon https://github.com/pytorch/pytorch/pull/24876 and
provides all the autograd hooks needed for a forward pass with distributed rpc
for builtin operators. This change does not address distributed rpc for python
UDFs and that will be addressed in follow up PRs.

Summary of changes:
1. Attach send autograd functions when a request is sent from the client and
response is sent from the server.
2. Attach receive autograd functions when a request is received on the server
and a response is received on the client.
3. Generate a globally unique autograd_message_id for each send/recv autograd
function pair to uniquely identify them.
ghstack-source-id: 91240466

Test Plan: unit tests.

Differential Revision: D17148077

fbshipit-source-id: 192d8a3f552ed7cc939f55dcca332965c9bd3233
2019-10-03 01:18:46 -07:00
9ca901895f Make distructor virtual for class with virtual function (#26504)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26504

[pytorch] [distributed] Make distructor virtual for class with virtual function
Not having virtual distructor may lead to a memory leak.
ghstack-source-id: 90454880

Test Plan: Made sure pg based UT works.

Differential Revision: D17488876

fbshipit-source-id: 5fdc55e175fd2b22e931b740c36cb1feed454066
2019-09-20 09:36:29 -07:00
197fd4f707 Adding RRef as return value for builtin operators (#25169)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25169

See #23110 for RRef design details. This commit only implements
RRef as return value for builtin operators, and RRef will communicate
between a user and the owner. More specifically, a RRef is first
created on the `dist.remote` caller, which is a user of the RRef.
Then the RRef user sends and notification to the owner to report
the fork to the owner, and the owner uses a shared_ptr to keep
the RRef alive. When the user RRef is destructed on the caller,
another notification will be sent to the owner, and the owner
can then drop it's RRef as well.

Test Plan: Imported from OSS

Differential Revision: D17048343

Pulled By: mrshenli

fbshipit-source-id: 9dd3b3d0e4fd214c76fecdbed746a6d3029b3efd
2019-09-05 15:14:17 -07:00
5407241b4f Run clang-format on torch/csrc/distributed (#25647)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25647

TSIA

Test Plan: N/A

Differential Revision: D17182909

fbshipit-source-id: 22a6554693def0032a051cef5fe788f49de1d740
2019-09-04 10:08:09 -07:00
b6803d62fd Use snake names for all files in distributed.rpc (#24502)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24502

Files in distributed.rpc package mixes snake camel names. This
commit cleans that up and all files use snake names now.
ghstack-source-id: 88548990

Reviewed By: xush6528

Differential Revision: D16860155

fbshipit-source-id: 3a22a89bf6c4e11aac5849564fc53296a04d6a8b
2019-08-19 10:58:59 -07:00