Commit Graph

55 Commits

Author SHA1 Message Date
c4d1ff02f8 [Lint] Update clang-format to 19.1.4 (#153889)
All changes other than the one to `tools/linter/adapters/s3_init_config.json` are generated by newer clang-format
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153889
Approved by: https://github.com/cyyever, https://github.com/atalman
2025-05-20 14:12:46 +00:00
cyy
70206499f1 [3/N] Fix extra warnings brought by clang-tidy-17 (#137552)
Follows #137459

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137552
Approved by: https://github.com/ezyang
2024-10-15 02:33:44 +00:00
cyy
95dbbf713e [Distributed] [9/N] Fix clang-tidy warnings in torch/csrc/distributed/rpc (#130109)
Follows #125102

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130109
Approved by: https://github.com/ezyang
2024-07-16 04:23:42 +00:00
d7caef7996 [CI] Update clang-format (#116002)
To 17.0.6 build using https://github.com/pytorch/test-infra/blob/main/.github/workflows/clang-tidy-linux.yml

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116002
Approved by: https://github.com/suo
2023-12-18 14:58:46 +00:00
387d769156 [BE]: Replace string compares with more efficient cpp comparisons (#92765)
Replace cpp string comparisons with more efficient equality operators. These string comparisons are not just more readable, but they also allow for short-circuiting for faster string equality checks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92765
Approved by: https://github.com/ezyang
2023-01-22 21:40:19 +00:00
b55a2500d2 [jit] Remove graph() call from abstract Function interface. (#65967)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65967

Graph is an implementation detail. If user wants to get access to the
underlying graph, they should be able to explicitly dynamic cast instead.
ghstack-source-id: 141659819

Test Plan: no behavior change.

Reviewed By: gmagogsfm

Differential Revision: D31326153

fbshipit-source-id: a0e984f57c6013494b92a7095bf5bb660035eb84
2021-10-27 11:54:26 -07:00
a9b0a921d5 Disable avoid-non-const-global-variables lint check (#62008)
Summary:
As GoogleTest `TEST` macro is non-compliant with it as well as `DEFINE_DISPATCH`

All changes but the ones to `.clang-tidy` are generated using following script:
```
for i in `find . -type f -iname "*.c*" -or -iname "*.h"|xargs grep cppcoreguidelines-avoid-non-const-global-variables|cut -f1 -d:|sort|uniq`;  do sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i; done
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008

Reviewed By: driazati, r-barnes

Differential Revision: D29838584

Pulled By: malfet

fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
2021-07-22 18:04:40 -07:00
6ecc1a4c4f Make pytorch clang-tidy clean (#60649)
Summary:
This PR suppresses clang-tidy warnings in the codebase (for now) so that we can re-enable clang-tidy checks on master.

I ran this script to add the `NOLINTNEXTLINE` comments (on a devserver):
```bash
python3 setup.py develop

# Uses same script that's run on CI and adds the -j (parallel), -s (add comments), -k (continue if diagnostic errors are found) options
python3 tools/clang_tidy.py \
  -j \
  -s \
  -k \
  -v \
  --paths torch/csrc/ \
  -g"-torch/csrc/jit/passes/onnx/helper.cpp" \
  -g"-torch/csrc/jit/passes/onnx/shape_type_inference.cpp" \
  -g"-torch/csrc/jit/serialization/onnx.cpp" \
  -g"-torch/csrc/jit/serialization/export.cpp" \
  -g"-torch/csrc/jit/serialization/import.cpp" \
  -g"-torch/csrc/jit/serialization/import_legacy.cpp" \
  -g"-torch/csrc/onnx/init.cpp" \
  -g"-torch/csrc/cuda/nccl.*" \
  -g"-torch/csrc/cuda/python_nccl.cpp" \
  -g"-torch/csrc/autograd/FunctionsManual.cpp" \
  -g"-torch/csrc/generic/*.cpp" \
  -g"-torch/csrc/jit/codegen/cuda/runtime/*" \
  -g"-torch/csrc/deploy/interpreter/interpreter.cpp" \
  -g"-torch/csrc/deploy/interpreter/interpreter.h" \
  -g"-torch/csrc/deploy/interpreter/interpreter_impl.h" \
  -g"-torch/csrc/deploy/interpreter/test_main.cpp"
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60649

Test Plan: Verified changes by re-running the script (without the `-s` option) and seeing no warnings/errors.

Reviewed By: walterddr, janeyx99

Differential Revision: D29504258

Pulled By: 1ntEgr8

fbshipit-source-id: 78310b30ee8213b73ddb4771ad874665323e7a4e
2021-07-01 12:21:07 -07:00
924717bf51 Add _get_type() API to RRef (#44663)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44663

The new API returns the type of the data object referenced by this
`RRef`. On the owner, this is same as `type(rref.local_value())`.
On a user, this will trigger an RPC to fetch the `type` object from
the owner. After this function is run once, the `type` object is
cached by the `RRef`, and subsequent invocations no longer trigger
RPC.

closes #33210

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D23691990

Pulled By: mrshenli

fbshipit-source-id: a2d87cd601a691dd75164b6bcd7315245e9cf6bd
2020-09-16 11:59:22 -07:00
89b0b3bc8c Allow RPC to be initialized again after shutdown. (#42723)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42723

This PR is addressing https://github.com/pytorch/pytorch/issues/39340
and allows users to initialize RPC again after shutdown. Major changes in the
PR include:

1. Change to DistAutogradContainer to support this.
2. Ensure PythonRpcHandler is reinitialized appropriately.
3. Use PrefixStore in RPC initialization to ensure each new `init_rpc` uses a
different prefix.
ghstack-source-id: 109805368

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D22993909

fbshipit-source-id: 9f1c1e0a58b58b97125f41090601e967f96f70c6
2020-08-13 20:18:34 -07:00
7e82382ad5 Allow profiler to be enabled remotely with RPC (#38748)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38748

This diff contains the message scaffolding and profiler changes in order to be able to remotely run the profiler across different nodes and aggregate the results on a single node.

As discussed, we have implemented this by creating new message types, that similar to autograd messages, wrap the profiling information with the original message, and send this new message over the wire. On the receiving end, this wrapped message is detected, we fetch the original message from it, and process the original message with the profiler enabled. When sending a response with profiling information, we serialize the profiled `Events` and send them back over RPC. When such a message is received, the events profiled on the remote node are stored (added back to the local profiler).

Changes in this PR:
- New message types (run_with_profiling_req, run_with_profiling_resp) to send profiling info over the wire. Message parsing logic is added to handle these wrapped types.
- Handling of sending profiler data over the wire, in particular, the attributes of the `ProfilerConfig` and the serialized profiled `Event`s
- The logic for wrapping RPC messages is deduped with that in `rpc_with_autograd`, and the common payload wrapping/unwrapping logic is moved to helper functions in `rpc/utils.cpp`
- Changes in `autograd/utils.cpp` to detect if we have enabled the profiler and are sending an RPC, if so, uses the above new message types
- Changes in request_callback to parse and turn on the profiler in a thread-local fashion
- Serialization and deserialization of profiling `Events`, and support to add the remote events to the thread-local profiler
- Introduction of the concept of `node_id`, which as discussed with ilia-cher , will be used along with the `Event`s handle attribute to distinguish between events. When there are events from different nodes, this node information is rendered in the profile output (e.g. when printing tables), otherwise, it is not, since it is irrelevant.
- Some changes to profiler.cpp to add useful helper methods/guards
- toHere() is now profiled for RRefs
- Unittests
ghstack-source-id: 106134626

Test Plan: Added unittests, existing profiler unittests.

Differential Revision: D19510010

fbshipit-source-id: 044347af992f19a9e3b357c9567f6fc73e988157
2020-06-18 17:01:57 -07:00
fa4ed17183 Explicitly decref in UnpickledPythonCall dtor (#38398)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/38398

Test Plan: Imported from OSS

Differential Revision: D21550712

Pulled By: mrshenli

fbshipit-source-id: aac4708a5b6f6dc38149f995d11e27c190648859
2020-06-04 22:36:35 -07:00
a05ef17e46 Add rpc.functions.async_execution decorator for rpc_sync/rpc_async (#39216)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39216

The `rpc.functions.async_execution` decorator specifies that the
wrapped function is guaranteed to return a `torch.futures.Future`.
The decorator adds a `_wrapped_async_rpc_function` attribute to
the wrapper function. The caller retrieves this information and
then sets `isAsyncFunction` argument accordingly which is later
added to PythonCall RPC message as a field. On the callee side,
if the PythonCall carries an asynchronous function, it will cast
the function's return value to a jit::PythonFutureWrapper object,
and then install response creation and communication as a callback
on the that jit::PythonFutureWrapper.

For applications, this feature is useful when a function needs to
wait for IO or additional singaling. In those cases, marking the
user function as `rpc.functions.async_execution` will prevent it
from blocking one thread on callee for too long.

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D21779962

fbshipit-source-id: 6b6aa698bf6f91dad6ed2a7ee433df429b59e941
2020-06-02 23:21:25 -07:00
797c608f50 Explicitly decref py::object in PythonRpcHandler (#38366)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/38366

Test Plan: Imported from OSS

Differential Revision: D21537612

Pulled By: mrshenli

fbshipit-source-id: 089bcc3d7de3bce6e769f72d67e0e0f91e0219c6
2020-05-12 20:55:59 -07:00
5c2b273089 Add RRef Python Helper to launch function on the referenced object (#36619)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36619

With this PR, applications no longer need to create dedicated helpers
to run functions on the object referenced by an RRef. Instead,
`rref.rpc_sync().some_func()` will use `rpc_sync` to run `some_func`
on the owner of the RRef using the object referenced by the RRef.
Similar helpers for `rref.rpc_async().some_func()` and
`rref.remote().some_func()` are also added.

An alternative design is to expose PyRRef as RRefBase and then
implement everything in a new Python RRef class. However, the RRef
class cannot directly inherit from PyRRef/RRefBase, otherwise we
will need to let pyRemote* C++ functions to load RRef from Python
and return an RRef instance. It is possible to let RRef hold a
instance of PyRRef instead of inherit from it, but this does not
look like a elegant design, as we will have RRef holding PyRRef and
PyRRef holding the C++ RRef. Another alternative is to use dynamic
method loading, by installing member methods to PyRRef instances.
However, this would require different solutions to handle
RRef(data) and rpc.remote(...). Base on the above thinking, we
decided to go with the current implementation for simplicity and we
can also keep all RRef-related APIs in one place.

Test Plan: Imported from OSS

Differential Revision: D21028333

Pulled By: mrshenli

fbshipit-source-id: fe90f56ef7183d18874e357900093755e1601eb4
2020-04-21 19:29:54 -07:00
75e4c53b35 [rpc] Add a debug only check to debug python cleanup races (#35395)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35395

as title
ghstack-source-id: 101035263

Test Plan: CI

Differential Revision: D20632634

fbshipit-source-id: 737e353982b325e73da3825b130aae6b11dbcfe7
2020-03-27 18:53:35 -07:00
ac639d927a Reland "[RPC] Use qualified name str directly in RPC torch script code path" (#35489)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35489

Relanding https://github.com/pytorch/pytorch/pull/34733.

Fix is in https://github.com/pytorch/pytorch/pull/34988

Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork
```

```
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork

buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork && \
buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par \
-r test_return_local_script_class_rref_in_py_and_use_in_script

buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork && \
buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par \
-r test_return_local_script_module_rref_in_py_and_use_in_script
```

Differential Revision: D20661748

fbshipit-source-id: d550daab8d689d0a9aa2450f3bdb7417ab79dae2
2020-03-26 23:41:51 -07:00
5d92a6cc30 Revert D7778113: Reland "[RPC] Use qualified name str directly in RPC torch script code path"
Test Plan: revert-hammer

Differential Revision:
D7778113

Original commit changeset: b830c03ac946

fbshipit-source-id: ef08b287a6db58320c738cde0c99b3333f5724eb
2020-03-19 06:05:23 -07:00
d616cad676 Reland "[RPC] Use qualified name str directly in RPC torch script code path" (#34962)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34962

Relanding #34733. Fix is in https://github.com/pytorch/pytorch/pull/34988.

Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork
```

```
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork

buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork \
&& buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par \
-r test_return_local_script_class_rref_in_py_and_use_in_script

buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork \
&& buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par \
-r test_return_local_script_module_rref_in_py_and_use_in_script
```

```
buck test mode/dev //caffe2/test/distributed/rpc/jit:rpc_fork_thrift -- test_return_local_script_module_rref_in_py_and_use_in_script
```

Differential Revision: D7778113

fbshipit-source-id: b830c03ac9463075fca248eba75be364b0e8b080
2020-03-18 22:25:09 -07:00
d29f450e63 Revert D20442573: [RPC] Use qualified name str directly in RPC torch script code path
Test Plan: revert-hammer

Differential Revision:
D20442573

Original commit changeset: 87f8b7d94adc

fbshipit-source-id: db0f10c28352d2b3ca21b5357e8e09c01a50018c
2020-03-18 11:00:09 -07:00
ecd7c0f84c [RPC] Use qualified name str directly in RPC torch script code path (#34733)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34733

simplify
ghstack-source-id: 100292435

Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork
```

```
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork

buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork \
&& buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par \
-r test_return_local_script_class_rref_in_py_and_use_in_script

buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork \
&& buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par \
-r test_return_local_script_module_rref_in_py_and_use_in_script
```

Differential Revision: D20442573

fbshipit-source-id: 87f8b7d94adc03544f8e2955d01cd4702bb31a34
2020-03-17 10:28:52 -07:00
38b2856c71 Split deserialize from runPythonUdf and remove generatePythonUDFResult (#34496)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34496

Differential Revision: D20347469

Test Plan: Imported from OSS

Pulled By: mrshenli

fbshipit-source-id: b832a3a9e2ef61f149175f737b26f65d63bf797b
2020-03-16 18:28:07 -07:00
c235be42dd [jit] kill script namespace (#34515)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34515

Once upon a time we thought this was necessary. In reality it is not, so
removing it.

For backcompat, our public interface (defined in `api/`) still has
typedefs to the old `script::` names.

There was only one collision: `Pass` as a `Stmt` and `Pass` as a graph
transform. I renamed one of them.

Test Plan: Imported from OSS

Differential Revision: D20353503

Pulled By: suo

fbshipit-source-id: 48bb911ce75120a8c9e0c6fb65262ef775dfba93
2020-03-11 23:32:48 -07:00
8d84c5f1c7 Fix static data initialization deadlock on GIL (#34505)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34505

A thread could hold GIL when calling PythonRpcHandler::getInstance(),
meantime another thread could have been doing static data
initialization by calling `new PythonRpcHandler()`, inside of which GIL is
also required. Static data initialization is thread-safe, so the thread
holding the GIL will wait for the other thread to finish static data
initializating before going forward. Because the initialization can't
proceed without GIL, there is a deadlock. We ask the calling thread to
release GIL to avoid this situation.
ghstack-source-id: 99893858

Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:dist_autograd_spawn -- 'test_backward_simple_script_call \(test_dist_autograd_spawn\.DistAutogradTestWithSpawn\)' --stress-runs 100
```

Differential Revision: D7490489

fbshipit-source-id: 76f63cc7bedf088d3dbff288f53aa0bd33749255
2020-03-10 20:40:22 -07:00
b9c32209db Use SerializedPyObj in PythonRpcHandler::generatePythonUDFResult (#34495)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34495

Differential Revision: D20347466

Test Plan: Imported from OSS

Pulled By: mrshenli

fbshipit-source-id: 79625adb4ac3c9c6da4f40016e973bf17466c693
2020-03-09 20:41:05 -07:00
b82658810e Split deserialize from _run_function in RPC internal.py (#34494)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34494

Differential Revision: D20347463

Test Plan: Imported from OSS

Pulled By: mrshenli

fbshipit-source-id: e6fd886622f26c46bb83ac118e67abb2f5b296b9
2020-03-09 20:41:00 -07:00
544fb64440 Use SerializedPyObj in PythonRpcHandler (#34493)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34493

Differential Revision: D20347462

Test Plan: Imported from OSS

Pulled By: mrshenli

fbshipit-source-id: 9edda9eb95b1994464459271bb53ee77b760e474
2020-03-09 20:40:55 -07:00
18ef09f5ac Remove _load_return_value from RPC internal.py (#34492)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34492

Differential Revision: D20347468

Test Plan: Imported from OSS

Pulled By: mrshenli

fbshipit-source-id: 92388d0d50a08fb895bacacf94c7b5495b4ae2b6
2020-03-09 20:40:50 -07:00
6d1c4df660 Consolidate Python Messages to use SerializedPyObj (#34491)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34491

Differential Revision: D20347467

Test Plan: Imported from OSS

Pulled By: mrshenli

fbshipit-source-id: efae4111d961f3a528cede77c863fb049cda9029
2020-03-09 20:40:45 -07:00
60e8615a6d [JIT] Virtualize Function (#33921)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33921

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.intern.facebook.com/intern/diff/D20153092/)!

Test Plan: Imported from OSS

Differential Revision: D20177227

Pulled By: jamesr66a

fbshipit-source-id: 87f3e484c4f873d60f76f50f6789c1b4a73bdfde
2020-03-07 10:03:50 -08:00
dbe850af5b [jit] do the code reorg (#33851)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33851

Rationale and context described in #33828.

Script to reproduce the move:
https://gist.github.com/suo/16cbefaaeb67ca5a7c6caffd49b7f6e9
ghstack-source-id: 99079645

Test Plan: Make sure CI passes

Reviewed By: jamesr66a

Differential Revision: D20133869

fbshipit-source-id: 390e9241a9c85366d9005c492ac31f10aa96488e
2020-02-27 13:02:51 -08:00
9266bde970 [pytorch] Minor: add GIL assert to PythonRpcHandler::handleExceptionGILHeld (#33557)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33557

We should add GIL asserts in some places to keep assumptions documented.
This just adds one in an exception codepath as a placeholder for more.

This change also moves a #define from a .h to the .cpp to reduce scope.
ghstack-source-id: 98673532

Test Plan: buck test mode/dev-nosan caffe2/test/...

Differential Revision: D20005387

fbshipit-source-id: b7eff54a6f1dd69d199f8ca05cdb3001c50b37c4
2020-02-20 18:15:44 -08:00
a751ddaaa5 Use leaky singletons for torch.distributed. (#32923)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32923

As per
https://isocpp.org/wiki/faq/ctors#construct-on-first-use-v2 and
https://isocpp.org/wiki/faq/ctors#static-init-order-on-first-use-members, we
should be using leaky singletons to avoid static initialization order problem.

Closes https://github.com/pytorch/pytorch/issues/27412
ghstack-source-id: 97601384

Test Plan: waitforbuildbot

Differential Revision: D19688986

fbshipit-source-id: 8c1935fb7da8a7116dbca55eb43dc04bc02695ac
2020-02-03 14:15:18 -08:00
821b6aa769 [pytorch] Minor: avoid acquiring GIL twice in PyRRef::localValue() (#32785)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32785

Add PythonRpcHandler::handleExceptionWithGIL() so that in PyRRef::localValue(),
we don't need to release the GIL and re-acquire the following line.
ghstack-source-id: 97418465

Test Plan: existing test coverage

Differential Revision: D19626195

fbshipit-source-id: db694d04b078811f819626789e1e86f1b35adb5b
2020-01-29 21:27:43 -08:00
b5d8982ae2 clean up GIL usuage (#32748)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32748

This is to follow up PR #30630, we need to have GIL when calling jit::toPyObject(), for some binded functions need to be taged with GIL release if underneath C++ codes requires GIL. so
1. pyRef::to_here() and pyRef::local_value() added GIL
2. pyRef::pickle and pyRef::unpickle() added GIL release tag
3. in request_callback_impl, also added GIL as needed
4. for typeParser, use cached jitCompilationUnit_, also clean it up in cleanUp() function
ghstack-source-id: 97373011

Test Plan: unit test

Differential Revision: D19612337

fbshipit-source-id: 4d09f9b52ba626545ae7d31fea6b671301ed3890
2020-01-29 11:58:46 -08:00
b474c351dd [rpc] Remove template on RRef and add Type to RRef creation (#30630)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30630

This remove template and all the specializations it have in rpc, we
universally use IValue as the inner value since we support making python
object to be hold inside IValue.

This will also ensure that we have the correct type information when
creating the RRef, we use the return type from the schema when creating
userRRef and OwnerRRef, it will enable IValue to always have the correct
type if the IValue is the RRef object (next PR)

Test Plan: Imported from OSS

Differential Revision: D19502235

fbshipit-source-id: 0d5decae8a9767e0893f3b8b6456b231653be3c5
2020-01-23 21:15:46 -08:00
9ce25cce91 add an option to record time spent waiting for GIL (#30842)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30842

We'd like to profile the time spent on GIL acqusiition to debug
performance issues.

Test Plan: Unit tests pass.

Differential Revision: D18837590

fbshipit-source-id: 925968f71c5fb96b8cd93f1eab4647602d2617d1
2020-01-21 11:29:23 -08:00
58234c0254 support torch script call over rpc (#32197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32197

This is to reland https://github.com/pytorch/pytorch/pull/30063, the main change is to match a general exception and grep "pickle" error word in "test_script_functions_not_supported" unit test, as Python 3.5 and Python 3.6 throw different types of errors with different error message for the rpc call in the unit test.
[test all]This diff makes following changes:
1. Providing a new set of python rpc privated APIs, they can accept an annotated TorchScript call and this call can be serialized, deserialized and executed in C++ without GIL. These privated APIs will be binded to JIT in the future, and they are different from public APIs as future JIT binded private APIs will be able to accept qualified_name, not callables. These private APIs are subject to be deprecated once JIT supports torch script function to be a JIT type.

Also, these APIs require torch script function to be defined and annotated by users in python land, it can not be script class/module constructor or class/module methods.

2. This diff also allows public rpc APIs to accept an annotated TorchScript call and execute code path that above private APIs ran on. Therefore if users invoke an annotated TorchScript call over RPC, this call can be serialized, deserialized and executed in C++ without GIL as well.

3. The above private APIs call a newly defined C++ function to make rpc torch script call to be serialized, deserialized and executed in C++ land. This C++ function returns an ivalue::Future. so that in follow up diff this C++ function can be called when these privated APIs are binded to JIT.

4. script_call.cpp/.h and request_callback_impl.cpp files are refactored accordingly so that torch script call and builtin call can share same message type and codes.

5. refactored deserializeResponse() and added a new utility to deserizalize response to IValue

ghstack-source-id: 96879167
ghstack-source-id: 96879167

Test Plan: unit test

Differential Revision: D19402374

fbshipit-source-id: 04efcc7c167d08a6503f29efe55e76f2be4b2c5e
2020-01-18 09:24:17 -08:00
51a34545e9 Revert D18482934: support torch script call over rpc
Test Plan: revert-hammer

Differential Revision:
D18482934

Original commit changeset: bd82a0d820c4

fbshipit-source-id: ca5e50fb0a883ee311aeb310198d84ad28062158
2020-01-14 13:30:56 -08:00
dbd737158b support torch script call over rpc (#30063)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30063

This diff makes following changes:
1. Providing a new set of python rpc privated APIs, they can accept an annotated TorchScript call and this call can be serialized, deserialized and executed in C++ without GIL. These privated APIs will be binded to JIT in the future, and they are different from public APIs as future JIT binded private APIs will be able to accept qualified_name, not callables. These private APIs are subject to be deprecated once JIT supports torch script function to be a JIT type.

Also, these APIs require torch script function to be defined and annotated by users in python land, it can not be script class/module constructor or class/module methods.

2. This diff also allows public rpc APIs to accept an annotated TorchScript call and execute code path that above private APIs ran on. Therefore if users invoke an annotated TorchScript call over RPC, this call can be serialized, deserialized and executed in C++ without GIL as well.

3. The above private APIs call a newly defined C++ function to make rpc torch script call to be serialized, deserialized and executed in C++ land. This C++ function returns an ivalue::Future. so that in follow up diff this C++ function can be called when these privated APIs are binded to JIT.

4. script_call.cpp/.h and request_callback_impl.cpp files are refactored accordingly so that torch script call and builtin call can share same message type and codes.

5. refactored deserializeResponse() and added a new utility to deserizalize response to IValue

ghstack-source-id: 96638829

Test Plan: unit test

Differential Revision: D18482934

fbshipit-source-id: bd82a0d820c47a8e45b2e7c616eca06573f7d7ea
2020-01-14 09:27:04 -08:00
1111a6b810 Use pybind11::gil_scoped_* functions instead of AutoGIL/AutoNoGIL (#30274)
Summary:
Reland of https://github.com/pytorch/pytorch/pull/29095
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30274

Differential Revision: D18762293

Pulled By: ezyang

fbshipit-source-id: d3d50c2dd12bcb678ab25fa708eb6587cc4b66f9
2019-12-02 12:19:58 -08:00
eff4c4d7c1 Revert D18301806: Use pybind11::gil_scoped_* functions instead of AutoGIL/AutoNoGIL
Test Plan: revert-hammer

Differential Revision:
D18301806

Original commit changeset: 03da6a26c41e

fbshipit-source-id: c1324ee8d154e7e16f5dd4f1cf3625aaa566cd39
2019-11-21 14:50:07 -08:00
f4b9690f2d Use pybind11::gil_scoped_* functions instead of AutoGIL/AutoNoGIL (#29095)
Summary:
Given that pybind11 implements these gil functions, I don't think it makes sense for Pytorch to have its own bespoke versions.

Fixes https://github.com/pytorch/pytorch/issues/29065
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29095

Differential Revision: D18301806

Pulled By: ezyang

fbshipit-source-id: 03da6a26c41ee65aaadf7b67b9f0b14d2def2a5a
2019-11-21 13:44:40 -08:00
b410d864c9 make python remote exception to rethrow when using remote reference to itself (#29930)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29930
Right now, python call remote exception rethrown is coupled with deserializtiaon.
For owner ref, the setValue() and getValue() do not use serialization and deserialization, so when users create a ref to itself, and call ownerRef.to_here(), python call remote exception will not be rethrown.

This diff is to move remote exception rethrown out of deserialization, and exception can be handled for ownerRef.localValue() or ownerRef.to_here()

close #29924
ghstack-source-id: 94210894

Test Plan: unit tests

Differential Revision: D18541916

fbshipit-source-id: 7cda93f623d52c740b3c1b1fa9a442f866984340
2019-11-19 21:33:21 -08:00
3214f134b6 fix python rpc handler exit crash (#27251)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27251

 Explicitly clean up py::objects to avoid segment faults when py::objects with CPython are cleaned up later at program exit.

See similar issues reported https://github.com/pybind/pybind11/issues/1598
and https://github.com/pybind/pybind11/issues/1493.

Our local tests also caught this segment faults if py::objects are cleaned
up at program exit. The explaination is: CPython cleans up most critical
utitlies before cleaning up PythonRpcHandler singleton, so when
PythonRpcHandler signleton cleans up py::objects and call dec_ref(), it
will crash.

The solution is to clean up py::objects earlier when Rpc agent join().
Be note that py::objects can not be cleaned up when Rpc agent is destroyed
as well, as Rpc agent is global variable and it will have same issue as
PythonRpcHandler.

close #27182
ghstack-source-id: 92035069

Test Plan: unit tests on python 3.6 and python 3.5

Differential Revision: D17727362

fbshipit-source-id: c254023f6a85acce35528ba756a4efabba9a519f
2019-10-16 16:57:38 -07:00
a6d26ce135 Move internal functions to torch.distributed.rpc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27289

Test Plan: Imported from OSS

Differential Revision: D17808214

Pulled By: pietern

fbshipit-source-id: 4c453028e431c3e951d439784017ef07037ba1a9
2019-10-08 11:31:20 -07:00
48a571b29c Rename variables and add comments (#27286)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27286

The name `runUDFFunction` stutters because the F in UDF also stands
for function. Renamed these variables to be identical to their Python
equivalents. Renamed those to share a prefix and drop `internal`,
because internal functions can use an underscore prefix.

Test Plan: Imported from OSS

Differential Revision: D17808208

Pulled By: pietern

fbshipit-source-id: 7619f07fc8215203dfb1da1eb281845edcd2bb99
2019-10-08 11:31:08 -07:00
c742918854 Fix pybind11 warnings in python_rpc_handler.cpp (#27284)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27284

The warnings related to usage of the deprecated != operator. Instead
of checking the member field on every function call, we can check it
once, on construction of PythonRpcHandler.

Test Plan: Imported from OSS

Differential Revision: D17808213

Pulled By: pietern

fbshipit-source-id: 022c8f77f266942c49c55b1729e62dbb06262d77
2019-10-08 11:30:59 -07:00
2486b0ba82 Add Python RRef as args and return value (#25499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25499

See #23110 for model parallel design details, and #26759 for the RRef
protocol. This commit add support for using RRef as Python UDF arguments
and return value. RRefs can now be shared from owner to user, from user to
owner, or from user to user.

Limitations:
1. No implicit type conversion yet. (#27099)
2. No failure handling and retry. (#26116)
3. UDF is not yet blocked until all RRefs are confirmed. (#27098)
4. Internal RRef control messages are not idempotent yet. (#26116)
5. Cannot delete RRefs correctly when there are circular dependencies. (#27096)

Main changes:

1. Added `SCRIPT_REMOTE_CALL` and `PYTHON_REMOTE_CALL` to `Message.h` to represent `dist.remote` invocations.
2. Added `SCRIPT_RREF_FETCH_CALL`, `PYTHON_RREF_FETCH_CALL`, `RREF_USER_ACCEPT`, `RREF_USER_DELETE`, `RREF_CHILD_ACCEPT`, and `RREF_FORK_REQUEST` to `Message.h` as internal RRef control messages.
3. New message request handling code is added to `functions.cpp`, and message format is added in `script_remote_call.h`, `python_remote_call.h`, and `rref_proto.h`.
4. Added a `PyRRef` type in `py_rref.h` and `py_rref.cpp` which holds a shared pointer to C++ `RRef` type. `PyRRef` wraps the C++ API and also implements RRef pickling and unpickling. RRef fork related control messages will be sent during RRef pickling/unpickling procedure.
5.  Update `RRef.h` and `RRef.cpp` accordingly to support `py::object` RRefs.
6. RRef context (reference count, etc.) are tracked in `rref_context.h` and `rref_context.cpp`.

Test Plan:
Imported from OSS

buck test mode/dev-nosan //caffe2/test:rpc_fork

Differential Revision: D17184146

Pulled By: mrshenli

fbshipit-source-id: a3a268efc087ac1ef489136ab957080382629265
2019-10-03 17:47:12 -07:00
fe4170bda8 Add send and recv backward functions for builtin operators RPC. (#25527)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25527

Master GH issue: https://github.com/pytorch/pytorch/issues/23110.

This change builds upon https://github.com/pytorch/pytorch/pull/24876 and
provides all the autograd hooks needed for a forward pass with distributed rpc
for builtin operators. This change does not address distributed rpc for python
UDFs and that will be addressed in follow up PRs.

Summary of changes:
1. Attach send autograd functions when a request is sent from the client and
response is sent from the server.
2. Attach receive autograd functions when a request is received on the server
and a response is received on the client.
3. Generate a globally unique autograd_message_id for each send/recv autograd
function pair to uniquely identify them.
ghstack-source-id: 91240466

Test Plan: unit tests.

Differential Revision: D17148077

fbshipit-source-id: 192d8a3f552ed7cc939f55dcca332965c9bd3233
2019-10-03 01:18:46 -07:00