pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-21 05:34:18 +08:00

Author	SHA1	Message	Date
Nikita Shulga	c4d1ff02f8	[Lint] Update clang-format to 19.1.4 (#153889 ) All changes other than the one to `tools/linter/adapters/s3_init_config.json` are generated by newer clang-format Pull Request resolved: https://github.com/pytorch/pytorch/pull/153889 Approved by: https://github.com/cyyever, https://github.com/atalman	2025-05-20 14:12:46 +00:00
cyy	70206499f1	[3/N] Fix extra warnings brought by clang-tidy-17 (#137552 ) Follows #137459 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137552 Approved by: https://github.com/ezyang	2024-10-15 02:33:44 +00:00
cyy	95dbbf713e	[Distributed] [9/N] Fix clang-tidy warnings in torch/csrc/distributed/rpc (#130109 ) Follows #125102 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130109 Approved by: https://github.com/ezyang	2024-07-16 04:23:42 +00:00
Nikita Shulga	d7caef7996	[CI] Update clang-format (#116002 ) To 17.0.6 build using https://github.com/pytorch/test-infra/blob/main/.github/workflows/clang-tidy-linux.yml Pull Request resolved: https://github.com/pytorch/pytorch/pull/116002 Approved by: https://github.com/suo	2023-12-18 14:58:46 +00:00
Aaron Gokaslan	387d769156	[BE]: Replace string compares with more efficient cpp comparisons (#92765 ) Replace cpp string comparisons with more efficient equality operators. These string comparisons are not just more readable, but they also allow for short-circuiting for faster string equality checks. Pull Request resolved: https://github.com/pytorch/pytorch/pull/92765 Approved by: https://github.com/ezyang	2023-01-22 21:40:19 +00:00
Zhengxu Chen	b55a2500d2	[jit] Remove graph() call from abstract Function interface. (#65967 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65967 Graph is an implementation detail. If user wants to get access to the underlying graph, they should be able to explicitly dynamic cast instead. ghstack-source-id: 141659819 Test Plan: no behavior change. Reviewed By: gmagogsfm Differential Revision: D31326153 fbshipit-source-id: a0e984f57c6013494b92a7095bf5bb660035eb84	2021-10-27 11:54:26 -07:00
Nikita Shulga	a9b0a921d5	Disable `avoid-non-const-global-variables` lint check (#62008 ) Summary: As GoogleTest `TEST` macro is non-compliant with it as well as `DEFINE_DISPATCH` All changes but the ones to `.clang-tidy` are generated using following script: ``` for i in `find . -type f -iname ".c" -or -iname "*.h"\|xargs grep cppcoreguidelines-avoid-non-const-global-variables\|cut -f1 -d:\|sort\|uniq`; do sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i; done ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008 Reviewed By: driazati, r-barnes Differential Revision: D29838584 Pulled By: malfet fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13	2021-07-22 18:04:40 -07:00
Mike Guo	6ecc1a4c4f	Make pytorch clang-tidy clean (#60649 ) Summary: This PR suppresses clang-tidy warnings in the codebase (for now) so that we can re-enable clang-tidy checks on master. I ran this script to add the `NOLINTNEXTLINE` comments (on a devserver): ```bash python3 setup.py develop # Uses same script that's run on CI and adds the -j (parallel), -s (add comments), -k (continue if diagnostic errors are found) options python3 tools/clang_tidy.py \ -j \ -s \ -k \ -v \ --paths torch/csrc/ \ -g"-torch/csrc/jit/passes/onnx/helper.cpp" \ -g"-torch/csrc/jit/passes/onnx/shape_type_inference.cpp" \ -g"-torch/csrc/jit/serialization/onnx.cpp" \ -g"-torch/csrc/jit/serialization/export.cpp" \ -g"-torch/csrc/jit/serialization/import.cpp" \ -g"-torch/csrc/jit/serialization/import_legacy.cpp" \ -g"-torch/csrc/onnx/init.cpp" \ -g"-torch/csrc/cuda/nccl." \ -g"-torch/csrc/cuda/python_nccl.cpp" \ -g"-torch/csrc/autograd/FunctionsManual.cpp" \ -g"-torch/csrc/generic/.cpp" \ -g"-torch/csrc/jit/codegen/cuda/runtime/*" \ -g"-torch/csrc/deploy/interpreter/interpreter.cpp" \ -g"-torch/csrc/deploy/interpreter/interpreter.h" \ -g"-torch/csrc/deploy/interpreter/interpreter_impl.h" \ -g"-torch/csrc/deploy/interpreter/test_main.cpp" ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/60649 Test Plan: Verified changes by re-running the script (without the `-s` option) and seeing no warnings/errors. Reviewed By: walterddr, janeyx99 Differential Revision: D29504258 Pulled By: 1ntEgr8 fbshipit-source-id: 78310b30ee8213b73ddb4771ad874665323e7a4e	2021-07-01 12:21:07 -07:00
Shen Li	924717bf51	Add _get_type() API to RRef (#44663 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44663 The new API returns the type of the data object referenced by this `RRef`. On the owner, this is same as `type(rref.local_value())`. On a user, this will trigger an RPC to fetch the `type` object from the owner. After this function is run once, the `type` object is cached by the `RRef`, and subsequent invocations no longer trigger RPC. closes #33210 Test Plan: Imported from OSS Reviewed By: rohan-varma Differential Revision: D23691990 Pulled By: mrshenli fbshipit-source-id: a2d87cd601a691dd75164b6bcd7315245e9cf6bd	2020-09-16 11:59:22 -07:00
Pritam Damania	89b0b3bc8c	Allow RPC to be initialized again after shutdown. (#42723 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42723 This PR is addressing https://github.com/pytorch/pytorch/issues/39340 and allows users to initialize RPC again after shutdown. Major changes in the PR include: 1. Change to DistAutogradContainer to support this. 2. Ensure PythonRpcHandler is reinitialized appropriately. 3. Use PrefixStore in RPC initialization to ensure each new `init_rpc` uses a different prefix. ghstack-source-id: 109805368 Test Plan: waitforbuildbot Reviewed By: rohan-varma Differential Revision: D22993909 fbshipit-source-id: 9f1c1e0a58b58b97125f41090601e967f96f70c6	2020-08-13 20:18:34 -07:00
Rohan Varma	7e82382ad5	Allow profiler to be enabled remotely with RPC (#38748 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/38748 This diff contains the message scaffolding and profiler changes in order to be able to remotely run the profiler across different nodes and aggregate the results on a single node. As discussed, we have implemented this by creating new message types, that similar to autograd messages, wrap the profiling information with the original message, and send this new message over the wire. On the receiving end, this wrapped message is detected, we fetch the original message from it, and process the original message with the profiler enabled. When sending a response with profiling information, we serialize the profiled `Events` and send them back over RPC. When such a message is received, the events profiled on the remote node are stored (added back to the local profiler). Changes in this PR: - New message types (run_with_profiling_req, run_with_profiling_resp) to send profiling info over the wire. Message parsing logic is added to handle these wrapped types. - Handling of sending profiler data over the wire, in particular, the attributes of the `ProfilerConfig` and the serialized profiled `Event`s - The logic for wrapping RPC messages is deduped with that in `rpc_with_autograd`, and the common payload wrapping/unwrapping logic is moved to helper functions in `rpc/utils.cpp` - Changes in `autograd/utils.cpp` to detect if we have enabled the profiler and are sending an RPC, if so, uses the above new message types - Changes in request_callback to parse and turn on the profiler in a thread-local fashion - Serialization and deserialization of profiling `Events`, and support to add the remote events to the thread-local profiler - Introduction of the concept of `node_id`, which as discussed with ilia-cher , will be used along with the `Event`s handle attribute to distinguish between events. When there are events from different nodes, this node information is rendered in the profile output (e.g. when printing tables), otherwise, it is not, since it is irrelevant. - Some changes to profiler.cpp to add useful helper methods/guards - toHere() is now profiled for RRefs - Unittests ghstack-source-id: 106134626 Test Plan: Added unittests, existing profiler unittests. Differential Revision: D19510010 fbshipit-source-id: 044347af992f19a9e3b357c9567f6fc73e988157	2020-06-18 17:01:57 -07:00
Shen Li	fa4ed17183	Explicitly decref in UnpickledPythonCall dtor (#38398 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/38398 Test Plan: Imported from OSS Differential Revision: D21550712 Pulled By: mrshenli fbshipit-source-id: aac4708a5b6f6dc38149f995d11e27c190648859	2020-06-04 22:36:35 -07:00
Shen Li	a05ef17e46	Add rpc.functions.async_execution decorator for rpc_sync/rpc_async (#39216 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39216 The `rpc.functions.async_execution` decorator specifies that the wrapped function is guaranteed to return a `torch.futures.Future`. The decorator adds a `_wrapped_async_rpc_function` attribute to the wrapper function. The caller retrieves this information and then sets `isAsyncFunction` argument accordingly which is later added to PythonCall RPC message as a field. On the callee side, if the PythonCall carries an asynchronous function, it will cast the function's return value to a jit::PythonFutureWrapper object, and then install response creation and communication as a callback on the that jit::PythonFutureWrapper. For applications, this feature is useful when a function needs to wait for IO or additional singaling. In those cases, marking the user function as `rpc.functions.async_execution` will prevent it from blocking one thread on callee for too long. Test Plan: Imported from OSS Reviewed By: rohan-varma Differential Revision: D21779962 fbshipit-source-id: 6b6aa698bf6f91dad6ed2a7ee433df429b59e941	2020-06-02 23:21:25 -07:00
Shen Li	797c608f50	Explicitly decref py::object in PythonRpcHandler (#38366 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/38366 Test Plan: Imported from OSS Differential Revision: D21537612 Pulled By: mrshenli fbshipit-source-id: 089bcc3d7de3bce6e769f72d67e0e0f91e0219c6	2020-05-12 20:55:59 -07:00
Shen Li	5c2b273089	Add RRef Python Helper to launch function on the referenced object (#36619 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36619 With this PR, applications no longer need to create dedicated helpers to run functions on the object referenced by an RRef. Instead, `rref.rpc_sync().some_func()` will use `rpc_sync` to run `some_func` on the owner of the RRef using the object referenced by the RRef. Similar helpers for `rref.rpc_async().some_func()` and `rref.remote().some_func()` are also added. An alternative design is to expose PyRRef as RRefBase and then implement everything in a new Python RRef class. However, the RRef class cannot directly inherit from PyRRef/RRefBase, otherwise we will need to let pyRemote* C++ functions to load RRef from Python and return an RRef instance. It is possible to let RRef hold a instance of PyRRef instead of inherit from it, but this does not look like a elegant design, as we will have RRef holding PyRRef and PyRRef holding the C++ RRef. Another alternative is to use dynamic method loading, by installing member methods to PyRRef instances. However, this would require different solutions to handle RRef(data) and rpc.remote(...). Base on the above thinking, we decided to go with the current implementation for simplicity and we can also keep all RRef-related APIs in one place. Test Plan: Imported from OSS Differential Revision: D21028333 Pulled By: mrshenli fbshipit-source-id: fe90f56ef7183d18874e357900093755e1601eb4	2020-04-21 19:29:54 -07:00
Rohan Varma	75e4c53b35	[rpc] Add a debug only check to debug python cleanup races (#35395 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35395 as title ghstack-source-id: 101035263 Test Plan: CI Differential Revision: D20632634 fbshipit-source-id: 737e353982b325e73da3825b130aae6b11dbcfe7	2020-03-27 18:53:35 -07:00
Shihao Xu	ac639d927a	Reland "[RPC] Use qualified name str directly in RPC torch script code path" (#35489 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35489 Relanding https://github.com/pytorch/pytorch/pull/34733. Fix is in https://github.com/pytorch/pytorch/pull/34988 Test Plan: ``` buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork ``` ``` buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork && \ buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par \ -r test_return_local_script_class_rref_in_py_and_use_in_script buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork && \ buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par \ -r test_return_local_script_module_rref_in_py_and_use_in_script ``` Differential Revision: D20661748 fbshipit-source-id: d550daab8d689d0a9aa2450f3bdb7417ab79dae2	2020-03-26 23:41:51 -07:00
Mike Ruberry	5d92a6cc30	Revert D7778113: Reland "[RPC] Use qualified name str directly in RPC torch script code path" Test Plan: revert-hammer Differential Revision: D7778113 Original commit changeset: b830c03ac946 fbshipit-source-id: ef08b287a6db58320c738cde0c99b3333f5724eb	2020-03-19 06:05:23 -07:00
Shihao Xu	d616cad676	Reland "[RPC] Use qualified name str directly in RPC torch script code path" (#34962 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34962 Relanding #34733. Fix is in https://github.com/pytorch/pytorch/pull/34988. Test Plan: ``` buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork ``` ``` buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork \ && buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par \ -r test_return_local_script_class_rref_in_py_and_use_in_script buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork \ && buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par \ -r test_return_local_script_module_rref_in_py_and_use_in_script ``` ``` buck test mode/dev //caffe2/test/distributed/rpc/jit:rpc_fork_thrift -- test_return_local_script_module_rref_in_py_and_use_in_script ``` Differential Revision: D7778113 fbshipit-source-id: b830c03ac9463075fca248eba75be364b0e8b080	2020-03-18 22:25:09 -07:00
Shihao Xu	d29f450e63	Revert D20442573: [RPC] Use qualified name str directly in RPC torch script code path Test Plan: revert-hammer Differential Revision: D20442573 Original commit changeset: 87f8b7d94adc fbshipit-source-id: db0f10c28352d2b3ca21b5357e8e09c01a50018c	2020-03-18 11:00:09 -07:00
Shihao Xu	ecd7c0f84c	[RPC] Use qualified name str directly in RPC torch script code path (#34733 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34733 simplify ghstack-source-id: 100292435 Test Plan: ``` buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork ``` ``` buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork \ && buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par \ -r test_return_local_script_class_rref_in_py_and_use_in_script buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork \ && buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par \ -r test_return_local_script_module_rref_in_py_and_use_in_script ``` Differential Revision: D20442573 fbshipit-source-id: 87f8b7d94adc03544f8e2955d01cd4702bb31a34	2020-03-17 10:28:52 -07:00
Shen Li	38b2856c71	Split deserialize from runPythonUdf and remove generatePythonUDFResult (#34496 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34496 Differential Revision: D20347469 Test Plan: Imported from OSS Pulled By: mrshenli fbshipit-source-id: b832a3a9e2ef61f149175f737b26f65d63bf797b	2020-03-16 18:28:07 -07:00
Michael Suo	c235be42dd	[jit] kill script namespace (#34515 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34515 Once upon a time we thought this was necessary. In reality it is not, so removing it. For backcompat, our public interface (defined in `api/`) still has typedefs to the old `script::` names. There was only one collision: `Pass` as a `Stmt` and `Pass` as a graph transform. I renamed one of them. Test Plan: Imported from OSS Differential Revision: D20353503 Pulled By: suo fbshipit-source-id: 48bb911ce75120a8c9e0c6fb65262ef775dfba93	2020-03-11 23:32:48 -07:00
Shihao Xu	8d84c5f1c7	Fix static data initialization deadlock on GIL (#34505 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34505 A thread could hold GIL when calling PythonRpcHandler::getInstance(), meantime another thread could have been doing static data initialization by calling `new PythonRpcHandler()`, inside of which GIL is also required. Static data initialization is thread-safe, so the thread holding the GIL will wait for the other thread to finish static data initializating before going forward. Because the initialization can't proceed without GIL, there is a deadlock. We ask the calling thread to release GIL to avoid this situation. ghstack-source-id: 99893858 Test Plan: ``` buck test mode/dev-nosan //caffe2/test/distributed/rpc:dist_autograd_spawn -- 'test_backward_simple_script_call $test_dist_autograd_spawn\.DistAutogradTestWithSpawn$' --stress-runs 100 ``` Differential Revision: D7490489 fbshipit-source-id: 76f63cc7bedf088d3dbff288f53aa0bd33749255	2020-03-10 20:40:22 -07:00
Shen Li	b9c32209db	Use SerializedPyObj in PythonRpcHandler::generatePythonUDFResult (#34495 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34495 Differential Revision: D20347466 Test Plan: Imported from OSS Pulled By: mrshenli fbshipit-source-id: 79625adb4ac3c9c6da4f40016e973bf17466c693	2020-03-09 20:41:05 -07:00
Shen Li	b82658810e	Split deserialize from _run_function in RPC internal.py (#34494 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34494 Differential Revision: D20347463 Test Plan: Imported from OSS Pulled By: mrshenli fbshipit-source-id: e6fd886622f26c46bb83ac118e67abb2f5b296b9	2020-03-09 20:41:00 -07:00
Shen Li	544fb64440	Use SerializedPyObj in PythonRpcHandler (#34493 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34493 Differential Revision: D20347462 Test Plan: Imported from OSS Pulled By: mrshenli fbshipit-source-id: 9edda9eb95b1994464459271bb53ee77b760e474	2020-03-09 20:40:55 -07:00
Shen Li	18ef09f5ac	Remove _load_return_value from RPC internal.py (#34492 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34492 Differential Revision: D20347468 Test Plan: Imported from OSS Pulled By: mrshenli fbshipit-source-id: 92388d0d50a08fb895bacacf94c7b5495b4ae2b6	2020-03-09 20:40:50 -07:00
Shen Li	6d1c4df660	Consolidate Python Messages to use SerializedPyObj (#34491 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34491 Differential Revision: D20347467 Test Plan: Imported from OSS Pulled By: mrshenli fbshipit-source-id: efae4111d961f3a528cede77c863fb049cda9029	2020-03-09 20:40:45 -07:00
James Reed	60e8615a6d	[JIT] Virtualize Function (#33921 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33921 NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.intern.facebook.com/intern/diff/D20153092/)! Test Plan: Imported from OSS Differential Revision: D20177227 Pulled By: jamesr66a fbshipit-source-id: 87f3e484c4f873d60f76f50f6789c1b4a73bdfde	2020-03-07 10:03:50 -08:00
Michael Suo	dbe850af5b	[jit] do the code reorg (#33851 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33851 Rationale and context described in #33828. Script to reproduce the move: https://gist.github.com/suo/16cbefaaeb67ca5a7c6caffd49b7f6e9 ghstack-source-id: 99079645 Test Plan: Make sure CI passes Reviewed By: jamesr66a Differential Revision: D20133869 fbshipit-source-id: 390e9241a9c85366d9005c492ac31f10aa96488e	2020-02-27 13:02:51 -08:00
Jeremy Lilley	9266bde970	[pytorch] Minor: add GIL assert to PythonRpcHandler::handleExceptionGILHeld (#33557 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33557 We should add GIL asserts in some places to keep assumptions documented. This just adds one in an exception codepath as a placeholder for more. This change also moves a #define from a .h to the .cpp to reduce scope. ghstack-source-id: 98673532 Test Plan: buck test mode/dev-nosan caffe2/test/... Differential Revision: D20005387 fbshipit-source-id: b7eff54a6f1dd69d199f8ca05cdb3001c50b37c4	2020-02-20 18:15:44 -08:00
Pritam Damania	a751ddaaa5	Use leaky singletons for torch.distributed. (#32923 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32923 As per https://isocpp.org/wiki/faq/ctors#construct-on-first-use-v2 and https://isocpp.org/wiki/faq/ctors#static-init-order-on-first-use-members, we should be using leaky singletons to avoid static initialization order problem. Closes https://github.com/pytorch/pytorch/issues/27412 ghstack-source-id: 97601384 Test Plan: waitforbuildbot Differential Revision: D19688986 fbshipit-source-id: 8c1935fb7da8a7116dbca55eb43dc04bc02695ac	2020-02-03 14:15:18 -08:00
Jeremy Lilley	821b6aa769	[pytorch] Minor: avoid acquiring GIL twice in PyRRef::localValue() (#32785 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32785 Add PythonRpcHandler::handleExceptionWithGIL() so that in PyRRef::localValue(), we don't need to release the GIL and re-acquire the following line. ghstack-source-id: 97418465 Test Plan: existing test coverage Differential Revision: D19626195 fbshipit-source-id: db694d04b078811f819626789e1e86f1b35adb5b	2020-01-29 21:27:43 -08:00
Yanli Zhao	b5d8982ae2	clean up GIL usuage (#32748 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32748 This is to follow up PR #30630, we need to have GIL when calling jit::toPyObject(), for some binded functions need to be taged with GIL release if underneath C++ codes requires GIL. so 1. pyRef::to_here() and pyRef::local_value() added GIL 2. pyRef::pickle and pyRef::unpickle() added GIL release tag 3. in request_callback_impl, also added GIL as needed 4. for typeParser, use cached jitCompilationUnit_, also clean it up in cleanUp() function ghstack-source-id: 97373011 Test Plan: unit test Differential Revision: D19612337 fbshipit-source-id: 4d09f9b52ba626545ae7d31fea6b671301ed3890	2020-01-29 11:58:46 -08:00
Yanli Zhao	b474c351dd	[rpc] Remove template on RRef and add Type to RRef creation (#30630 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30630 This remove template and all the specializations it have in rpc, we universally use IValue as the inner value since we support making python object to be hold inside IValue. This will also ensure that we have the correct type information when creating the RRef, we use the return type from the schema when creating userRRef and OwnerRRef, it will enable IValue to always have the correct type if the IValue is the RRef object (next PR) Test Plan: Imported from OSS Differential Revision: D19502235 fbshipit-source-id: 0d5decae8a9767e0893f3b8b6456b231653be3c5	2020-01-23 21:15:46 -08:00
Rohan Varma	9ce25cce91	add an option to record time spent waiting for GIL (#30842 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30842 We'd like to profile the time spent on GIL acqusiition to debug performance issues. Test Plan: Unit tests pass. Differential Revision: D18837590 fbshipit-source-id: 925968f71c5fb96b8cd93f1eab4647602d2617d1	2020-01-21 11:29:23 -08:00
Yanli Zhao	58234c0254	support torch script call over rpc (#32197 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32197 This is to reland https://github.com/pytorch/pytorch/pull/30063, the main change is to match a general exception and grep "pickle" error word in "test_script_functions_not_supported" unit test, as Python 3.5 and Python 3.6 throw different types of errors with different error message for the rpc call in the unit test. [test all]This diff makes following changes: 1. Providing a new set of python rpc privated APIs, they can accept an annotated TorchScript call and this call can be serialized, deserialized and executed in C++ without GIL. These privated APIs will be binded to JIT in the future, and they are different from public APIs as future JIT binded private APIs will be able to accept qualified_name, not callables. These private APIs are subject to be deprecated once JIT supports torch script function to be a JIT type. Also, these APIs require torch script function to be defined and annotated by users in python land, it can not be script class/module constructor or class/module methods. 2. This diff also allows public rpc APIs to accept an annotated TorchScript call and execute code path that above private APIs ran on. Therefore if users invoke an annotated TorchScript call over RPC, this call can be serialized, deserialized and executed in C++ without GIL as well. 3. The above private APIs call a newly defined C++ function to make rpc torch script call to be serialized, deserialized and executed in C++ land. This C++ function returns an ivalue::Future. so that in follow up diff this C++ function can be called when these privated APIs are binded to JIT. 4. script_call.cpp/.h and request_callback_impl.cpp files are refactored accordingly so that torch script call and builtin call can share same message type and codes. 5. refactored deserializeResponse() and added a new utility to deserizalize response to IValue ghstack-source-id: 96879167 ghstack-source-id: 96879167 Test Plan: unit test Differential Revision: D19402374 fbshipit-source-id: 04efcc7c167d08a6503f29efe55e76f2be4b2c5e	2020-01-18 09:24:17 -08:00
Michael Suo	51a34545e9	Revert D18482934: support torch script call over rpc Test Plan: revert-hammer Differential Revision: D18482934 Original commit changeset: bd82a0d820c4 fbshipit-source-id: ca5e50fb0a883ee311aeb310198d84ad28062158	2020-01-14 13:30:56 -08:00
Yanli Zhao	dbd737158b	support torch script call over rpc (#30063 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30063 This diff makes following changes: 1. Providing a new set of python rpc privated APIs, they can accept an annotated TorchScript call and this call can be serialized, deserialized and executed in C++ without GIL. These privated APIs will be binded to JIT in the future, and they are different from public APIs as future JIT binded private APIs will be able to accept qualified_name, not callables. These private APIs are subject to be deprecated once JIT supports torch script function to be a JIT type. Also, these APIs require torch script function to be defined and annotated by users in python land, it can not be script class/module constructor or class/module methods. 2. This diff also allows public rpc APIs to accept an annotated TorchScript call and execute code path that above private APIs ran on. Therefore if users invoke an annotated TorchScript call over RPC, this call can be serialized, deserialized and executed in C++ without GIL as well. 3. The above private APIs call a newly defined C++ function to make rpc torch script call to be serialized, deserialized and executed in C++ land. This C++ function returns an ivalue::Future. so that in follow up diff this C++ function can be called when these privated APIs are binded to JIT. 4. script_call.cpp/.h and request_callback_impl.cpp files are refactored accordingly so that torch script call and builtin call can share same message type and codes. 5. refactored deserializeResponse() and added a new utility to deserizalize response to IValue ghstack-source-id: 96638829 Test Plan: unit test Differential Revision: D18482934 fbshipit-source-id: bd82a0d820c47a8e45b2e7c616eca06573f7d7ea	2020-01-14 09:27:04 -08:00
Edward Yang	1111a6b810	Use pybind11::gil_scoped_* functions instead of AutoGIL/AutoNoGIL (#30274 ) Summary: Reland of https://github.com/pytorch/pytorch/pull/29095 Pull Request resolved: https://github.com/pytorch/pytorch/pull/30274 Differential Revision: D18762293 Pulled By: ezyang fbshipit-source-id: d3d50c2dd12bcb678ab25fa708eb6587cc4b66f9	2019-12-02 12:19:58 -08:00
Mike Ruberry	eff4c4d7c1	Revert D18301806: Use pybind11::gil_scoped_* functions instead of AutoGIL/AutoNoGIL Test Plan: revert-hammer Differential Revision: D18301806 Original commit changeset: 03da6a26c41e fbshipit-source-id: c1324ee8d154e7e16f5dd4f1cf3625aaa566cd39	2019-11-21 14:50:07 -08:00
Alan Du	f4b9690f2d	Use pybind11::gil_scoped_* functions instead of AutoGIL/AutoNoGIL (#29095 ) Summary: Given that pybind11 implements these gil functions, I don't think it makes sense for Pytorch to have its own bespoke versions. Fixes https://github.com/pytorch/pytorch/issues/29065 Pull Request resolved: https://github.com/pytorch/pytorch/pull/29095 Differential Revision: D18301806 Pulled By: ezyang fbshipit-source-id: 03da6a26c41ee65aaadf7b67b9f0b14d2def2a5a	2019-11-21 13:44:40 -08:00
Yanli Zhao	b410d864c9	make python remote exception to rethrow when using remote reference to itself (#29930 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29930 Right now, python call remote exception rethrown is coupled with deserializtiaon. For owner ref, the setValue() and getValue() do not use serialization and deserialization, so when users create a ref to itself, and call ownerRef.to_here(), python call remote exception will not be rethrown. This diff is to move remote exception rethrown out of deserialization, and exception can be handled for ownerRef.localValue() or ownerRef.to_here() close #29924 ghstack-source-id: 94210894 Test Plan: unit tests Differential Revision: D18541916 fbshipit-source-id: 7cda93f623d52c740b3c1b1fa9a442f866984340	2019-11-19 21:33:21 -08:00
Yanli Zhao	3214f134b6	fix python rpc handler exit crash (#27251 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27251 Explicitly clean up py::objects to avoid segment faults when py::objects with CPython are cleaned up later at program exit. See similar issues reported https://github.com/pybind/pybind11/issues/1598 and https://github.com/pybind/pybind11/issues/1493. Our local tests also caught this segment faults if py::objects are cleaned up at program exit. The explaination is: CPython cleans up most critical utitlies before cleaning up PythonRpcHandler singleton, so when PythonRpcHandler signleton cleans up py::objects and call dec_ref(), it will crash. The solution is to clean up py::objects earlier when Rpc agent join(). Be note that py::objects can not be cleaned up when Rpc agent is destroyed as well, as Rpc agent is global variable and it will have same issue as PythonRpcHandler. close #27182 ghstack-source-id: 92035069 Test Plan: unit tests on python 3.6 and python 3.5 Differential Revision: D17727362 fbshipit-source-id: c254023f6a85acce35528ba756a4efabba9a519f	2019-10-16 16:57:38 -07:00
Pieter Noordhuis	a6d26ce135	Move internal functions to torch.distributed.rpc Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27289 Test Plan: Imported from OSS Differential Revision: D17808214 Pulled By: pietern fbshipit-source-id: 4c453028e431c3e951d439784017ef07037ba1a9	2019-10-08 11:31:20 -07:00
Pieter Noordhuis	48a571b29c	Rename variables and add comments (#27286 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27286 The name `runUDFFunction` stutters because the F in UDF also stands for function. Renamed these variables to be identical to their Python equivalents. Renamed those to share a prefix and drop `internal`, because internal functions can use an underscore prefix. Test Plan: Imported from OSS Differential Revision: D17808208 Pulled By: pietern fbshipit-source-id: 7619f07fc8215203dfb1da1eb281845edcd2bb99	2019-10-08 11:31:08 -07:00
Pieter Noordhuis	c742918854	Fix pybind11 warnings in python_rpc_handler.cpp (#27284 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27284 The warnings related to usage of the deprecated != operator. Instead of checking the member field on every function call, we can check it once, on construction of PythonRpcHandler. Test Plan: Imported from OSS Differential Revision: D17808213 Pulled By: pietern fbshipit-source-id: 022c8f77f266942c49c55b1729e62dbb06262d77	2019-10-08 11:30:59 -07:00
Shen Li	2486b0ba82	Add Python RRef as args and return value (#25499 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25499 See #23110 for model parallel design details, and #26759 for the RRef protocol. This commit add support for using RRef as Python UDF arguments and return value. RRefs can now be shared from owner to user, from user to owner, or from user to user. Limitations: 1. No implicit type conversion yet. (#27099) 2. No failure handling and retry. (#26116) 3. UDF is not yet blocked until all RRefs are confirmed. (#27098) 4. Internal RRef control messages are not idempotent yet. (#26116) 5. Cannot delete RRefs correctly when there are circular dependencies. (#27096) Main changes: 1. Added `SCRIPT_REMOTE_CALL` and `PYTHON_REMOTE_CALL` to `Message.h` to represent `dist.remote` invocations. 2. Added `SCRIPT_RREF_FETCH_CALL`, `PYTHON_RREF_FETCH_CALL`, `RREF_USER_ACCEPT`, `RREF_USER_DELETE`, `RREF_CHILD_ACCEPT`, and `RREF_FORK_REQUEST` to `Message.h` as internal RRef control messages. 3. New message request handling code is added to `functions.cpp`, and message format is added in `script_remote_call.h`, `python_remote_call.h`, and `rref_proto.h`. 4. Added a `PyRRef` type in `py_rref.h` and `py_rref.cpp` which holds a shared pointer to C++ `RRef` type. `PyRRef` wraps the C++ API and also implements RRef pickling and unpickling. RRef fork related control messages will be sent during RRef pickling/unpickling procedure. 5. Update `RRef.h` and `RRef.cpp` accordingly to support `py::object` RRefs. 6. RRef context (reference count, etc.) are tracked in `rref_context.h` and `rref_context.cpp`. Test Plan: Imported from OSS buck test mode/dev-nosan //caffe2/test:rpc_fork Differential Revision: D17184146 Pulled By: mrshenli fbshipit-source-id: a3a268efc087ac1ef489136ab957080382629265	2019-10-03 17:47:12 -07:00
Pritam Damania	fe4170bda8	Add send and recv backward functions for builtin operators RPC. (#25527 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25527 Master GH issue: https://github.com/pytorch/pytorch/issues/23110. This change builds upon https://github.com/pytorch/pytorch/pull/24876 and provides all the autograd hooks needed for a forward pass with distributed rpc for builtin operators. This change does not address distributed rpc for python UDFs and that will be addressed in follow up PRs. Summary of changes: 1. Attach send autograd functions when a request is sent from the client and response is sent from the server. 2. Attach receive autograd functions when a request is received on the server and a response is received on the client. 3. Generate a globally unique autograd_message_id for each send/recv autograd function pair to uniquely identify them. ghstack-source-id: 91240466 Test Plan: unit tests. Differential Revision: D17148077 fbshipit-source-id: 192d8a3f552ed7cc939f55dcca332965c9bd3233	2019-10-03 01:18:46 -07:00
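
Commit a05ef17e46 above describes `rpc.functions.async_execution` as a decorator that tags its wrapper with a `_wrapped_async_rpc_function` attribute, which the caller then inspects to decide how to mark the call. The tagging idea can be sketched in plain Python as follows. This is only an illustration, not PyTorch's implementation: `async_execution_sketch`, `is_async_function`, `slow_add`, and `plain_add` are hypothetical names, and `concurrent.futures.Future` stands in for `torch.futures.Future` so the sketch needs no PyTorch install.

```python
import functools
from concurrent.futures import Future


def async_execution_sketch(fn):
    """Hypothetical stand-in for the decorator described in a05ef17e46."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        # The wrapper just forwards the call; the function itself is
        # expected to return a future.
        return fn(*args, **kwargs)

    # Attribute name taken from the commit message; it marks the wrapper
    # so a caller can tell the function promises to return a future.
    wrapper._wrapped_async_rpc_function = fn
    return wrapper


def is_async_function(fn):
    # Hypothetical caller-side check: look for the marker attribute.
    return hasattr(fn, "_wrapped_async_rpc_function")


@async_execution_sketch
def slow_add(x, y):
    # Assumption: a real function would complete this future later,
    # e.g. after waiting on IO; here it is completed immediately.
    fut = Future()
    fut.set_result(x + y)
    return fut


def plain_add(x, y):
    return x + y
```

With these definitions, `is_async_function(slow_add)` is true and `is_async_function(plain_add)` is false, which is the kind of signal the commit says is used to set the `isAsyncFunction` argument.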

55 Commits