5 Commits

Author SHA1 Message Date
dbe850af5b [jit] do the code reorg (#33851)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33851

Rationale and context described in #33828.

Script to reproduce the move:
https://gist.github.com/suo/16cbefaaeb67ca5a7c6caffd49b7f6e9
ghstack-source-id: 99079645

Test Plan: Make sure CI passes

Reviewed By: jamesr66a

Differential Revision: D20133869

fbshipit-source-id: 390e9241a9c85366d9005c492ac31f10aa96488e
2020-02-27 13:02:51 -08:00
043530a9b9 Support remote for Python UDF in distributed autograd
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28656

Test Plan: Imported from OSS

Differential Revision: D18138561

Pulled By: mrshenli

fbshipit-source-id: 798e7c00465b5a299f7b4642683bc407895bc7da
2019-10-29 19:39:04 -07:00
400293fcc6 Support remote for builtin operators in distributed autograd (#28630)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28630

This includes:
1. Respect autograd context in rpc.remote for builtin ops
2. Force setting autograd context in RRef.to_here() even if the
message for to_here() does not contain any tensor.

Test Plan: Imported from OSS

Differential Revision: D18138562

Pulled By: mrshenli

fbshipit-source-id: a39ec83e556d19130f22eb317927241a017000ba
2019-10-29 19:39:00 -07:00
58873776ff Make RRef::toHere() return a jit::Future (#27943)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27943

This is step 1 of making PyRRef::toHere() non-blocking on the caller.
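
As a rough illustration of the blocking versus future-returning distinction, here is a minimal sketch using only the Python standard library. `ToyRRef`, `to_here_future`, and `slow_fetch` are hypothetical names invented for this example. It is not the PyTorch API or the `jit::Future` implementation, only a model of the idea that the caller gets a handle back immediately and waits later.

```python
from concurrent.futures import Future, ThreadPoolExecutor
import time


class ToyRRef:
    """Hypothetical stand-in for a remote reference (not PyTorch's RRef)."""

    def __init__(self, fetch, executor):
        self._fetch = fetch          # callable that produces the remote value
        self._executor = executor    # runs the fetch off the caller's thread

    def to_here_future(self) -> Future:
        # Returns immediately; the fetch runs on a worker thread.
        return self._executor.submit(self._fetch)

    def to_here(self):
        # Blocking variant, built on top of the future-returning one.
        return self.to_here_future().result()


def slow_fetch():
    # Simulates network latency for fetching the value from the owner.
    time.sleep(0.05)
    return 42


def demo():
    with ThreadPoolExecutor(max_workers=1) as pool:
        rref = ToyRRef(slow_fetch, pool)
        fut = rref.to_here_future()    # does not block
        other_work = sum(range(10))    # caller keeps working meanwhile
        value = fut.result()           # block only when the value is needed
    return value, other_work
```

Calling `demo()` shows the caller doing unrelated work between requesting the value and waiting on it.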

Test Plan: Imported from OSS

Differential Revision: D17936747

Pulled By: mrshenli

fbshipit-source-id: 7cf60e5804e72bdc28f0135fed4d7fdce05ea38a
2019-10-23 17:07:11 -07:00
2486b0ba82 Add Python RRef as args and return value (#25499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25499

See #23110 for model parallel design details, and #26759 for the RRef
protocol. This commit adds support for using RRef as Python UDF arguments
and return value. RRefs can now be shared from owner to user, from user to
owner, or from user to user.

Limitations:
1. No implicit type conversion yet. (#27099)
2. No failure handling and retry. (#26116)
3. UDF is not yet blocked until all RRefs are confirmed. (#27098)
4. Internal RRef control messages are not idempotent yet. (#26116)
5. Cannot delete RRefs correctly when there are circular dependencies. (#27096)

Main changes:

1. Added `SCRIPT_REMOTE_CALL` and `PYTHON_REMOTE_CALL` to `Message.h` to represent `dist.remote` invocations.
2. Added `SCRIPT_RREF_FETCH_CALL`, `PYTHON_RREF_FETCH_CALL`, `RREF_USER_ACCEPT`, `RREF_USER_DELETE`, `RREF_CHILD_ACCEPT`, and `RREF_FORK_REQUEST` to `Message.h` as internal RRef control messages.
3. New message request handling code is added to `functions.cpp`, and message format is added in `script_remote_call.h`, `python_remote_call.h`, and `rref_proto.h`.
4. Added a `PyRRef` type in `py_rref.h` and `py_rref.cpp` which holds a shared pointer to the C++ `RRef` type. `PyRRef` wraps the C++ API and also implements RRef pickling and unpickling. RRef fork-related control messages are sent during RRef pickling and unpickling.
5. Updated `RRef.h` and `RRef.cpp` accordingly to support `py::object` RRefs.
6. RRef context (reference count, etc.) is tracked in `rref_context.h` and `rref_context.cpp`.
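
To make the reference-tracking idea in item 6 concrete, here is a toy sketch in plain Python. It assumes a simplified model in which the owner records one entry per user-held fork and frees the value once the last fork is deleted. `ToyOwnerContext` and its methods are hypothetical names, and this is not the actual protocol from #26759 or the code in `rref_context.cpp`.

```python
class ToyOwnerContext:
    """Hypothetical owner-side bookkeeping (not PyTorch's rref_context)."""

    def __init__(self):
        self._values = {}  # rref_id -> value held by the owner
        self._forks = {}   # rref_id -> set of live fork ids held by users

    def create(self, rref_id, value):
        self._values[rref_id] = value
        self._forks[rref_id] = set()

    def add_fork(self, rref_id, fork_id):
        # Assumed to happen when a user is confirmed as holding a fork.
        self._forks[rref_id].add(fork_id)

    def delete_fork(self, rref_id, fork_id):
        # Assumed to happen when a user drops its fork.
        forks = self._forks[rref_id]
        forks.discard(fork_id)
        if not forks:
            # No user references remain, so the owner can free the value.
            del self._values[rref_id]
            del self._forks[rref_id]

    def is_alive(self, rref_id):
        return rref_id in self._values


ctx = ToyOwnerContext()
ctx.create("rref-0", [1, 2, 3])
ctx.add_fork("rref-0", "user-A")
ctx.add_fork("rref-0", "user-B")
ctx.delete_fork("rref-0", "user-A")
alive_after_first_delete = ctx.is_alive("rref-0")
ctx.delete_fork("rref-0", "user-B")
alive_after_last_delete = ctx.is_alive("rref-0")
```

In this simplified model the value outlives the first delete and is freed after the last one. Plain counting of this kind cannot reclaim cycles, which is the gap limitation 5 describes.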

Test Plan:
Imported from OSS

buck test mode/dev-nosan //caffe2/test:rpc_fork

Differential Revision: D17184146

Pulled By: mrshenli

fbshipit-source-id: a3a268efc087ac1ef489136ab957080382629265
2019-10-03 17:47:12 -07:00