
Files

Shen Li d5b38984c8 Let RPC return FutureIValue instead of FutureMessage (#37519 )

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37519

closes #37446

Currently FutureMessage is used in several places:

1. `rpc_async` returns a `FutureMessage` object and we expose it
   as `torch.distributed.rpc.Future`. From the applications' perspective,
   they are expecting a `py::object` instead of a `Message`, and we
   do the conversion in the `Future.wait()` pybind method.
2. RPC autograd profiler takes `FutureMessage` and installs
   callbacks to it. The profiler actually only needs a `Future<T>`
   and does not care what `T` is.
3. `OwnerRRef` exposes a `getFuture()` API which returns a
   `FutureMessage`. This `FutureMessage` will be marked completed
   when the value referenced by the `OwnerRRef` is ready.
   `OwnerRRef` does not need it to be a Message type either, it
   actually creates an empty `Message` to mark the `Future`.

The above places are using `FutureMessage`, but they don't really
need a `Message`, and `Message` is a communication layer type that
applications or profiler or the RRef shouldn't be aware of.

Another motivation for making this change is that for async RPC
UDF #36071, we are going to allow applications to call
`markCompleted` in Python. If we still use `FutureMessage`, then
in the `markCompleted` pybind function, it needs to convert the
provided `py::object` into a specific message type, which is
leaking communication layer code to pybind functions. Even if
this is doable, we will have two entities (RPC agent and pybind
Python frontend) accessing the same request callback logic. This is too messy.

This commit replaces all surface `FutureMessage` with `FutureIValue`,
so that `FutureMessage` is no longer visible from Python land. Note
that this does not cause BC issues, as the Python Future type name
and its API stay intact. Internally, we still have `FutureMessage`
in the communication layer.

Test Plan: Imported from OSS

Reviewed By: xush6528

Differential Revision: D21308887

Pulled By: mrshenli

fbshipit-source-id: 4f574f38e83125081f142813cfdde56119522089

2020-04-29 19:10:29 -07:00

api

Fix cpp extension compile failure on some envs (#37221 )

2020-04-26 11:00:20 -07:00

autograd

Let RPC return FutureIValue instead of FutureMessage (#37519 )

2020-04-29 19:10:29 -07:00

cuda

[PyTorch][Dist] Trigger pre/post hooks of output function nodes under distributed autograd (#34501 )

2020-04-21 13:23:18 -07:00

distributed

Let RPC return FutureIValue instead of FutureMessage (#37519 )

2020-04-29 19:10:29 -07:00

generic

Fix some bugs with zipfile serialization (#32244 )

2020-02-05 15:32:14 -08:00

jit

Add overload names to dict operators. (#37279 )

2020-04-29 12:10:28 -07:00

multiprocessing

Fix remaining invalid function cast warnings that show up with GCC 8/9 (#26104 )

2019-09-17 07:43:37 -07:00

onnx

[ONNX] fix provider_version and add consistency test (#36797 )

2020-04-27 11:00:23 -07:00

tensor

Fix typos, via a Levenshtein-type corrector (#31523 )

2020-01-17 16:03:19 -08:00

utils

[reland][quant] QuantizedCUDA implementation (#36936 ) (#37081 )

2020-04-24 10:21:59 -07:00

copy_utils.h

Canonicalize all includes in PyTorch. (#14849 )

2018-12-08 19:38:30 -08:00

CudaIPCTypes.cpp

Implement reference counting for shared IPC CUDA tensors (#16854 )

2019-03-25 10:24:38 -07:00

CudaIPCTypes.h

Implement reference counting for shared IPC CUDA tensors (#16854 )

2019-03-25 10:24:38 -07:00

DataLoader.cpp

Enable -Werror=format compile errors on torch exception types (#34019 )

2020-03-02 13:25:39 -08:00

DataLoader.h

Canonicalize all includes in PyTorch. (#14849 )

2018-12-08 19:38:30 -08:00

Device.cpp

Make PyTorch Python 3.8 compatible (#29302 )

2019-11-07 09:20:19 -08:00

Device.h

Python <-> C++ Frontend inter-op (#13481 )

2018-12-13 08:04:02 -08:00

dl.c

Improve Windows Compatibility (for csrc/scripts) (#2941 )

2017-11-08 19:51:35 +01:00

Dtype.cpp

Added tensor.is_complex(), is_complex and dtype.is_complex py binding, tensor printing, and fixed the scalar type returned for complex float (#33268)

2020-02-20 13:38:01 -08:00

Dtype.h

allow passing Python built-in types as dtypes (#21215 )

2019-06-06 13:17:23 -07:00

DynamicTypes.cpp

Rename TensorTypeId to DispatchKey (#32154 )

2020-01-15 11:16:08 -08:00

DynamicTypes.h

Remove Type dispatch (#21964 )

2019-06-30 04:11:35 -07:00

empty.c

Don't use RTLD_GLOBAL to load _C. (#31162 )

2020-01-09 07:28:15 -08:00

Exceptions.cpp

Changes warnings generated in cpp to show point of Python origination (#36052 )

2020-04-25 21:18:58 -07:00

Exceptions.h

Changes warnings generated in cpp to show point of Python origination (#36052 )

2020-04-25 21:18:58 -07:00

Generator.cpp

Canonicalize includes in torch, and add tests for it (#36303 )

2020-04-23 08:09:21 -07:00

Generator.h

Add THP_API to THPGenerator_Wrap (#35194 )

2020-03-23 05:58:09 -07:00

Layout.cpp

Make PyTorch Python 3.8 compatible (#29302 )

2019-11-07 09:20:19 -08:00

Layout.h

Canonicalize all includes in PyTorch. (#14849 )

2018-12-08 19:38:30 -08:00

MemoryFormat.cpp

Make PyTorch Python 3.8 compatible (#29302 )

2019-11-07 09:20:19 -08:00

MemoryFormat.h

Memory format support for contiguous and is_contiguous (#20455 )

2019-05-16 07:18:24 -07:00

Module.cpp

[RELAND] New operator registration API (#35061 ) (#35629 )

2020-03-29 19:48:29 -07:00

Module.h

Refactor Random Number Generators in ATen (#21364 )

2019-06-12 13:01:30 -07:00

PtrWrapper.cpp

Make PyTorch Python 3.8 compatible (#29302 )

2019-11-07 09:20:19 -08:00

PtrWrapper.h

Cleanup includes in torch/csrc/* (#19924 )

2019-05-06 14:03:18 -07:00

python_dimname.cpp

Remove unnecessary ATen/core/EnableNamedTensor.h (#31117 )

2019-12-12 09:53:07 -08:00

python_dimname.h

Remove unnecessary ATen/core/EnableNamedTensor.h (#31117 )

2019-12-12 09:53:07 -08:00

python_headers.h

Return namedtuples from torch.* function with multiple return arguments for C++ operators (#15429 )

2019-01-22 11:12:18 -08:00

PythonTypes.h

Canonicalize all includes in PyTorch. (#14849 )

2018-12-08 19:38:30 -08:00

QScheme.cpp

Make PyTorch Python 3.8 compatible (#29302 )

2019-11-07 09:20:19 -08:00

QScheme.h

Add qscheme() method (#20608 )

2019-06-14 16:29:29 -07:00

README.md

Use pybind11::gil_scoped_* functions instead of AutoGIL/AutoNoGIL (#30274 )

2019-12-02 12:19:58 -08:00

serialization.cpp

Enabled BFloat16 storage (#21523 )

2019-07-09 21:51:06 -07:00

serialization.h

Enabled BFloat16 storage (#21523 )

2019-07-09 21:51:06 -07:00

Size.cpp

[jit] do the code reorg (#33851 )

2020-02-27 13:02:51 -08:00

Size.h

Delete THP_CORE macro; partially replace with THP_BUILD_MAIN_LIB (#29143 )

2019-11-06 15:02:02 -08:00

Storage.cpp

Enabled BFloat16 storage (#21523 )

2019-07-09 21:51:06 -07:00

Storage.h

Delete THP_CORE macro; partially replace with THP_BUILD_MAIN_LIB (#29143 )

2019-11-06 15:02:02 -08:00

StorageDefs.h

Define THPStorage struct only once (rather than N times) (#14802 )

2018-12-05 13:19:29 -08:00

stub.cpp

add torch-python target (#12742 )

2018-11-16 11:43:48 -08:00

THP_export.h

Delete THP_CORE macro; partially replace with THP_BUILD_MAIN_LIB (#29143 )

2019-11-06 15:02:02 -08:00

THP.h

Delete THP_CORE macro; partially replace with THP_BUILD_MAIN_LIB (#29143 )

2019-11-06 15:02:02 -08:00

TypeInfo.cpp

Make PyTorch Python 3.8 compatible (#29302 )

2019-11-07 09:20:19 -08:00

TypeInfo.h

Canonicalize all includes in PyTorch. (#14849 )

2018-12-08 19:38:30 -08:00

Types.h

Canonicalize all includes in PyTorch. (#14849 )

2018-12-08 19:38:30 -08:00

utils.cpp

Enabled BFloat16 storage (#21523 )

2019-07-09 21:51:06 -07:00

utils.h

Delete THP_CORE macro; partially replace with THP_BUILD_MAIN_LIB (#29143 )

2019-11-06 15:02:02 -08:00

WindowsTorchApiMacro.h

Unify libtorch and libcaffe2 (#17783 )

2019-05-10 09:50:53 -07:00

README.md

csrc

The csrc directory contains all of the code concerned with integration with Python. This is in contrast to lib, which contains the Torch libraries that are Python agnostic. csrc depends on lib, but not vice versa.

There are a number of utilities for easing integration with Python which are worth knowing about, which we briefly describe here. But the most important gotchas:

- DO NOT forget to take out the GIL with pybind11::gil_scoped_acquire before calling the Python API or bringing a THPObjectPtr into scope.
- Make sure you include Python.h first in your header files, before any system headers; otherwise, you will get an error: "_XOPEN_SOURCE" redefined. If you pay attention to warnings, you will see where you need to do this.

Notes

Note [Storage is not nullptr]

Historically, Torch supported nullptr storage, as a minor optimization to avoid having to allocate a storage object when it would be empty. However, this is actually a confusing special case to deal with, so by and large, PyTorch assumes that storage is never nullptr.

One case where this assumption matters is when tracking the CUDA device a tensor is stored in: this information is stored solely in the storage, so if a storage is nullptr, we lose this information.

Although storage is never nullptr, the data field of THStorage may be nullptr. This mostly occurs when we want to pre-allocate an output tensor struct, but then have it be resized and filled with data by some operator: there's no point in allocating data for it in this case!

Files

`Exceptions.h`

Frequently when working with the Python API, you may call a function which returns an error. In this case, we want to return directly to the Python interpreter, so that this exception can be propagated accordingly; however, because the Python API is C-based, what actually will happen is it will return control to whatever C++ code called it. Similarly, if we raise a C++ exception, prior to returning to the Python interpreter, we must set the Python error flags, so it turns into a Python exception.

Moreover, when using the following macros, the generated warnings will be converted into Python warnings that can be caught by the user.

Exceptions.h defines helpers for two main cases:

For code where you write the Python binding by hand, use HANDLE_TH_ERRORS, END_HANDLE_TH_ERRORS and the exception class python_error. You call them like this:

// Entry point from Python interpreter
PyObject* run(PyObject* arg) {
  HANDLE_TH_ERRORS
  ...
  if (!x) throw python_error();
  // From c10/Exception.h
  TORCH_CHECK(cond, "cond was false here");
  TORCH_WARN("Warning message");
  ...
  END_HANDLE_TH_ERRORS
}

The HANDLE_TH_ERRORS macro will catch all exceptions and convert them into an appropriate Python signal. python_error is a special exception which doesn't contain any info; instead, it says, "An error occurred in the Python API; if you return to the interpreter, Python will raise that exception, nothing else needs to be done."

For code that you bind using pybind, HANDLE_TH_ERRORS and END_HANDLE_TH_ERRORS_PYBIND can be used. They work jointly with pybind error handling to raise PyTorch errors and warnings natively and let pybind handle other errors. They can be used as:

// Function given to the pybind binding
at::Tensor foo(at::Tensor x) {
  HANDLE_TH_ERRORS
  ...
  if (!x) throw python_error();
  // pybind native error
  if (!x) throw py::value_error();
  // From c10/Exception.h
  TORCH_CHECK(cond, "cond was false here");
  TORCH_WARN("Warning message");
  ...
  END_HANDLE_TH_ERRORS_PYBIND
}
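
To make the translation step concrete, here is a minimal, self-contained sketch of the idea behind these macros. It is not the PyTorch implementation: `g_error_indicator`, `already_set_error`, `SKETCH_HANDLE_ERRORS` and `SKETCH_END_HANDLE_ERRORS` are hypothetical stand-ins for the interpreter's error indicator, `python_error`, and the real macros, so that the example builds without Python headers.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the interpreter's error indicator.
static std::string g_error_indicator;

// Stand-in for python_error: it carries no payload because the error
// indicator was already set by whoever detected the failure.
struct already_set_error : std::exception {};

// Simplified analogues of HANDLE_TH_ERRORS / END_HANDLE_TH_ERRORS:
// wrap the body in a try block and translate anything that escapes.
#define SKETCH_HANDLE_ERRORS try {
#define SKETCH_END_HANDLE_ERRORS(retval)              \
  }                                                    \
  catch (const already_set_error&) {                   \
    return retval; /* indicator is already set */      \
  }                                                    \
  catch (const std::exception& e) {                    \
    g_error_indicator = e.what(); /* set it ourselves */ \
    return retval;                                     \
  }

// An entry point in the C-style convention: nullptr signals failure.
const char* run(int arg) {
  SKETCH_HANDLE_ERRORS
  if (arg < 0) throw std::invalid_argument("arg must be non-negative");
  if (arg == 0) {
    // Pretend an API call failed and set the indicator for us.
    g_error_indicator = "set by a failed API call";
    throw already_set_error();
  }
  return "ok";
  SKETCH_END_HANDLE_ERRORS(nullptr)
}
```

In this sketch, `run(-1)` returns `nullptr` with the exception message copied into the indicator, while `run(0)` returns `nullptr` and leaves the indicator exactly as the failing call set it, which mirrors the role the text assigns to `python_error`.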

GIL

Whenever you make any calls to the Python API, you must have taken out the Python GIL, as none of these calls are thread safe. pybind11::gil_scoped_acquire is an RAII struct which handles taking and releasing the GIL. Use it like this:

void iWantToUsePython() {
  pybind11::gil_scoped_acquire gil;
  ...
}

In general, the compiler will NOT warn you if you use Python functionality without taking out the GIL, so DO NOT FORGET this call.
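
The RAII pattern itself can be illustrated without Python. The following is only a sketch under stated assumptions: `g_lock`, `g_lock_held`, `scoped_acquire` and `call_interpreter` are hypothetical stand-ins (a plain `std::mutex` plays the role of the GIL), not pybind11 or PyTorch APIs. It shows why a scoped guard is preferable to manual acquire and release calls: the lock is released on every exit path, including exceptions.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

// Hypothetical stand-in for the interpreter lock.
static std::mutex g_lock;
static bool g_lock_held = false;  // only written while g_lock is held

// RAII guard: acquires in the constructor, releases in the destructor.
class scoped_acquire {
 public:
  scoped_acquire() : guard_(g_lock) { g_lock_held = true; }
  // The destructor body runs before guard_ unlocks the mutex.
  ~scoped_acquire() { g_lock_held = false; }
  scoped_acquire(const scoped_acquire&) = delete;
  scoped_acquire& operator=(const scoped_acquire&) = delete;

 private:
  std::unique_lock<std::mutex> guard_;
};

// Stand-in for an API call that is only safe with the lock held.
int call_interpreter() {
  if (!g_lock_held) throw std::logic_error("called without the lock");
  return 42;
}

int i_want_to_use_python(bool fail) {
  scoped_acquire gil;
  if (fail) throw std::runtime_error("boom");  // guard still releases
  return call_interpreter();
}
```

Unlike this sketch, the real GIL gives no runtime check comparable to `call_interpreter`; forgetting the guard typically shows up as a crash or data corruption rather than a clean exception.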

`utils/object_ptr.h`

THPPointer is a smart pointer class analogous to std::shared_ptr, but which is overloaded to handle the reference counting schemes of various objects which are not based on shared_ptr. The most important overloads are:

- PyObject (so important we've aliased it as THPObjectPtr), which hooks into Python reference counting. (By the way, that means you MUST take out the GIL before bringing one of these into scope!)
- The various TH tensor and storage types (e.g., THTensor), which hook into TH's reference counting. (TH's reference counting IS thread safe, no locks necessary.)
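
The "overloaded per type" idea can be sketched as a class template whose release hook is specialized for each managed type. This is an illustration, not the contents of `utils/object_ptr.h`: `Counted`, `counted_decref`, `g_destroyed` and `OwnedPtr` are hypothetical names, and the sketch is deliberately move-only to stay short.

```cpp
#include <cassert>
#include <utility>

// Hypothetical intrusively reference-counted object, standing in for
// a PyObject or a TH tensor.
struct Counted {
  int refcount = 1;
};

static int g_destroyed = 0;  // counts objects actually deleted

void counted_decref(Counted* obj) {
  if (--obj->refcount == 0) {
    delete obj;
    ++g_destroyed;
  }
}

// Owns one reference to a T and drops it on destruction. How the
// reference is dropped is decided by a per-type specialization of free().
template <class T>
class OwnedPtr {
 public:
  OwnedPtr() = default;
  explicit OwnedPtr(T* ptr) : ptr_(ptr) {}
  OwnedPtr(OwnedPtr&& other) noexcept : ptr_(other.release()) {}
  OwnedPtr& operator=(OwnedPtr&& other) noexcept {
    if (this != &other) {
      free();
      ptr_ = other.release();
    }
    return *this;
  }
  OwnedPtr(const OwnedPtr&) = delete;
  OwnedPtr& operator=(const OwnedPtr&) = delete;
  ~OwnedPtr() { free(); }

  T* get() const { return ptr_; }
  // Gives up ownership without dropping the reference.
  T* release() {
    T* tmp = ptr_;
    ptr_ = nullptr;
    return tmp;
  }
  explicit operator bool() const { return ptr_ != nullptr; }

 private:
  void free();  // specialized for each managed type
  T* ptr_ = nullptr;
};

// The specialization that hooks into Counted's reference counting.
template <>
void OwnedPtr<Counted>::free() {
  if (ptr_) counted_decref(ptr_);
  ptr_ = nullptr;
}
```

Supporting another object type in this sketch only requires another specialization of `free()`; for a type whose reference counting is not thread safe, the caller would also need to hold the appropriate lock while the pointer is in scope, as the note on THPObjectPtr and the GIL above says.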