[RELAND] New operator registration API (#35061) (#35629)

Summary:
Reland of https://github.com/pytorch/pytorch/pull/35061; removed
the get-qualified-type-name magic from debug strings to work around
an MSVC 2017 bug.

Main points of the new API:

- You can register implementations (impl) without having to specify a schema.
- Registrations are commutative, so no matter what order your static
  initializers run, you end up with the same end result.

op_registration_test.cpp contains a reasonably comprehensive accounting
of the available API surface.
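
For illustration, here is a minimal sketch of what those two points look like in
client code, written against the calls this PR itself uses (torch::import,
torch::schema, c10::dispatch, c10::Module::def/impl); the "myops::my_relu"
operator, the lambda kernel, and the exact headers are my own assumptions, not
an excerpt from the diff:

```
#include <ATen/ATen.h>
#include <ATen/core/op_registration/op_registration.h>

// An impl() can be registered even though no def() for "myops::my_relu"
// has run yet.  Static initializer order across translation units does not
// matter: whichever of these two registrations runs first, the dispatcher
// ends up in the same state once both have run.
static auto my_impl = torch::import("myops")
    .impl("myops::my_relu",
          c10::dispatch(c10::DispatchKey::CPUTensorId,
                        [](const at::Tensor& self) {
                          return self.clamp_min(0);
                        }));

// The def() supplies the schema; when it arrives, previously registered
// impls are schema-checked against it.
static auto my_def = torch::import("myops")
    .def(torch::schema("myops::my_relu(Tensor self) -> Tensor",
                       c10::AliasAnalysisKind::FROM_SCHEMA));
```

Destroying either registration object is expected to deregister just its piece
again; the permutation tests described below exercise exactly those
registration and deregistration orders.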

How does this implementation proceed?  The basic concept is to relax the
internal invariants of Dispatcher data structures to allow the
possibility that a FunctionSchema is not specified in an Operator.

- DispatchKeyExtractor has an uninitialized state where it doesn't look
  for dispatch keys in any arguments of the stack.  It can have a
  schema (de)registered to itself post facto with
  registerSchema/unregisterSchema.
- DispatchTable has a new constructor taking only an OperatorName for
  the uninitialized state.  It can have a schema (de)registered to itself
  post facto with registerSchema/unregisterSchema.
- OperatorDef maintains counts of both defs and defs_and_impls.
  defs_and_impls keeps track of the outstanding impl registrations; you
  may have impl registrations but no defs.  If there are no defs (no
  schema), the operator is not returned by findSchema.  A new
  findOperatorByName function unconditionally returns the OperatorHandle
  even if there's no schema.  OperatorHandle::hasSchema can be used
  to check if the operator has a schema (see the sketch after this list).
- Replaced 'registerKernel' with 'registerImpl', which is the new
  interface for directly registering kernel implementations, even when no
  schema has been registered yet.
- Because 'registerImpl' no longer requires an OperatorHandle, change
  'registerDef' to only return a RegistrationHandleRAII.  This is marginally
  less efficient (since we're doing two hash table lookups on a registration
  now), but this won't matter in the long term, and probably doesn't
  matter now either.
- Renamed registerBackendFallbackKernel to registerFallback (this exposed
  a bunch of places where we were improperly interfacing with the Dispatcher
  directly; we need to add this capability to the true public API).
- All code-generated internal registrations are switched to use the new
  API.  This includes VariableType registrations (which previously
  weren't converted) and the mobile autograd registrations.
- Switched the new-style def()/impl() APIs to interact directly with the
  Dispatcher, rather than indirecting through the old API.
- We deleted alias analysis kind merging entirely.  As a nod to BC, it's
  possible to define a full schema with an alias analysis kind and then
  later do another full schema def without an alias analysis kind, but
  the opposite direction is not allowed.  We can remove this allowance
  entirely following the plan at https://github.com/pytorch/pytorch/issues/35040
- Schema matching is moved inside the dispatcher, because we might not
  be able to immediately schema match at the point of an impl() (because
  we don't have the schema yet).  To do this, we store the inferred
  function schema inside a KernelEntry, so we can check it when we get
  the real schema.
- Registered kernel functions now store a debug string which
  can be used to more easily identify them.  Tests use this to
  distinguish between multiple distinct registrations; regular
  invocations get only very basic information.
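
As a concrete illustration of the relaxed invariants in the list above (my own
sketch, not code from this PR; the header paths and exact signatures are
assumed from the binding file reproduced at the end of this message), an
operator that so far has only impl() registrations can still be looked up by
name:

```
#include <ATen/core/dispatch/Dispatcher.h>
#include <torch/csrc/jit/frontend/function_schema_parser.h>

// Hypothetical helper: report what the dispatcher currently knows about an
// operator, whether or not its schema has been def()'d yet.
void describe(const char* qualified_name) {
  auto op = c10::Dispatcher::singleton().findOp(
      torch::jit::parseName(qualified_name));
  if (!op) {
    // Neither a def() nor an impl() has been registered under this name.
    return;
  }
  if (op->hasSchema()) {
    // A def() has run; impls that arrived earlier were checked against
    // their inferred schemas at that point.
  } else {
    // Only impl() registrations so far: findSchema would not return this
    // operator, but the unconditional by-name lookup still does.
  }
}
```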

Because we need our static initializers to work no matter what order
they're run, the testing strategy on this PR is quite involved.

The general concept:
- Bind a (very stripped-down) version of the dispatcher API into Python,
  so that we can easily write a more complex testing harness
  using expect tests.
- For each series of registrations we want to test, exhaustively
  test every possible permutation of registrations (and
  deregistrations), and show that the intermediate states
  agree no matter what path is taken.
- Intermediate states are rendered using a new dumpState()
  debugging method that prints the internal state of the
  dispatcher.  This method may be generally useful for people
  who want to see what's in the dispatcher.
- Simultaneously, add a new invariant-testing function which
  checks that the internal invariants of the dispatcher are
  upheld (so we don't have to print internal implementation
  details of the dispatcher).

The testing framework found a few bugs in development.  For example,
here is a case where we registered a schema too early, before checking
whether it was valid:

```
Traceback (most recent call last):
  File "test/test_dispatch.py", line 164, in test_def_impl_schema_mismatch
    ], raises=True)
  File "test/test_dispatch.py", line 135, in commute
    results=results, raises=raises)
  File "test/test_dispatch.py", line 83, in run_permutation
    .format(ctor_order[:i], op_ix))
  File "test/test_dispatch.py", line 59, in check_invariants
    .format(expected_provenance, actual_provenance)
AssertionError: 'name[16 chars]ema: (none)\ncatchall: boxed unboxed :: (Tenso[18 chars]0)\n' != 'name[16 chars]ema: test::foo(Tensor x, Tensor y) -> (Tensor)[53 chars]0)\n'
  name: test::foo
- schema: (none)
+ schema: test::foo(Tensor x, Tensor y) -> (Tensor)
  catchall: boxed unboxed :: (Tensor _0) -> (Tensor _0)
 : expected from running ctors (1,); actual from running ctors (1,) and then failing to run ctor 0 (did this failure leave the dispatcher in a wedged state? it shouldn't!)
```

There are also C++ smoketests for the API.  These tests comprehensively
cover the C++ surface of the new operator registration API, but
don't check very hard whether the API does the right thing (that's what
test_dispatch.py is for).

Some miscellaneous changes which could have been split into other
PRs, but I was too lazy to do so:

- Add torch::jit::parseName (mirroring parseSchema/parseSchemaOrName)
- Add cloneWithName functionality to FunctionSchema
- Unconditionally generate schema registration, even when type_method_dispatch
  is a dict.  The one exception is for manual registrations....
- Add fallback, CppFunction::makeFallthrough, and
  CppFunction::makeFromBoxedFunction to the public API of op_registration,
  so we can stop calling the internal registerImpl directly
- Add new syntax sugar dispatch_autograd for registering autograd kernels
- Minor OperatorName cleanup, storing OperatorName in DispatchTable
  and defining operator<< on OperatorName
- Refactored the op registration API to take FunctionSchema directly.
  We now do namespacing by post facto fixing up the OperatorName
  embedded in FunctionSchema.  This also means that you can
  now do torch::import("ns1").def("ns2::blah") and have the ns2
  override ns1 (although maybe this is not the correct behavior).
- New torch::schema public API for attaching alias analysis kind
  annotations (see the sketch after this list).  This meant we had to
  template up some function signatures which previously took const char*.
  There's now a nice comment explaining this strategy.
- torch::import now takes std::string, which means we can use
  the namespacing from Python
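
As a hedged sketch of the torch::schema and namespacing items above (again my
own illustration, using only calls that appear in the binding file below; the
operator names are made up):

```
#include <ATen/core/op_registration/op_registration.h>

// torch::schema() parses a schema string and attaches an AliasAnalysisKind.
static auto ns1_reg = torch::import("ns1")
    .def(torch::schema("ns1::pure_fn(Tensor a) -> Tensor",
                       c10::AliasAnalysisKind::PURE_FUNCTION));

// Namespacing is applied by fixing up the OperatorName embedded in the
// FunctionSchema post facto, so an explicit "ns2::" qualifier wins over the
// imported "ns1" namespace (whether that is the right behavior is left open
// above).
static auto ns2_reg = torch::import("ns1")
    .def(torch::schema("ns2::blah(Tensor a) -> Tensor",
                       c10::AliasAnalysisKind::FROM_SCHEMA));
```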

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35629

Differential Revision: D20724551

Pulled By: ezyang

fbshipit-source-id: befa46a1affb4ec4ae1fb39e3564a63695a6ca41

The new file added by this commit, torch/csrc/utils/python_dispatch.cpp
(162 lines), implements the Python dispatcher bindings used by
test_dispatch.py and is reproduced below:

```
#include <torch/csrc/utils/python_dispatch.h>
#include <torch/csrc/jit/frontend/function_schema_parser.h>
#include <ATen/core/op_registration/op_registration.h>
#include <ATen/ATen.h>
#include <pybind11/operators.h>
#include <iostream>

namespace py = pybind11;

namespace torch {
namespace impl {
namespace dispatch {

c10::optional<c10::DispatchKey> parseDispatchKey(const std::string& k) {
  static std::unordered_map<std::string, c10::DispatchKey> key_map = {
    {"cpu", c10::DispatchKey::CPUTensorId},
    {"cuda", c10::DispatchKey::CUDATensorId},
    {"xla", c10::DispatchKey::XLATensorId},
    {"autograd", c10::DispatchKey::VariableTensorId},
    {"", c10::DispatchKey::Undefined},
  };
  auto it = key_map.find(k);
  TORCH_CHECK(it != key_map.end(), "could not parse ", k);
  if (it->second == c10::DispatchKey::Undefined) {
    return c10::nullopt;
  } else {
    return c10::make_optional(it->second);
  }
}

c10::AliasAnalysisKind parseAliasAnalysisKind(const std::string& k) {
  static std::unordered_map<std::string, c10::AliasAnalysisKind> key_map = {
    {"CONSERVATIVE", c10::AliasAnalysisKind::CONSERVATIVE},
    {"FROM_SCHEMA", c10::AliasAnalysisKind::FROM_SCHEMA},
    {"PURE_FUNCTION", c10::AliasAnalysisKind::PURE_FUNCTION},
    {"", c10::AliasAnalysisKind::FROM_SCHEMA},  // default
  };
  auto it = key_map.find(k);
  TORCH_CHECK(it != key_map.end(), "could not parse ", k);
  return it->second;
}

template <typename Func>
inline c10::CppFunction dispatch_str(const char* key, Func&& raw_f) {
  auto mb_key = parseDispatchKey(key);
  if (mb_key) {
    return c10::dispatch(*mb_key, std::move(raw_f));
  } else {
    c10::CppFunction f(std::forward<Func>(raw_f));
    return f;
  }
}
void initDispatchBindings(PyObject* module) {
  auto m = py::handle(module).cast<py::module>();

  py::class_<c10::OperatorHandle>(m, "_DispatchOperatorHandle")
    .def("schema", &c10::OperatorHandle::schema);

  // TODO: figure out how to do chaining
  py::class_<c10::Module>(m, "_DispatchModule")
    .def("def_", [](py::object self, const char* schema, const char* alias) {
      self.cast<c10::Module&>().def(torch::schema(schema, parseAliasAnalysisKind(alias)));
      return self;
    }, "", py::arg("schema"), py::arg("alias") = "")
    // Simulated "legacy" def where alias analysis kind is not set.
    // Ordinarily this can only be exercised from RegisterOperators() API
    // but I am not going to bind that here
    .def("def_legacy", [](py::object self, const char* schema) {
      self.cast<c10::Module&>().def(torch::jit::parseSchema(schema));
      return self;
    }, "", py::arg("schema"))
    // We can't conveniently turn Python functions into valid functions
    // in the dispatcher.  So instead we provide a bunch of precanned
    // functions for testing purposes.  You're NOT intended to actually
    // call these functions; they're just here so we can actually register
    // something
    //
    // Mangling scheme: args_rets.  One character per.
    //  t = Tensor
    .def("def_name_t_t", [](py::object self, const char* name, const char* dispatch, const char* debug) {
      self.cast<c10::Module&>().def(
        name,
        dispatch_str(dispatch, [](const at::Tensor& a) {
          return a;
        }).debug(debug)
      );
      return self;
    }, "", py::arg("name"),
           py::arg("dispatch") = "",
           py::arg("debug") = "default_def_name_t_t")
    .def("def_schema_t_t", [](py::object self, const char* schema, const char* dispatch, const char* alias, const char* debug) {
      self.cast<c10::Module&>().def(
        torch::schema(schema, parseAliasAnalysisKind(alias)),
        dispatch_str(dispatch, [](const at::Tensor& a) {
          return a;
        }).debug(debug)
      );
      return self;
    }, "", py::arg("name"),
           py::arg("dispatch") = "",
           py::arg("alias") = "",
           py::arg("debug") = "default_def_schema_t_t")
    // TODO: maybe consider deduplicating the definitions here, it's getting
    // pretty long
    .def("impl_t_t", [](py::object self, const char* name, const char* dispatch, const char* debug) {
      self.cast<c10::Module&>().impl(
        name,
        dispatch_str(dispatch, [](const at::Tensor& a) {
          return a;
        }).debug(debug)
      );
      return self;
    }, "", py::arg("name"),
           py::arg("dispatch") = "",
           py::arg("debug") = "impl_t_t")
    .def("impl_tt_t", [](py::object self, const char* name, const char* dispatch, const char* debug) {
      self.cast<c10::Module&>().impl(
        name,
        dispatch_str(dispatch, [](const at::Tensor& a, const at::Tensor& b) {
          return a;
        }).debug(debug)
      );
      return self;
    }, "", py::arg("name"), py::arg("dispatch") = "", py::arg("debug") = "")
  ;

  m.def("_dispatch_import", [](std::string name) {
    if (name.empty()) {
      return std::make_unique<c10::Module>(torch::import());
    } else {
      return std::make_unique<c10::Module>(torch::import(name));
    }
  });

  m.def("_dispatch_dump", [](const char* name) -> std::string {
    auto op = c10::Dispatcher::singleton().findOp(torch::jit::parseName(name));
    if (!op) {
      return "";
    } else {
      return op->dumpState();
    }
  });

  m.def("_dispatch_check_invariants", [](const char* name) {
    auto op = c10::Dispatcher::singleton().findOp(torch::jit::parseName(name));
    if (!op) {
    } else {
      return op->checkInvariants();
    }
  });

  m.def("_dispatch_check_all_invariants", []() {
    c10::Dispatcher::singleton().checkInvariants();
  });
}

}}} // namespace torch::impl::dispatch
```