Mirror of https://github.com/pytorch/pytorch.git (synced 2025-10-21 21:49:24 +08:00)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55462

Handles and symbolicates exception callstacks thrown from a backend. The objective of this diff is to improve error reporting when exceptions are raised from a lowered backend. We would effectively like to get the same model-level stack trace that you would get without having lowered some module to a backend. For example:

```
class AA(nn.Module):
    def forward(self, x, y):
        return x + y

class A(nn.Module):
    def __init__(...):
        self.AA0 = AA()

    def forward(self, x, y):
        return self.AA0.forward(x, y) + 3

class B(nn.Module):
    def forward(self, x):
        return x + 2

class C(nn.Module):
    def __init__(...):
        self.A0 = A()
        self.B0 = B()

    def forward(self, x, y):
        return self.A0.forward(x, y) + self.B0.forward(x)
```

If we then do C().forward(torch.rand((2,3)), torch.rand((14,2))) we will likely see an error stack like:

```
C++ exception with description "The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
  File "<string>", line 3, in forward

    def forward(self, x, y):
      return self.A0.forward(x, y) + self.B0.forward(x)
             ~~~~~~~~~~~~~~~ <--- HERE

  File "<string>", line 3, in forward

    def forward(self, x, y):
      return self.AA0.forward(x, y) + 3
             ~~~~~~~~~~~~~~~~ <--- HERE

  File "<string>", line 3, in forward

    def forward(self, x, y):
      return x + y
             ~~~~~ <--- HERE
```

We would like to see the same error stack if we lowered C.A0 to some backend. With this diff we get something like:

```
Module hierarchy: top(C).A0(backend_with_compiler_demoLoweredModule).AA0(AA)
Traceback of TorchScript (most recent call last):
  File "<string>", line 3, in FunctionName_UNKNOWN

    def forward(self, x, y):
      return self.A0.forward(x, y) + self.B0.forward(x)
             ~~~~~~~~~~~~~~~ <--- HERE

  File "<string>", line 5, in FunctionName_UNKNOWN
                typed_inputs: List[Any] = [x, y, ]
                if self.__backend.is_available() :
                  _0, = self.__backend.execute(self.__handles["forward"], typed_inputs)
                        ~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
                  assert isinstance(_0, Tensor)
                  return _0

  File "<string>", line 3, in FunctionName_UNKNOWN

    def forward(self, x, y):
      return self.AA0.forward(x, y) + 3
             ~~~~~~~~~~~~~~~~ <--- HERE

  File "<string>", line 3, in FunctionName_UNKNOWN

    def forward(self, x, y):
      return x + y
             ~~~~~ <--- HERE
```

This is achieved in three parts:

Part 1:
A. BackendDebugInfoRecorder: During backend lowering, in `to_backend`, before calling the preprocess function corresponding to the backend, instantiate BackendDebugInfoRecorder. This will facilitate recording of debug info (such as source range + inlined callstack) for the lowered module.
B. Instantiate WithBackendDebugInfoRecorder with BackendDebugInfoRecorder. This initializes a thread-local pointer to BackendDebugInfoRecorder.
C. generate_debug_handles: In the preprocess function, the backend calls generate_debug_handles for each method being lowered separately. generate_debug_handles takes the `Graph` of the method being lowered and returns a map of Node*-to-debug_handles. The backend is responsible for storing the debug handles appropriately so that, when an exception corresponds to a particular Node that was lowered, it can be raised (and later profiled) using those debug handles. Inside generate_debug_handles, we query the current BackendDebugHandleInfoRecorder, which is issuing debug handles. This debug handle manager issues debug handles and records the debug_handles-to-<source range, inlined callstack> map.
D. Back in `to_backend`, once the preprocess function has finished lowering the module, we call `stopRecord` on BackendDebugInfoRecorder. This returns the debug info map, which is then stored inside the lowered module.

Part 2: Serialization:
During serialization for bytecode (lite interpreter), we do two things:
1. Extract all the source ranges contained inside the debug_handles-to-<source range, inlined callstack> map for the lowered module. These are the source ranges corresponding to the debug handles, including the ranges in the inlined callstack. Since we replaced the original module with the lowered module, we won't be serializing code for the original module, and thus there is no source range for it. That is why the source ranges have to be stored separately. We lump all the source ranges for all the lowered modules into one single debug_pkl file.
2. Then we serialize the debug_handles-to-<source range, inlined callstack> map. During deserialization we can then reconstruct the debug_handles-to-<source range, inlined callstack> map. Given that all debug_handles are unique, we do not need any module information.

Test Plan: Tests are added in test_backend.cpp

Imported from OSS

Differential Revision: D27621330

Reviewed By: raziel

Pulled By: kimishpatel

fbshipit-source-id: 0650ec68cda0df0a945864658cab226a97ba1890
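To make the Part 1 flow more concrete, below is a minimal C++ sketch of what a backend's preprocess step might do with the Node*-to-debug-handle map described above. Only the overall flow comes from the summary (call generate_debug_handles per lowered method, receive a Node*-to-debug-handle map, store the handles alongside the compiled artifact); the generate_debug_handles signature, the CompiledMethod type, and lower_method are illustrative assumptions, not PyTorch's actual backend API.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

#include <torch/csrc/jit/ir/ir.h>

namespace example_backend {

// Assumed alias: the summary describes a map from Node* to debug handles.
using NodeToDebugHandle = std::unordered_map<torch::jit::Node*, int64_t>;

// Declaration of the API named in the summary; the exact signature is an
// assumption made for this sketch.
NodeToDebugHandle generate_debug_handles(
    const std::shared_ptr<torch::jit::Graph>& graph);

// Hypothetical container for whatever the backend emits for one method.
struct CompiledMethod {
  std::vector<std::string> instructions;
  // One debug handle per emitted instruction, in emission order, so that a
  // runtime failure on instruction i can be reported with debug_handles[i].
  std::vector<int64_t> debug_handles;
};

// Sketch of the per-method work inside the backend's preprocess function.
CompiledMethod lower_method(const std::shared_ptr<torch::jit::Graph>& graph) {
  CompiledMethod out;
  // Ask the active BackendDebugInfoRecorder (reachable through the
  // thread-local state set up by WithBackendDebugInfoRecorder) to issue
  // debug handles for this graph.
  NodeToDebugHandle handles = generate_debug_handles(graph);
  for (torch::jit::Node* node : graph->nodes()) {
    out.instructions.push_back(node->kind().toQualString());
    auto it = handles.find(node);
    // Keep the handle next to the emitted instruction; -1 marks "no info".
    out.debug_handles.push_back(it != handles.end() ? it->second : -1);
  }
  return out;
}

} // namespace example_backend
```

In the real flow, `to_backend` would then stop the recorder (item D above) and attach the returned debug info map to the lowered module.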
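Similarly, here is a rough sketch of the shape of the data that Part 2 serializes alongside the bytecode. The real implementation uses PyTorch's own source range and inlined callstack records; the types below are simplified stand-ins, included only to illustrate why globally unique debug handles make per-module information unnecessary at lookup time.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// A debug handle is an integer id that is unique across the whole model,
// which is why (per the summary) no per-module key is needed to resolve it
// after deserialization.
using DebugHandle = int64_t;

// Simplified stand-in for a source range record; the source ranges for all
// lowered modules are lumped together into a single debug_pkl file because
// the original (pre-lowering) code is no longer serialized.
struct SourceRangeInfo {
  std::string source_name; // e.g. "<string>"
  size_t start = 0;
  size_t end = 0;
};

// Simplified stand-in for an inlined callstack: the chain of frames from the
// top-level module down to the node that actually failed.
using InlinedCallStackInfo = std::vector<SourceRangeInfo>;

// The map serialized in Part 2: reconstructing it at load time is all that is
// needed to turn the debug handles attached to an exception back into the
// module-level stack trace shown above.
using DelegateDebugInfoMap =
    std::unordered_map<DebugHandle,
                       std::pair<SourceRangeInfo, InlinedCallStackInfo>>;
```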
245 lines
7.6 KiB
C++
#include <torch/csrc/jit/mobile/module.h>

#include <torch/csrc/jit/backends/backend_exception.h>
#include <torch/csrc/jit/mobile/interpreter.h>
#include <torch/csrc/jit/mobile/observer.h>
#include <torch/csrc/jit/runtime/jit_exception.h>
#include <exception>

#include <ATen/record_function.h>

namespace torch {
namespace jit {
std::ostream& operator<<(std::ostream& out, Instruction inst);
namespace mobile {
void CompilationUnit::register_function(std::unique_ptr<Function> fn) {
  methods_.emplace_back(std::move(fn));
}

Function* CompilationUnit::find_function(const c10::QualifiedName& qn) {
  for (auto& fn : methods_) {
    if (fn->qualname() == qn) {
      return fn.get();
    }
  }
  return nullptr;
}

Method Module::get_method(const std::string& name) const {
  if (auto method = find_method(name)) {
    return *method;
  }
  AT_ERROR("Method '", name, "' is not defined.");
}

c10::optional<Method> Module::find_method(const std::string& basename) const {
  for (auto& fn : cu_->methods()) {
    if (fn->name() == basename) {
      return c10::make_optional<Method>(Method(this, fn.get()));
    }
  }
  return c10::nullopt;
}
namespace {
void set_train_recurse(
    const c10::intrusive_ptr<c10::ivalue::Object>& obj,
    bool on) {
  if (auto slot = obj->type()->findAttributeSlot("training")) {
    obj->setSlot(*slot, on);
  } else {
    TORCH_INTERNAL_ASSERT(false, "'training' attribute not found");
  }
  for (const auto& slot : obj->slots()) {
    if (slot.isObject()) {
      set_train_recurse(slot.toObject(), on);
    }
  }
}

void slot_params_recurse(
    const c10::intrusive_ptr<c10::ivalue::Object>& obj,
    std::vector<at::Tensor>* params) {
  for (const auto& slot : obj->slots()) {
    if (slot.isTensor()) {
      params->emplace_back(slot.toTensor());
    } else if (slot.isObject()) {
      slot_params_recurse(slot.toObject(), params);
    }
  }
}

void slot_named_params_recurse(
    const c10::intrusive_ptr<c10::ivalue::Object>& obj,
    std::map<std::string, at::Tensor>* params,
    const std::string& parent_name) {
  auto slots = obj->slots();
  size_t nslots = slots.size();
  for (size_t i = 0; i < nslots; ++i) {
    auto slot = slots[i];
    std::string name =
        parent_name.size() == 0 ? parent_name : parent_name + ".";
    name += obj->type()->getAttributeName(i);
    // TODO: Fix this filter. requires_grad is not the appropriate
    // filter for a parameter, but is a temporary hack to help probable
    // users of this API. The correct behavior is to filter by
    // obj->type()->is_parameter(), but this currently always returns
    // false on mobile.
    if (slot.isTensor() && slot.toTensor().requires_grad()) {
      (*params)[name] = slot.toTensor();
    } else if (slot.isObject()) {
      slot_named_params_recurse(slot.toObject(), params, name);
    }
  }
}

std::string getTopModuleTypeName(const Module& m) {
  std::string name;
  if (m._ivalue()->type() && m._ivalue()->type()->name()) {
    name = m._ivalue()->type()->name().value().name();
  }
  return name;
}
} // namespace
const std::vector<at::Tensor> Module::parameters() const {
  std::vector<at::Tensor> params;
  slot_params_recurse(object_, &params);
  return params;
}

// Returns a mapping of all attributes with requires_grad=True in a module.
// This behavior differs from full TorchScript modules. This is a bug,
// but currently there is no way to correctly label parameters in the
// loading of a mobile module. TODO
const std::map<std::string, at::Tensor> Module::named_parameters() const {
  std::map<std::string, at::Tensor> params;
  const std::string name = "";
  slot_named_params_recurse(object_, &params, name);
  return params;
}
// We will continue to support this API for now as it is being relied upon
// for profiling.
// We really need to change this part; as the next step for profiling support
// for delegates, the first thing will be to rewrite how profiling is done
// for the lite interpreter.
std::string Module::get_forward_method_debug_info(size_t pc) const {
  auto debug_handle = find_method("forward")->get_debug_handle(pc);
#if defined(SYMBOLICATE_MOBILE_DEBUG_HANDLE)
  return getDebugTable().getModuleHierarchyInfo(
      debug_handle, getTopModuleTypeName(*this));
#else
  return "";
#endif
}

void Module::train(bool on) {
  set_train_recurse(object_, on);
}

bool Module::is_training() const {
  if (auto slot = object_->type()->findAttributeSlot("training")) {
    return object_->getSlot(*slot).toBool();
  }
  return true;
}

const std::vector<Method> Module::get_methods() const {
  std::vector<Method> methods;
  for (std::unique_ptr<Function>& fn : cu_->methods()) {
    methods.emplace_back(this, fn.get());
  }
  return methods;
}
Method::Method(const Module* owner, Function* function)
    : owner_(owner), function_(function) {}

void Method::run(Stack& stack) const {
  auto observer = torch::observerConfig().getModuleObserver();
  // NOLINTNEXTLINE(clang-analyzer-security.insecureAPI.rand)
  auto instance_key = std::rand();
  /* if the metadata dict doesn't contain "model_name", copy the metadata and
     set the value of "model_name" as name() */
  std::unordered_map<std::string, std::string> copied_metadata =
      owner_->metadata();
  if (owner_->metadata().find("model_name") == owner_->metadata().end()) {
    copied_metadata["model_name"] = owner_->name();
  }
  if (observer) {
    observer->onEnterRunMethod(
        copied_metadata, instance_key, function_->name());
  }

  auto debug_info = std::make_shared<MobileDebugInfo>();
  std::string name = copied_metadata["model_name"];
  debug_info->setModelName(name);
  debug_info->setMethodName(function_->name());
  at::DebugInfoGuard guard(at::DebugInfoKind::MOBILE_RUNTIME_INFO, debug_info);

  try {
    stack.insert(stack.begin(), owner_->_ivalue()); // self
    function_->run(stack);
    if (observer) {
      observer->onExitRunMethod(instance_key);
    }
    // This exception must be caught first as it is derived from c10::Error
  } catch (c10::BackendRuntimeException& e) {
#if defined(SYMBOLICATE_MOBILE_DEBUG_HANDLE)
    e.pushDebugHandle(function_->getExceptionDebugHandle());
    // symbolicate all handles
    e.add_context(owner_->getDebugTable().getSourceDebugString(
        e.getDebugHandles(), getTopModuleTypeName(*owner_)));
#endif
    if (observer) {
      observer->onFailRunMethod(instance_key, e.what());
    }
    TORCH_RETHROW(e);
  } catch (c10::Error& error) {
#if defined(SYMBOLICATE_MOBILE_DEBUG_HANDLE)
    auto debug_string = owner_->getDebugTable().getSourceDebugString(
        function_->getExceptionDebugHandle(), getTopModuleTypeName(*owner_));
    error.add_context(debug_string);
#endif
    if (observer) {
      observer->onFailRunMethod(instance_key, error.what());
    }
    TORCH_RETHROW(error);
  } catch (...) {
    auto currentException = std::current_exception();
    try {
      if (!currentException) {
        TORCH_CHECK(false, "Unknown exception");
      } else {
        try {
          std::rethrow_exception(currentException);
        } catch (const std::exception& e) {
          TORCH_CHECK(false, e.what());
        }
      }
    } catch (c10::Error& error) {
#if defined(SYMBOLICATE_MOBILE_DEBUG_HANDLE)
      auto debug_string = owner_->getDebugTable().getSourceDebugString(
          function_->getExceptionDebugHandle(), getTopModuleTypeName(*owner_));
      error.add_context(debug_string);
#endif
      if (observer) {
        observer->onFailRunMethod(instance_key, error.what());
      }
      TORCH_RETHROW(error);
    }
  }
}
c10::IValue Method::operator()(std::vector<c10::IValue> stack) const {
  run(stack);
  TORCH_INTERNAL_ASSERT(!stack.empty());
  return stack.front();
}

} // namespace mobile
} // namespace jit
} // namespace torch