Re-landing #68111/#74596

## Description

v0.5 PR of this [RFC](https://github.com/pytorch/pytorch/issues/49444). Building on #50256, the following improvements are included:

* The [v0.5 release branch](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.5) of the oneDNN Graph API is used.
* The fuser now works with the profiling graph executor: type-check nodes are inserted to guard the profiled tensor properties.

### User API

The optimization pass is disabled by default. Users can enable it with:

```
torch.jit.enable_onednn_fusion(True)
```

`torch.jit.freeze` should be used after tracing (recommended) or scripting a model. A minimal usage sketch follows at the end of this description.

### Performance

The [pytorch/benchmark](https://github.com/pytorch/benchmark) tool is used to compare performance:

* SkyLake 8180 (1 socket of 28 cores): (benchmark chart in the original PR)
* SkyLake 8180 (single thread): (benchmark chart in the original PR)
  * By mapping hardswish to oneDNN Graph, it is 8% faster than PyTorch JIT (NNC + OFI).
  * We expect further performance gains after mapping transpose, contiguous & view to oneDNN Graph ops.

### Directory structure of the integration code

Fuser-related code is placed under:

```
torch/csrc/jit/codegen/onednn/
```

Optimization pass registration is done in:

```
torch/csrc/jit/passes/onednn_graph_fuser.h
```

CMake for the integration code is in:

```
caffe2/CMakeLists.txt
cmake/public/mkldnn.cmake
cmake/Modules/FindMKLDNN.cmake
```

## Limitations

* This PR supports the PyTorch-oneDNN Graph integration on Linux only. Windows and macOS support will be enabled as a next step.
* Only the inference use case has been optimized so far.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76622
Approved by: https://github.com/eellison
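The following is a minimal, hypothetical sketch of the workflow above (the toy model, shapes, and warm-up count are illustrative, not from the PR; because the pass runs in the profiling executor, a few warm-up iterations are needed before fusion takes effect):

```
import torch

# Enable the oneDNN Graph fusion pass (disabled by default).
torch.jit.enable_onednn_fusion(True)

# Placeholder model and input; any inference model traced with
# representative inputs follows the same recipe.
model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU()).eval()
x = torch.randn(8, 64)

with torch.no_grad():
    traced = torch.jit.trace(model, x)  # tracing is recommended over scripting
    frozen = torch.jit.freeze(traced)   # freeze after tracing, as noted above
    # Warm-up: the profiling executor records tensor properties during the
    # first iterations; the fusion pass (and its type guards) kicks in after.
    frozen(x)
    frozen(x)
    out = frozen(x)
```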
#include <torch/csrc/jit/codegen/onednn/guard_shape.h>

#include <torch/csrc/jit/jit_log.h>
#include <torch/csrc/jit/passes/tensorexpr_fuser.h>
#include <torch/csrc/jit/passes/utils/subgraph_utils.h>
#include <torch/csrc/jit/runtime/graph_executor.h>

namespace torch {
namespace jit {
namespace fuser {
namespace onednn {

//! [ Note -- prepareFusionGroupAndGuardOutputs implementation ]
//! Shamelessly copying code from NNC (tensorexpr_fuser) with very little
//! modification; original code at:
//! `torch/csrc/jit/passes/tensorexpr_fuser.cpp:prepareFusionGroupAndGuardOutputs`
//!
//! We rely on the assumption that LLGA has no operators that depend on the
//! content of a tensor.
void prepareFusionGroupAndGuardOutputs(Block* block) {
  std::vector<Node*> fusion_groups;
  for (Node* n : block->nodes()) {
    // Recurse into nested blocks (e.g. the bodies of if/loop nodes) first.
    for (Block* b : n->blocks()) {
      prepareFusionGroupAndGuardOutputs(b);
    }
    if (n->kind() == prim::oneDNNFusionGroup) {
      fusion_groups.push_back(n);
    }
  }
  for (Node* fusion_group : fusion_groups) {
    // TODO: add further optimization pass to removeOutputsUsedOnlyInSize,
    // refer to
    // `torch/csrc/jit/passes/tensorexpr_fuser.cpp:removeOutputsUsedOnlyInSize`
    // removeOutputsUsedOnlyInSize(fusion_group);

    // Wrap the fusion group with a prim::oneDNNFusionGuard that checks the
    // inputs against their profiled tensor properties at runtime; the
    // identity lambda keeps the full profiled TensorType as the guarded type.
    insertTypeGuard(
        fusion_group,
        [](const TensorTypePtr& t) { return t; },
        prim::oneDNNFusionGuard);
  }
}

} // namespace onednn
} // namespace fuser
} // namespace jit
} // namespace torch
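To see what this pass produces, one can dump the optimized graph after warm-up: with the fuser enabled it should contain prim::oneDNNFusionGroup nodes wrapped by the prim::oneDNNFusionGuard checks inserted above. A minimal sketch, reusing the toy model from the usage example and assuming the torch.jit.last_executed_optimized_graph debugging helper is available:

```
import torch

torch.jit.enable_onednn_fusion(True)

model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU()).eval()
x = torch.randn(8, 64)

with torch.no_grad():
    frozen = torch.jit.freeze(torch.jit.trace(model, x))
    frozen(x)  # warm-up runs so the profiling executor can record
    frozen(x)  # the tensor properties that the guards will check

# Debugging aid: dump the most recently executed optimized graph and look
# for prim::oneDNNFusionGuard / prim::oneDNNFusionGroup nodes.
print(torch.jit.last_executed_optimized_graph())
```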