283ce12aa9
Add channels_last3d support for mkldnn conv and mkldnn deconv ( #95271 )
...
### Motivation
- Add channels_last3d support for mkldnn conv and mkldnn deconv.
- Use `ideep::convolution_transpose_forward::compute_v3` instead of `ideep::convolution_transpose_forward::compute`. compute_v3 uses `is_channels_last` to notify ideep whether to go CL or not to align with the memory format check of PyTorch.
### Testing
1 socket (28 cores):
- memory format: torch.contiguous_format
module | shape | forward / ms | backward / ms
-- | -- | -- | --
conv3d | input size: (32, 32, 10, 100, 100), weight size: (32, 32, 3, 3, 3) | 64.56885 | 150.1796
conv3d | input size: (32, 16, 10, 200, 200), weight size: (16, 16, 3, 3, 3) | 100.6754 | 231.8883
conv3d | input size: (16, 4, 5, 300, 300), weight size: (4, 4, 3, 3, 3) | 19.31751 | 68.31131
module | shape | forward / ms | backward / ms
-- | -- | -- | --
ConvTranspose3d | input size: (32, 32, 10, 100, 100), weight size: (32, 32, 3, 3, 3) | 122.7646 | 207.5125
ConvTranspose3d | input size: (32, 16, 10, 200, 200), weight size: (16, 16, 3, 3, 3) | 202.4542 | 368.5492
ConvTranspose3d | input size: (16, 4, 5, 300, 300), weight size: (4, 4, 3, 3, 3) | 122.959 | 84.62577
- memory format: torch.channels_last_3d
module | shape | forward / ms | backward / ms
-- | -- | -- | --
conv3d | input size: (32, 32, 10, 100, 100), weight size: (32, 32, 3, 3, 3) | 40.06993 | 114.317
conv3d | input size: (32, 16, 10, 200, 200), weight size: (16, 16, 3, 3, 3 | 49.08249 | 133.4079
conv3d | input size: (16, 4, 5, 300, 300), weight size: (4, 4, 3, 3, 3) | 5.873911 | 17.58647
module | shape | forward / ms | backward / ms
-- | -- | -- | --
ConvTranspose3d | input size: (32, 32, 10, 100, 100), weight size: (32, 32, 3, 3, 3) | 88.4246 | 208.2269
ConvTranspose3d | input size: (32, 16, 10, 200, 200), weight size: (16, 16, 3, 3, 3 | 140.0725 | 270.4172
ConvTranspose3d | input size: (16, 4, 5, 300, 300), weight size: (4, 4, 3, 3, 3) | 23.0223 | 37.16972
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95271
Approved by: https://github.com/jgong5 , https://github.com/cpuhrsch
2023-08-30 02:53:30 +00:00
693a8dd04c
[NNC] enable fusion of conv with elementwise OP ( #77157 )
...
## Pitch
Enable Conv-Eltwise fusion in NNC.
## Description
This PR adds a `FuseConvWithEltwise` pass to fuse convolution with elementwise OP for TE subgraph. This pass will insert prepack and packed run ops for conv2d and enable fusion of conv2d with elementwise OPs. The fused packed run ops is implemented via external call in NNC.
## Code structure
Graph rewrite pass related code is placed in:
```
torch/csrc/jit/passes/mkldnn_rewrite.h
torch/csrc/jit/passes/mkldnn_rewrite.cpp
```
NNC integration of fused conv-eltwise OP via external call is located in:
```
torch/csrc/jit/tensorexpr/kernel.cpp
torch/csrc/jit/tensorexpr/operators/conv2d.h
torch/csrc/jit/tensorexpr/operators/conv2d.cpp
torch/csrc/jit/tensorexpr/lowerings.cpp
torch/csrc/jit/tensorexpr/external_functions.cpp
```
Fused prepack OP context is in:
```
aten/src/ATen/native/mkldnn/Common.h
aten/src/ATen/native/mkldnn/RegisterMkldnnOpContextClass.cpp
aten/src/ATen/native/mkldnn/OpContext.h
aten/src/ATen/native/mkldnn/OpContext.cpp
```
Fused OP implementation is done in:
```
aten/src/ATen/native/mkldnn/ConvPrepack.h
aten/src/ATen/native/mkldnn/ConvPrepack.cpp
```
## OP benchmark for conv-relu
The below performance is measured on top of these two PRs to support NHWC: https://github.com/pytorch/pytorch/pull/76948 and https://github.com/pytorch/pytorch/pull/78238 .
- Measured on Cascade Lake 8280
- Jemalloc enabled
- batch_size = 1
- Channels Last format
### Single thread:
<html xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns="http://www.w3.org/TR/REC-html40 ">
<head>
<meta name=ProgId content=Excel.Sheet>
<meta name=Generator content="Microsoft Excel 15">
<link id=Main-File rel=Main-File
href="file:///C:/Users/chunyuan/AppData/Local/Temp/msohtmlclip1/01/clip.htm">
<link rel=File-List
href="file:///C:/Users/chunyuan/AppData/Local/Temp/msohtmlclip1/01/clip_filelist.xml">
</head>
<body link="#0563C1" vlink="#954F72">
shape | time (us)_no_fusion | time (us)_fusion | Gain
-- | -- | -- | --
kernel=3, N=1, iC=64, H=56, W=56, oC=64, stride=1, pad=1, dilates=1, g=1 | 1706.22 | 1371.97 | 19.59%
kernel=1, N=1, iC=256, H=56, W=56, oC=512, stride=2, pad=0, dilates=1, g=1 | 2499.28 | 1571.52 | 37.12%
kernel=3, N=1, iC=256, H=56, W=56, oC=256, stride=1, pad=1, dilates=1, g=32 | 4169.52 | 2738.53 | 34.32%
kernel=3, N=1, iC=512, H=56, W=56, oC=512, stride=2, pad=1, dilates=1, g=32 | 3998.77 | 3085.85 | 22.83%
kernel=1, N=1, iC=64, H=56, W=56, oC=64, stride=1, pad=0, dilates=1, g=1 | 673.73 | 430.81 | 36.06%
kernel=1, N=1, iC=256, H=56, W=56, oC=64, stride=1, pad=0, dilates=1, g=1 | 1101.87 | 801.07 | 27.30%
kernel=1, N=1, iC=256, H=56, W=56, oC=256, stride=1, pad=0, dilates=1, g=1 | 4692.91 | 3116.13 | 33.60%
kernel=1, N=1, iC=512, H=28, W=28, oC=512, stride=1, pad=0, dilates=1, g=1 | 3310.64 | 2503.39 | 24.38%
</body>
</html>
### 4 threads:
<html xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns="http://www.w3.org/TR/REC-html40 ">
<head>
<meta name=ProgId content=Excel.Sheet>
<meta name=Generator content="Microsoft Excel 15">
<link id=Main-File rel=Main-File
href="file:///C:/Users/chunyuan/AppData/Local/Temp/msohtmlclip1/01/clip.htm">
<link rel=File-List
href="file:///C:/Users/chunyuan/AppData/Local/Temp/msohtmlclip1/01/clip_filelist.xml">
</head>
<body link="#0563C1" vlink="#954F72">
shape | time (us)_no_fusion | time (us)_fusion | Gain
-- | -- | -- | --
kernel=3, N=1, iC=64, H=56, W=56, oC=64, stride=1, pad=1, dilates=1, g=1 | 360.07 | 321.21 | 10.79%
kernel=1, N=1, iC=256, H=56, W=56, oC=512, stride=2, pad=0, dilates=1, g=1 | 391.49 | 323.17 | 17.45%
kernel=3, N=1, iC=256, H=56, W=56, oC=256, stride=1, pad=1, dilates=1, g=32 | 536.4 | 465.97 | 13.13%
kernel=3, N=1, iC=512, H=56, W=56, oC=512, stride=2, pad=1, dilates=1, g=32 | 674.98 | 616.32 | 8.69%
kernel=1, N=1, iC=64, H=56, W=56, oC=64, stride=1, pad=0, dilates=1, g=1 | 160.97 | 70.05 | 56.48%
kernel=1, N=1, iC=256, H=56, W=56, oC=64, stride=1, pad=0, dilates=1, g=1 | 215.81 | 182.6 | 15.39%
kernel=1, N=1, iC=256, H=56, W=56, oC=256, stride=1, pad=0, dilates=1, g=1 | 658.45 | 576.97 | 12.37%
kernel=1, N=1, iC=512, H=28, W=28, oC=512, stride=1, pad=0, dilates=1, g=1 | 702.18 | 566.39 | 19.34%
</body>
</html>
### 1 socket (28 cores):
<html xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns="http://www.w3.org/TR/REC-html40 ">
<head>
<meta name=ProgId content=Excel.Sheet>
<meta name=Generator content="Microsoft Excel 15">
<link id=Main-File rel=Main-File
href="file:///C:/Users/chunyuan/AppData/Local/Temp/msohtmlclip1/01/clip.htm">
<link rel=File-List
href="file:///C:/Users/chunyuan/AppData/Local/Temp/msohtmlclip1/01/clip_filelist.xml">
</head>
<body link="#0563C1" vlink="#954F72">
shape | time (us)_no_fusion | time (us)_fusion | Gain
-- | -- | -- | --
kernel=3, N=1, iC=64, H=56, W=56, oC=64, stride=1, pad=1, dilates=1, g=1 | 149.92 | 103.78 | 30.78%
kernel=1, N=1, iC=256, H=56, W=56, oC=512, stride=2, pad=0, dilates=1, g=1 | 192.76 | 110.87 | 42.48%
kernel=3, N=1, iC=256, H=56, W=56, oC=256, stride=1, pad=1, dilates=1, g=32 | 160.67 | 127.24 | 20.81%
kernel=3, N=1, iC=512, H=56, W=56, oC=512, stride=2, pad=1, dilates=1, g=32 | 212.45 | 180.55 | 15.02%
kernel=1, N=1, iC=64, H=56, W=56, oC=64, stride=1, pad=0, dilates=1, g=1 | 114.57 | 50.58 | 55.85%
kernel=1, N=1, iC=256, H=56, W=56, oC=64, stride=1, pad=0, dilates=1, g=1 | 198.64 | 70.6 | 64.46%
kernel=1, N=1, iC=256, H=56, W=56, oC=256, stride=1, pad=0, dilates=1, g=1 | 281.35 | 155.8 | 44.62%
kernel=1, N=1, iC=512, H=28, W=28, oC=512, stride=1, pad=0, dilates=1, g=1 | 262.15 | 162.94 | 37.84%
</body>
</html>
## UT
```
test/test_mkldnn_fusion.py
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77157
Approved by: https://github.com/ZolotukhinM
2022-08-10 21:46:51 +00:00