23 Commits

Author SHA1 Message Date
cyy
df458be4e5 [4/N] Apply py39 ruff and pyupgrade fixes (#143257)
`torch/fx/passes/annotate_getitem_nodes.py` was changed to support the new type-hinting annotations.
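
For context, a minimal before/after sketch (not the actual diff in `annotate_getitem_nodes.py`) of the py39-style annotation rewrite that ruff/pyupgrade applies:

```python
from typing import List, Tuple  # pre-3.9 style imports

def old(x: List[int]) -> Tuple[int, int]:  # typing-module generics
    return x[0], x[-1]

def new(x: list[int]) -> tuple[int, int]:  # PEP 585 builtin generics, Python 3.9+
    return x[0], x[-1]
```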

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143257
Approved by: https://github.com/justinchuby, https://github.com/albanD
2025-01-04 10:47:51 +00:00
fc6066b80f improve mkldnn_linear_pointwise_binary performance for contiguous tensor with non default contiguous strides (#132019)
Fixes https://github.com/pytorch/pytorch/issues/131734

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132019
Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5
2024-07-30 05:02:38 +00:00
db1a6eda9e [codemod] markDynamoStrictTest batch 22 (#117729)
[codemod] markDynamoStrictTest test_autograd
[codemod] markDynamoStrictTest test_ao_sparsity
[codemod] markDynamoStrictTest test_jit
[codemod] markDynamoStrictTest test_quantization
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117729
Approved by: https://github.com/bdhirsh
2024-01-18 16:59:26 +00:00
80d8a2a237 improve mkldnn_linear_pointwise performance for contiguous tensor with non default contiguous strides (#114939)
This PR converts the stride to the default contiguous stride in `mkldnn_linear_pointwise` before calling oneDNN, so that it hits an optimization path similar to https://github.com/pytorch/pytorch/pull/99511. The code is also refactored to provide a common utility function.

https://github.com/pytorch/pytorch/pull/111976 ignores dims of value 1 in `Require_Stride_order`. For a tensor with `size = [1, 1280]` and `stride = [0, 1]`:
**Before that PR**, the tensor is considered non-contiguous, so in the call below it is converted to `size = [1, 1280]`, `stride = [1280, 1]`:
25b83521be/torch/_inductor/ir.py (L5263)

**After that PR**, dims of value 1 are ignored, so this tensor already counts as contiguous and a tensor with `stride = [0, 1]` is fed to oneDNN, which results in poor performance.
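
A hedged sketch of the scenario (illustrative only, not Inductor's actual code), showing the contiguity check and the normalization to default contiguous strides:

```python
import torch

# size [1, 1280] with strides [0, 1], as in the example above
t = torch.randn(1280).as_strided((1, 1280), (0, 1))
print(t.is_contiguous())  # True: size-1 dims are skipped by the contiguity check

# Normalize to the default contiguous strides before handing the tensor to a
# backend that is sensitive to raw stride values (safe here: dim 0 has size 1).
default_strides = torch.empty(t.size()).stride()  # (1280, 1)
if t.stride() != default_strides:
    t = t.as_strided(t.size(), default_strides)
print(t.stride())  # (1280, 1)
```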

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114939
Approved by: https://github.com/jgong5
2023-12-01 23:30:07 +00:00
283ce12aa9 Add channels_last3d support for mkldnn conv and mkldnn deconv (#95271)
### Motivation

- Add channels_last3d support for mkldnn conv and mkldnn deconv.
- Use `ideep::convolution_transpose_forward::compute_v3` instead of `ideep::convolution_transpose_forward::compute`. `compute_v3` takes an `is_channels_last` flag that tells ideep whether to use channels last, aligning with PyTorch's memory format check (a usage sketch follows).
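
A small usage sketch of the channels_last_3d path (assumed shapes, smaller than the benchmark shapes below):

```python
import torch
import torch.nn as nn

m = nn.Conv3d(32, 32, kernel_size=3, padding=1).to(memory_format=torch.channels_last_3d)
x = torch.randn(2, 32, 8, 32, 32).to(memory_format=torch.channels_last_3d)
y = m(x)
print(y.is_contiguous(memory_format=torch.channels_last_3d))  # expected: True
```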

### Testing
1 socket (28 cores):

- memory format: torch.contiguous_format

module | shape | forward / ms | backward / ms
-- | -- | -- | --
conv3d | input size: (32, 32, 10, 100, 100), weight size: (32, 32, 3, 3, 3) | 64.56885 | 150.1796
conv3d | input size: (32, 16, 10, 200, 200), weight size: (16, 16, 3, 3, 3) | 100.6754 | 231.8883
conv3d | input size: (16, 4, 5, 300, 300), weight size: (4, 4, 3, 3, 3) | 19.31751 | 68.31131

module | shape | forward / ms | backward / ms
-- | -- | -- | --
ConvTranspose3d | input size: (32, 32, 10, 100, 100), weight size: (32, 32, 3, 3, 3) | 122.7646 | 207.5125
ConvTranspose3d | input size: (32, 16, 10, 200, 200), weight size: (16, 16, 3, 3, 3) | 202.4542 | 368.5492
ConvTranspose3d | input size: (16, 4, 5, 300, 300), weight size: (4, 4, 3, 3, 3) | 122.959 | 84.62577

- memory format: torch.channels_last_3d

module | shape | forward / ms | backward / ms
-- | -- | -- | --
conv3d | input size: (32, 32, 10, 100, 100), weight size: (32, 32, 3, 3, 3) | 40.06993 | 114.317
conv3d | input size: (32, 16, 10, 200, 200), weight size: (16, 16, 3, 3, 3) | 49.08249 | 133.4079
conv3d | input size: (16, 4, 5, 300, 300), weight size: (4, 4, 3, 3, 3) | 5.873911 | 17.58647

module | shape | forward / ms | backward / ms
-- | -- | -- | --
ConvTranspose3d | input size: (32, 32, 10, 100, 100), weight size: (32, 32, 3, 3, 3) | 88.4246 | 208.2269
ConvTranspose3d | input size: (32, 16, 10, 200, 200), weight size: (16, 16, 3, 3, 3) | 140.0725 | 270.4172
ConvTranspose3d | input size: (16, 4, 5, 300, 300), weight size: (4, 4, 3, 3, 3) | 23.0223 | 37.16972

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95271
Approved by: https://github.com/jgong5, https://github.com/cpuhrsch
2023-08-30 02:53:30 +00:00
6d43c89f37 [BE]: Update Ruff to 0.0.280 (#105724)
Removes unused loop values in Python dictionary iteration. Automated fix from Ruff master.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105724
Approved by: https://github.com/ezyang, https://github.com/janeyx99
2023-07-22 23:03:34 +00:00
0aa6486441 inductor: reduce compile time for cpu backend by reducing weight conversion (#104402)
Before this PR, we always added `to_mkldnn` before doing weight packing. This is redundant: we can directly convert a dense tensor to a blocked tensor.
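
For context, `to_mkldnn` is the public dense-to-oneDNN-layout conversion whose redundant insertion this PR removes; a standalone illustration (requires a CPU build with oneDNN enabled):

```python
import torch

x = torch.randn(8, 8)
y = x.to_mkldnn()         # strided (dense) tensor -> opaque blocked mkldnn tensor
print(y.layout)           # torch._mkldnn
z = y.to_dense()          # back to a regular strided tensor
print(torch.equal(x, z))  # True
```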

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104402
Approved by: https://github.com/jgong5, https://github.com/jansel, https://github.com/eellison, https://github.com/desertfire
2023-07-06 13:44:50 +00:00
917ac30aeb Revert "inductor: reduce compile time for cpu backend by reducing weight conversion (#104402)"
This reverts commit 6bfd507c15f2d26212d3e2b9e581d9525bfd37d1.

Reverted https://github.com/pytorch/pytorch/pull/104402 on behalf of https://github.com/XiaobingSuper due to introduce compile error for fp32 linear ([comment](https://github.com/pytorch/pytorch/pull/104402#issuecomment-1623189759))
2023-07-06 08:13:02 +00:00
6bfd507c15 inductor: reduce compile time for cpu backend by reducing weight conversion (#104402)
Before this PR, we always added `to_mkldnn` before doing weight packing. This is redundant: we can directly convert a dense tensor to a blocked tensor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104402
Approved by: https://github.com/jgong5, https://github.com/jansel, https://github.com/eellison
2023-07-06 06:07:05 +00:00
4cfa06f706 [BE] Deprecate has_XYZ attributes (#103279)
Use [`__getattr__`](https://peps.python.org/pep-0562/) to raise a warning when one tries to access the `has_XYZ` attributes and recommend the appropriate `torch.backends.XYZ` methods.

Make the respective properties in `torch._C` private (by prefixing them with an underscore) to exclude them from `from torch._C import *`.

Added `warnings.simplefilter` to work around a Python 3.11 `torch.compile` lineinfo issue.
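
A minimal PEP 562 sketch of the deprecation pattern described above (hypothetical names, not the exact torch implementation):

```python
import warnings

_DEPRECATED = {"has_mkldnn": "torch.backends.mkldnn"}

def __getattr__(name):  # module-level __getattr__, PEP 562
    if name in _DEPRECATED:
        warnings.warn(
            f"'{name}' is deprecated, please use '{_DEPRECATED[name]}' instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return globals()[f"_{name}"]  # the now-private backing value
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")

_has_mkldnn = True  # stand-in for the private torch._C._has_mkldnn flag
```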

Fixes https://github.com/pytorch/pytorch/issues/102484

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103279
Approved by: https://github.com/janeyx99, https://github.com/Skylion007
2023-06-10 05:17:17 +00:00
0d2e7a1888 support ConvBinaryInplace in Inductor cpp wrapper (#101394)
This PR changes the OP schema, since `at::Tensor&` should be the first argument:
87f9160b67/aten/src/ATen/core/boxing/impl/boxing.h (L305-L341)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101394
Approved by: https://github.com/jgong5, https://github.com/EikanWang, https://github.com/desertfire
2023-06-01 00:22:29 +00:00
046e88a291 [BE] [3/3] Rewrite super() calls in test (#94592)
Rewrite calls to the Python built-in class `super()`. Only non-semantic changes are applied.

- #94587
- #94588
- #94592

Also, methods with only a `super()` call are removed:

```diff
class MyModule(nn.Module):
-   def __init__(self):
-       super().__init__()
-
    def forward(self, ...):
        ...
```

Cases where the rewrite would change the semantics are left unchanged, e.g.:

f152a79be9/caffe2/python/net_printer.py (L184-L190)

f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)
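
For illustration, a hedged sketch (not from the PR) of one semantics-sensitive case: zero-argument `super()` binds through the implicit `__class__` cell of methods defined in a class body, so an explicit `super(Class, obj)` call in a plain function must stay as-is:

```python
class Base:
    def greet(self):
        return "base"

class Child(Base):
    def greet(self):
        return "child"

def make_greeter(obj):
    # Zero-argument super() would raise RuntimeError here (no __class__ cell),
    # so the two-argument form cannot be rewritten.
    return super(Child, obj).greet()

print(make_greeter(Child()))  # "base"
```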

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94592
Approved by: https://github.com/ezyang, https://github.com/seemethere
2023-02-12 22:20:53 +00:00
bd4a5b400a [Re-open 90266] [inductor] weight prepack for _convolution_transpose_pointwise (#91955)
Re-open https://github.com/pytorch/pytorch/pull/90266 since an earlier PR in that stack was reverted.
Depends on an internal ideep upgrade.
[Update]: the internal ideep upgrade issue was resolved in https://github.com/pytorch/pytorch/pull/92239.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91955
Approved by: https://github.com/jgong5, https://github.com/desertfire
2023-01-31 13:28:57 +00:00
3870fdabfb [Re-land 90264] add conv_transpose2d pointwise(unary) fusion kernel (#91953)
Re-land https://github.com/pytorch/pytorch/pull/90264.
Depends on an internal ideep upgrade.
[Update]: the internal ideep upgrade issue was resolved in https://github.com/pytorch/pytorch/pull/92239.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91953
Approved by: https://github.com/jgong5, https://github.com/desertfire
2023-01-31 12:58:05 +00:00
7d3f2b7902 Revert "add conv_transpose2d pointwise(unary) fusion kernel (#90264)"
This reverts commit 85698d0ac4686c10ba527f94724de61b4a856027.

Reverted https://github.com/pytorch/pytorch/pull/90264 on behalf of https://github.com/osalpekar due to build breakage on feed pytorch build package internally
2022-12-16 23:16:59 +00:00
85698d0ac4 add conv_transpose2d pointwise(unary) fusion kernel (#90264)
This PR adds `torch.ops.mkldnn._convolution_transpose_pointwise`, which supports fusing ConvTranspose with the unary pointwise OPs below (see the sketch after the list):

- relu
- sigmoid
- tanh
- hardswish
- leaky_relu
- hardtanh
- gelu
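
A sketch of the eager-mode pattern such a fused op replaces (the fusion itself is applied by the compiler stack, not written by hand):

```python
import torch
import torch.nn as nn

deconv = nn.ConvTranspose2d(3, 8, kernel_size=3)
x = torch.randn(1, 3, 16, 16)
y = torch.relu(deconv(x))  # ConvTranspose followed by a unary op: a fusion candidate
```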

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90264
Approved by: https://github.com/jgong5, https://github.com/jansel
2022-12-15 14:16:58 +00:00
3e43ff2794 torchdynamo: add convolution add(relu) inplace fusion kernel (#88048)
This PR adds a convolution add(relu) inplace fusion kernel that works for **other.add_(conv)**.
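
A minimal sketch of that pattern (assumed shapes):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 3, kernel_size=3, padding=1)
x = torch.randn(1, 3, 8, 8)
other = torch.randn(1, 3, 8, 8)
other.add_(conv(x)).relu_()  # other.add_(conv(x)) is the fused pattern
```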

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88048
Approved by: https://github.com/jgong5, https://github.com/jansel
2022-11-10 13:54:37 +00:00
46aaae98c5 torchdynamo: add linear pointwise(binary) fusion kernel (#86583)
Support binary fusion of Linear with:

- add
- sub
- mul
- div
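
A usage sketch of the patterns listed above (assumed shapes):

```python
import torch
import torch.nn as nn

linear = nn.Linear(16, 16)
x = torch.randn(4, 16)
y = torch.randn(4, 16)
out = linear(x) + y  # likewise linear(x) - y, linear(x) * y, linear(x) / y
```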

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86583
Approved by: https://github.com/jgong5, https://github.com/jansel
2022-10-15 01:57:42 +00:00
5210fab64d torchdynamo: add convolution pointwise(binary) fusion kernel (#86582)
Support binary fusion of Convolution with:

- add
- sub
- mul
- div

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86582
Approved by: https://github.com/jgong5, https://github.com/jansel
2022-10-15 01:55:08 +00:00
9a7a49b254 torchdynamo: add convolution pointwise(unary) fusion kernel (#86581)
Support unary fusion of Convolution with:

- relu
- sigmoid
- tanh
- hardswish
- leaky_relu
- hardtanh
- gelu

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86581
Approved by: https://github.com/jgong5, https://github.com/jansel
2022-10-15 01:51:01 +00:00
c6b7c33885 torchdynamo: add linear eltwise fusion kernel (#85622)
Support fusion of linear with:

- relu
- sigmoid
- tanh
- hardswish
- leaky_relu
- hardtanh
- gelu

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85622
Approved by: https://github.com/EikanWang, https://github.com/jansel
2022-10-10 05:47:11 +00:00
ebf45a0785 [NNC] support aten::_convolution when it is 2D conv (#84038)
## Motivation
Currently, only `aten::conv2d` is supported in NNC. When using `torch.jit.trace`, the node on the graph is `aten::_convolution`. This PR adds support for the `aten::_convolution` node when it corresponds to a 2D convolution.

## Pitch
Support `aten::_convolution` in NNC when we can infer from the parameters that it is a 2D convolution, so that models obtained from `torch.jit.trace` are supported (a sketch follows).
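
A small sketch showing how tracing produces the `aten::_convolution` node in question:

```python
import torch
import torch.nn as nn

m = nn.Conv2d(3, 8, kernel_size=3)
traced = torch.jit.trace(m, torch.randn(1, 3, 16, 16))
print(traced.graph)  # the graph contains an aten::_convolution node
```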
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84038
Approved by: https://github.com/huiguoo
2022-09-19 17:45:20 +00:00
693a8dd04c [NNC] enable fusion of conv with elementwise OP (#77157)
## Pitch
Enable Conv-Eltwise fusion in NNC.

## Description
This PR adds a `FuseConvWithEltwise` pass to fuse convolution with elementwise OPs in TE subgraphs. This pass inserts prepack and packed run ops for conv2d, enabling fusion of conv2d with elementwise OPs. The fused packed run OP is implemented via an external call in NNC.
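
A usage sketch of the Conv2d + elementwise pattern the pass targets, scripted and frozen as is typical for JIT CPU fusions (assumed setup, not the exact harness in `test/test_mkldnn_fusion.py`):

```python
import torch
import torch.nn as nn

class ConvRelu(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(64, 64, kernel_size=3, padding=1)

    def forward(self, x):
        return torch.relu(self.conv(x))

m = ConvRelu().eval()
x = torch.randn(1, 64, 56, 56).to(memory_format=torch.channels_last)
with torch.no_grad():
    frozen = torch.jit.freeze(torch.jit.script(m))  # freezing enables the rewrite passes
    y = frozen(x)
```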

## Code structure
Graph rewrite pass related code is placed in:
```
torch/csrc/jit/passes/mkldnn_rewrite.h
torch/csrc/jit/passes/mkldnn_rewrite.cpp
```

NNC integration of fused conv-eltwise OP via external call is located in:
```
torch/csrc/jit/tensorexpr/kernel.cpp

torch/csrc/jit/tensorexpr/operators/conv2d.h
torch/csrc/jit/tensorexpr/operators/conv2d.cpp

torch/csrc/jit/tensorexpr/lowerings.cpp
torch/csrc/jit/tensorexpr/external_functions.cpp
```

Fused prepack OP context is in:
```
aten/src/ATen/native/mkldnn/Common.h
aten/src/ATen/native/mkldnn/RegisterMkldnnOpContextClass.cpp
aten/src/ATen/native/mkldnn/OpContext.h
aten/src/ATen/native/mkldnn/OpContext.cpp
```

Fused OP implementation is done in:
```
aten/src/ATen/native/mkldnn/ConvPrepack.h
aten/src/ATen/native/mkldnn/ConvPrepack.cpp
```

## OP benchmark for conv-relu
The performance below is measured on top of these two PRs that add NHWC support: https://github.com/pytorch/pytorch/pull/76948 and https://github.com/pytorch/pytorch/pull/78238.

- Measured on Cascade Lake 8280
- Jemalloc enabled
- batch_size = 1
- Channels Last format

### Single thread:
shape | time w/o fusion (us) | time w/ fusion (us) | Gain
-- | -- | -- | --
kernel=3, N=1, iC=64, H=56, W=56, oC=64,   stride=1, pad=1, dilates=1, g=1 | 1706.22 | 1371.97 | 19.59%
kernel=1, N=1, iC=256, H=56, W=56,   oC=512, stride=2, pad=0, dilates=1, g=1 | 2499.28 | 1571.52 | 37.12%
kernel=3, N=1, iC=256, H=56, W=56,   oC=256, stride=1, pad=1, dilates=1, g=32 | 4169.52 | 2738.53 | 34.32%
kernel=3, N=1, iC=512, H=56, W=56,   oC=512, stride=2, pad=1, dilates=1, g=32 | 3998.77 | 3085.85 | 22.83%
kernel=1, N=1, iC=64, H=56, W=56, oC=64,   stride=1, pad=0, dilates=1, g=1 | 673.73 | 430.81 | 36.06%
kernel=1, N=1, iC=256, H=56, W=56, oC=64,   stride=1, pad=0, dilates=1, g=1 | 1101.87 | 801.07 | 27.30%
kernel=1, N=1, iC=256, H=56, W=56,   oC=256, stride=1, pad=0, dilates=1, g=1 | 4692.91 | 3116.13 | 33.60%
kernel=1, N=1, iC=512, H=28, W=28,   oC=512, stride=1, pad=0, dilates=1, g=1 | 3310.64 | 2503.39 | 24.38%

### 4 threads:
shape | time w/o fusion (us) | time w/ fusion (us) | Gain
-- | -- | -- | --
kernel=3, N=1, iC=64, H=56, W=56, oC=64,   stride=1, pad=1, dilates=1, g=1 | 360.07 | 321.21 | 10.79%
kernel=1, N=1, iC=256, H=56, W=56,   oC=512, stride=2, pad=0, dilates=1, g=1 | 391.49 | 323.17 | 17.45%
kernel=3, N=1, iC=256, H=56, W=56,   oC=256, stride=1, pad=1, dilates=1, g=32 | 536.4 | 465.97 | 13.13%
kernel=3, N=1, iC=512, H=56, W=56,   oC=512, stride=2, pad=1, dilates=1, g=32 | 674.98 | 616.32 | 8.69%
kernel=1, N=1, iC=64, H=56, W=56, oC=64,   stride=1, pad=0, dilates=1, g=1 | 160.97 | 70.05 | 56.48%
kernel=1, N=1, iC=256, H=56, W=56, oC=64,   stride=1, pad=0, dilates=1, g=1 | 215.81 | 182.6 | 15.39%
kernel=1, N=1, iC=256, H=56, W=56,   oC=256, stride=1, pad=0, dilates=1, g=1 | 658.45 | 576.97 | 12.37%
kernel=1, N=1, iC=512, H=28, W=28,   oC=512, stride=1, pad=0, dilates=1, g=1 | 702.18 | 566.39 | 19.34%

### 1 socket (28 cores):
shape | time w/o fusion (us) | time w/ fusion (us) | Gain
-- | -- | -- | --
kernel=3, N=1, iC=64, H=56, W=56, oC=64,   stride=1, pad=1, dilates=1, g=1 | 149.92 | 103.78 | 30.78%
kernel=1, N=1, iC=256, H=56, W=56,   oC=512, stride=2, pad=0, dilates=1, g=1 | 192.76 | 110.87 | 42.48%
kernel=3, N=1, iC=256, H=56, W=56,   oC=256, stride=1, pad=1, dilates=1, g=32 | 160.67 | 127.24 | 20.81%
kernel=3, N=1, iC=512, H=56, W=56,   oC=512, stride=2, pad=1, dilates=1, g=32 | 212.45 | 180.55 | 15.02%
kernel=1, N=1, iC=64, H=56, W=56, oC=64,   stride=1, pad=0, dilates=1, g=1 | 114.57 | 50.58 | 55.85%
kernel=1, N=1, iC=256, H=56, W=56, oC=64,   stride=1, pad=0, dilates=1, g=1 | 198.64 | 70.6 | 64.46%
kernel=1, N=1, iC=256, H=56, W=56,   oC=256, stride=1, pad=0, dilates=1, g=1 | 281.35 | 155.8 | 44.62%
kernel=1, N=1, iC=512, H=28, W=28,   oC=512, stride=1, pad=0, dilates=1, g=1 | 262.15 | 162.94 | 37.84%

## UT
```
test/test_mkldnn_fusion.py
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77157
Approved by: https://github.com/ZolotukhinM
2022-08-10 21:46:51 +00:00