23 Commits

Author SHA1 Message Date
cyy
df458be4e5 [4/N] Apply py39 ruff and pyupgrade fixes (#143257)
`torch/fx/passes/annotate_getitem_nodes.py` was changed to support the new type-hinting annotations.
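
For context, a minimal before/after sketch (not the actual diff in `annotate_getitem_nodes.py`) of the py39-style annotation rewrite that ruff/pyupgrade applies:

```python
from typing import List, Tuple  # pre-3.9 style imports

def old(x: List[int]) -> Tuple[int, int]:  # typing-module generics
    return x[0], x[-1]

def new(x: list[int]) -> tuple[int, int]:  # PEP 585 builtin generics, Python 3.9+
    return x[0], x[-1]
```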

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143257
Approved by: https://github.com/justinchuby, https://github.com/albanD
2025-01-04 10:47:51 +00:00
fc6066b80f improve mkldnn_linear_pointwise_binary performance for contiguous tensor with non default contiguous strides (#132019)
Fixes https://github.com/pytorch/pytorch/issues/131734

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132019
Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5
2024-07-30 05:02:38 +00:00
db1a6eda9e [codemod] markDynamoStrictTest batch 22 (#117729)
[codemod] markDynamoStrictTest test_autograd
[codemod] markDynamoStrictTest test_ao_sparsity
[codemod] markDynamoStrictTest test_jit
[codemod] markDynamoStrictTest test_quantization
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117729
Approved by: https://github.com/bdhirsh
2024-01-18 16:59:26 +00:00
80d8a2a237 improve mkldnn_linear_pointwise performance for contiguous tensor with non default contiguous strides (#114939)
This PR converts the stride to the default contiguous stride in `mkldnn_linear_pointwise` before calling oneDNN, so that it hits an optimization path similar to https://github.com/pytorch/pytorch/pull/99511. The code is also refactored to provide a common utility function.

https://github.com/pytorch/pytorch/pull/111976 ignores dims of value 1 in `Require_Stride_order`. For a tensor with `size = [1, 1280]` and `stride = [0, 1]`:
**Before that PR**, the tensor is considered non-contiguous, so in the call below it is converted to `size = [1, 1280]`, `stride = [1280, 1]`:
25b83521be/torch/_inductor/ir.py (L5263)

**After that PR**, dims of value 1 are ignored, so this tensor already counts as contiguous and a tensor with `stride = [0, 1]` is fed to oneDNN, which results in poor performance.
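
A hedged sketch of the scenario (illustrative only, not Inductor's actual code), showing the contiguity check and the normalization to default contiguous strides:

```python
import torch

# size [1, 1280] with strides [0, 1], as in the example above
t = torch.randn(1280).as_strided((1, 1280), (0, 1))
print(t.is_contiguous())  # True: size-1 dims are skipped by the contiguity check

# Normalize to the default contiguous strides before handing the tensor to a
# backend that is sensitive to raw stride values (safe here: dim 0 has size 1).
default_strides = torch.empty(t.size()).stride()  # (1280, 1)
if t.stride() != default_strides:
    t = t.as_strided(t.size(), default_strides)
print(t.stride())  # (1280, 1)
```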

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114939
Approved by: https://github.com/jgong5
2023-12-01 23:30:07 +00:00
283ce12aa9 Add channels_last3d support for mkldnn conv and mkldnn deconv (#95271)
### Motivation

- Add channels_last3d support for mkldnn conv and mkldnn deconv.
- Use `ideep::convolution_transpose_forward::compute_v3` instead of `ideep::convolution_transpose_forward::compute`. `compute_v3` takes an `is_channels_last` flag that tells ideep whether to use channels last, aligning with PyTorch's memory format check (a usage sketch follows).
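
A small usage sketch of the channels_last_3d path (assumed shapes, smaller than the benchmark shapes below):

```python
import torch
import torch.nn as nn

m = nn.Conv3d(32, 32, kernel_size=3, padding=1).to(memory_format=torch.channels_last_3d)
x = torch.randn(2, 32, 8, 32, 32).to(memory_format=torch.channels_last_3d)
y = m(x)
print(y.is_contiguous(memory_format=torch.channels_last_3d))  # expected: True
```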

### Testing
1 socket (28 cores):

- memory format: torch.contiguous_format

module | shape | forward / ms | backward / ms
-- | -- | -- | --
conv3d | input size: (32, 32, 10, 100, 100), weight size: (32, 32, 3, 3, 3) | 64.56885 | 150.1796
conv3d | input size: (32, 16, 10, 200, 200), weight size: (16, 16, 3, 3, 3) | 100.6754 | 231.8883
conv3d | input size: (16, 4, 5, 300, 300), weight size: (4, 4, 3, 3, 3) | 19.31751 | 68.31131

module | shape | forward / ms | backward / ms
-- | -- | -- | --
ConvTranspose3d | input size: (32, 32, 10, 100, 100), weight size: (32, 32, 3, 3, 3) | 122.7646 | 207.5125
ConvTranspose3d | input size: (32, 16, 10, 200, 200), weight size: (16, 16, 3, 3, 3) | 202.4542 | 368.5492
ConvTranspose3d | input size: (16, 4, 5, 300, 300), weight size: (4, 4, 3, 3, 3) | 122.959 | 84.62577

- memory format: torch.channels_last_3d

module | shape | forward / ms | backward / ms
-- | -- | -- | --
conv3d | input size: (32, 32, 10, 100, 100), weight size: (32, 32, 3, 3, 3) | 40.06993 | 114.317
conv3d | input size: (32, 16, 10, 200, 200), weight size: (16, 16, 3, 3, 3) | 49.08249 | 133.4079
conv3d | input size: (16, 4, 5, 300, 300), weight size: (4, 4, 3, 3, 3) | 5.873911 | 17.58647

module | shape | forward / ms | backward / ms
-- | -- | -- | --
ConvTranspose3d | input size: (32, 32, 10, 100, 100), weight size: (32, 32, 3, 3, 3) | 88.4246 | 208.2269
ConvTranspose3d | input size: (32, 16, 10, 200, 200), weight size: (16, 16, 3, 3, 3) | 140.0725 | 270.4172
ConvTranspose3d | input size: (16, 4, 5, 300, 300), weight size: (4, 4, 3, 3, 3) | 23.0223 | 37.16972

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95271
Approved by: https://github.com/jgong5, https://github.com/cpuhrsch
2023-08-30 02:53:30 +00:00
6d43c89f37 [BE]: Update Ruff to 0.0.280 (#105724)
Removes unused loop values in Python dictionary iteration. Automated fix from Ruff master.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105724
Approved by: https://github.com/ezyang, https://github.com/janeyx99
2023-07-22 23:03:34 +00:00
0aa6486441 inductor: reduce compile time for cpu backend by reducing weight conversion (#104402)
Before this PR, we always added `to_mkldnn` before doing weight packing. This is redundant: we can directly convert a dense tensor to a blocked tensor.
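
For context, `to_mkldnn` is the public dense-to-oneDNN-layout conversion whose redundant insertion this PR removes; a standalone illustration (requires a CPU build with oneDNN enabled):

```python
import torch

x = torch.randn(8, 8)
y = x.to_mkldnn()         # strided (dense) tensor -> opaque blocked mkldnn tensor
print(y.layout)           # torch._mkldnn
z = y.to_dense()          # back to a regular strided tensor
print(torch.equal(x, z))  # True
```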

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104402
Approved by: https://github.com/jgong5, https://github.com/jansel, https://github.com/eellison, https://github.com/desertfire
2023-07-06 13:44:50 +00:00
917ac30aeb Revert "inductor: reduce compile time for cpu backend by reducing weight conversion (#104402)"
This reverts commit 6bfd507c15f2d26212d3e2b9e581d9525bfd37d1.

Reverted https://github.com/pytorch/pytorch/pull/104402 on behalf of https://github.com/XiaobingSuper due to introduce compile error for fp32 linear ([comment](https://github.com/pytorch/pytorch/pull/104402#issuecomment-1623189759))
2023-07-06 08:13:02 +00:00
6bfd507c15 inductor: reduce compile time for cpu backend by reducing weight conversion (#104402)
Before this PR, we always added `to_mkldnn` before doing weight packing. This is redundant: we can directly convert a dense tensor to a blocked tensor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104402
Approved by: https://github.com/jgong5, https://github.com/jansel, https://github.com/eellison
2023-07-06 06:07:05 +00:00
4cfa06f706 [BE] Deprecate has_XYZ attributes (#103279)
Use [`__getattr__`](https://peps.python.org/pep-0562/) to raise a warning when one tries to access the `has_XYZ` attributes and recommend the appropriate `torch.backends.XYZ` methods.

Make the respective properties in `torch._C` private (by prefixing them with an underscore) to exclude them from `from torch._C import *`.

Added `warnings.simplefilter` to work around a Python 3.11 `torch.compile` lineinfo issue.
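
A minimal PEP 562 sketch of the deprecation pattern described above (hypothetical names, not the exact torch implementation):

```python
import warnings

_DEPRECATED = {"has_mkldnn": "torch.backends.mkldnn"}

def __getattr__(name):  # module-level __getattr__, PEP 562
    if name in _DEPRECATED:
        warnings.warn(
            f"'{name}' is deprecated, please use '{_DEPRECATED[name]}' instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return globals()[f"_{name}"]  # the now-private backing value
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")

_has_mkldnn = True  # stand-in for the private torch._C._has_mkldnn flag
```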

Fixes https://github.com/pytorch/pytorch/issues/102484

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103279
Approved by: https://github.com/janeyx99, https://github.com/Skylion007
2023-06-10 05:17:17 +00:00
0d2e7a1888 support ConvBinaryInplace in Inductor cpp wrapper (#101394)
This PR changes the OP schema, since `at::Tensor&` should be the first argument:
87f9160b67/aten/src/ATen/core/boxing/impl/boxing.h (L305-L341)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101394
Approved by: https://github.com/jgong5, https://github.com/EikanWang, https://github.com/desertfire
2023-06-01 00:22:29 +00:00
046e88a291 [BE] [3/3] Rewrite super() calls in test (#94592)
Rewrite calls to the Python built-in class `super()`. Only non-semantic changes are applied.

- #94587
- #94588
- #94592

Also, methods with only a `super()` call are removed:

```diff
class MyModule(nn.Module):
-   def __init__(self):
-       super().__init__()
-
    def forward(self, ...):
        ...
```

Cases where the rewrite would change the semantics are left unchanged, e.g.:

f152a79be9/caffe2/python/net_printer.py (L184-L190)

f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)
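
For illustration, a hedged sketch (not from the PR) of one semantics-sensitive case: zero-argument `super()` binds through the implicit `__class__` cell of methods defined in a class body, so an explicit `super(Class, obj)` call in a plain function must stay as-is:

```python
class Base:
    def greet(self):
        return "base"

class Child(Base):
    def greet(self):
        return "child"

def make_greeter(obj):
    # Zero-argument super() would raise RuntimeError here (no __class__ cell),
    # so the two-argument form cannot be rewritten.
    return super(Child, obj).greet()

print(make_greeter(Child()))  # "base"
```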

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94592
Approved by: https://github.com/ezyang, https://github.com/seemethere
2023-02-12 22:20:53 +00:00
bd4a5b400a [Re-open 90266] [inductor] weight prepack for _convolution_transpose_pointwise (#91955)
Re-open https://github.com/pytorch/pytorch/pull/90266 since an earlier PR in that stack was reverted.
Depends on an internal ideep upgrade.
[Update]: the internal ideep upgrade issue was resolved in https://github.com/pytorch/pytorch/pull/92239.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91955
Approved by: https://github.com/jgong5, https://github.com/desertfire
2023-01-31 13:28:57 +00:00
3870fdabfb [Re-land 90264] add conv_transpose2d pointwise(unary) fusion kernel (#91953)
Re-land https://github.com/pytorch/pytorch/pull/90264.
Depends on an internal ideep upgrade.
[Update]: the internal ideep upgrade issue was resolved in https://github.com/pytorch/pytorch/pull/92239.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91953
Approved by: https://github.com/jgong5, https://github.com/desertfire
2023-01-31 12:58:05 +00:00
7d3f2b7902 Revert "add conv_transpose2d pointwise(unary) fusion kernel (#90264)"
This reverts commit 85698d0ac4686c10ba527f94724de61b4a856027.

Reverted https://github.com/pytorch/pytorch/pull/90264 on behalf of https://github.com/osalpekar due to build breakage on feed pytorch build package internally
2022-12-16 23:16:59 +00:00
85698d0ac4 add conv_transpose2d pointwise(unary) fusion kernel (#90264)
This PR adds `torch.ops.mkldnn._convolution_transpose_pointwise`, which supports fusing ConvTranspose with the unary pointwise OPs below (see the sketch after the list):

- relu
- sigmoid
- tanh
- hardswish
- leaky_relu
- hardtanh
- gelu
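
A sketch of the eager-mode pattern such a fused op replaces (the fusion itself is applied by the compiler stack, not written by hand):

```python
import torch
import torch.nn as nn

deconv = nn.ConvTranspose2d(3, 8, kernel_size=3)
x = torch.randn(1, 3, 16, 16)
y = torch.relu(deconv(x))  # ConvTranspose followed by a unary op: a fusion candidate
```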

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90264
Approved by: https://github.com/jgong5, https://github.com/jansel
2022-12-15 14:16:58 +00:00
3e43ff2794 torchdynamo: add convolution add(relu) inplace fusion kernel (#88048)
This PR adds a convolution add(relu) inplace fusion kernel that works for **other.add_(conv)**.
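
A minimal sketch of that pattern (assumed shapes):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 3, kernel_size=3, padding=1)
x = torch.randn(1, 3, 8, 8)
other = torch.randn(1, 3, 8, 8)
other.add_(conv(x)).relu_()  # other.add_(conv(x)) is the fused pattern
```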

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88048
Approved by: https://github.com/jgong5, https://github.com/jansel
2022-11-10 13:54:37 +00:00
46aaae98c5 torchdynamo: add linear pointwise(binary) fusion kernel (#86583)
Support binary fusion of Linear with:

- add
- sub
- mul
- div
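
A usage sketch of the patterns listed above (assumed shapes):

```python
import torch
import torch.nn as nn

linear = nn.Linear(16, 16)
x = torch.randn(4, 16)
y = torch.randn(4, 16)
out = linear(x) + y  # likewise linear(x) - y, linear(x) * y, linear(x) / y
```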

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86583
Approved by: https://github.com/jgong5, https://github.com/jansel
2022-10-15 01:57:42 +00:00
5210fab64d torchdynamo: add convolution pointwise(binary) fusion kernel (#86582)
Support binary fusion of Convolution with:

- add
- sub
- mul
- div

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86582
Approved by: https://github.com/jgong5, https://github.com/jansel
2022-10-15 01:55:08 +00:00
9a7a49b254 torchdynamo: add convolution pointwise(unary) fusion kernel (#86581)
Support unary fusion of Convolution with:

- relu
- sigmoid
- tanh
- hardswish
- leaky_relu
- hardtanh
- gelu

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86581
Approved by: https://github.com/jgong5, https://github.com/jansel
2022-10-15 01:51:01 +00:00
c6b7c33885 torchdynamo: add linear eltwise fusion kernel (#85622)
Support fusion of linear with:

- relu
- sigmoid
- tanh
- hardswish
- leaky_relu
- hardtanh
- gelu

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85622
Approved by: https://github.com/EikanWang, https://github.com/jansel
2022-10-10 05:47:11 +00:00
ebf45a0785 [NNC] support aten::_convolution when it is 2D conv (#84038)
## Motivation
Currently, only `aten::conv2d` is supported in NNC. When using `torch.jit.trace`, the node on the graph is `aten::_convolution`. This PR adds support for the `aten::_convolution` node when it corresponds to a 2D convolution.

## Pitch
Support `aten::_convolution` in NNC when we can infer from the parameters that it is a 2D convolution, so that models obtained from `torch.jit.trace` are supported (a sketch follows).
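
A small sketch showing how tracing produces the `aten::_convolution` node in question:

```python
import torch
import torch.nn as nn

m = nn.Conv2d(3, 8, kernel_size=3)
traced = torch.jit.trace(m, torch.randn(1, 3, 16, 16))
print(traced.graph)  # the graph contains an aten::_convolution node
```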
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84038
Approved by: https://github.com/huiguoo
2022-09-19 17:45:20 +00:00
693a8dd04c [NNC] enable fusion of conv with elementwise OP (#77157)
## Pitch
Enable Conv-Eltwise fusion in NNC.

## Description
This PR adds a `FuseConvWithEltwise` pass to fuse convolution with elementwise OPs in TE subgraphs. This pass inserts prepack and packed run ops for conv2d, enabling fusion of conv2d with elementwise OPs. The fused packed run OP is implemented via an external call in NNC.
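
A usage sketch of the Conv2d + elementwise pattern the pass targets, scripted and frozen as is typical for JIT CPU fusions (assumed setup, not the exact harness in `test/test_mkldnn_fusion.py`):

```python
import torch
import torch.nn as nn

class ConvRelu(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(64, 64, kernel_size=3, padding=1)

    def forward(self, x):
        return torch.relu(self.conv(x))

m = ConvRelu().eval()
x = torch.randn(1, 64, 56, 56).to(memory_format=torch.channels_last)
with torch.no_grad():
    frozen = torch.jit.freeze(torch.jit.script(m))  # freezing enables the rewrite passes
    y = frozen(x)
```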

## Code structure
Graph rewrite pass related code is placed in:
```
torch/csrc/jit/passes/mkldnn_rewrite.h
torch/csrc/jit/passes/mkldnn_rewrite.cpp
```

NNC integration of fused conv-eltwise OP via external call is located in:
```
torch/csrc/jit/tensorexpr/kernel.cpp

torch/csrc/jit/tensorexpr/operators/conv2d.h
torch/csrc/jit/tensorexpr/operators/conv2d.cpp

torch/csrc/jit/tensorexpr/lowerings.cpp
torch/csrc/jit/tensorexpr/external_functions.cpp
```

Fused prepack OP context is in:
```
aten/src/ATen/native/mkldnn/Common.h
aten/src/ATen/native/mkldnn/RegisterMkldnnOpContextClass.cpp
aten/src/ATen/native/mkldnn/OpContext.h
aten/src/ATen/native/mkldnn/OpContext.cpp
```

Fused OP implementation is done in:
```
aten/src/ATen/native/mkldnn/ConvPrepack.h
aten/src/ATen/native/mkldnn/ConvPrepack.cpp
```

## OP benchmark for conv-relu
The performance below is measured on top of these two PRs that add NHWC support: https://github.com/pytorch/pytorch/pull/76948 and https://github.com/pytorch/pytorch/pull/78238.

- Measured on Cascade Lake 8280
- Jemalloc enabled
- batch_size = 1
- Channels Last format

### Single thread:
shape | time w/o fusion (us) | time w/ fusion (us) | Gain
-- | -- | -- | --
kernel=3, N=1, iC=64, H=56, W=56, oC=64,   stride=1, pad=1, dilates=1, g=1 | 1706.22 | 1371.97 | 19.59%
kernel=1, N=1, iC=256, H=56, W=56,   oC=512, stride=2, pad=0, dilates=1, g=1 | 2499.28 | 1571.52 | 37.12%
kernel=3, N=1, iC=256, H=56, W=56,   oC=256, stride=1, pad=1, dilates=1, g=32 | 4169.52 | 2738.53 | 34.32%
kernel=3, N=1, iC=512, H=56, W=56,   oC=512, stride=2, pad=1, dilates=1, g=32 | 3998.77 | 3085.85 | 22.83%
kernel=1, N=1, iC=64, H=56, W=56, oC=64,   stride=1, pad=0, dilates=1, g=1 | 673.73 | 430.81 | 36.06%
kernel=1, N=1, iC=256, H=56, W=56, oC=64,   stride=1, pad=0, dilates=1, g=1 | 1101.87 | 801.07 | 27.30%
kernel=1, N=1, iC=256, H=56, W=56,   oC=256, stride=1, pad=0, dilates=1, g=1 | 4692.91 | 3116.13 | 33.60%
kernel=1, N=1, iC=512, H=28, W=28,   oC=512, stride=1, pad=0, dilates=1, g=1 | 3310.64 | 2503.39 | 24.38%

### 4 threads:
shape | time w/o fusion (us) | time w/ fusion (us) | Gain
-- | -- | -- | --
kernel=3, N=1, iC=64, H=56, W=56, oC=64,   stride=1, pad=1, dilates=1, g=1 | 360.07 | 321.21 | 10.79%
kernel=1, N=1, iC=256, H=56, W=56,   oC=512, stride=2, pad=0, dilates=1, g=1 | 391.49 | 323.17 | 17.45%
kernel=3, N=1, iC=256, H=56, W=56,   oC=256, stride=1, pad=1, dilates=1, g=32 | 536.4 | 465.97 | 13.13%
kernel=3, N=1, iC=512, H=56, W=56,   oC=512, stride=2, pad=1, dilates=1, g=32 | 674.98 | 616.32 | 8.69%
kernel=1, N=1, iC=64, H=56, W=56, oC=64,   stride=1, pad=0, dilates=1, g=1 | 160.97 | 70.05 | 56.48%
kernel=1, N=1, iC=256, H=56, W=56, oC=64,   stride=1, pad=0, dilates=1, g=1 | 215.81 | 182.6 | 15.39%
kernel=1, N=1, iC=256, H=56, W=56,   oC=256, stride=1, pad=0, dilates=1, g=1 | 658.45 | 576.97 | 12.37%
kernel=1, N=1, iC=512, H=28, W=28,   oC=512, stride=1, pad=0, dilates=1, g=1 | 702.18 | 566.39 | 19.34%

### 1 socket (28 cores):
shape | time w/o fusion (us) | time w/ fusion (us) | Gain
-- | -- | -- | --
kernel=3, N=1, iC=64, H=56, W=56, oC=64,   stride=1, pad=1, dilates=1, g=1 | 149.92 | 103.78 | 30.78%
kernel=1, N=1, iC=256, H=56, W=56,   oC=512, stride=2, pad=0, dilates=1, g=1 | 192.76 | 110.87 | 42.48%
kernel=3, N=1, iC=256, H=56, W=56,   oC=256, stride=1, pad=1, dilates=1, g=32 | 160.67 | 127.24 | 20.81%
kernel=3, N=1, iC=512, H=56, W=56,   oC=512, stride=2, pad=1, dilates=1, g=32 | 212.45 | 180.55 | 15.02%
kernel=1, N=1, iC=64, H=56, W=56, oC=64,   stride=1, pad=0, dilates=1, g=1 | 114.57 | 50.58 | 55.85%
kernel=1, N=1, iC=256, H=56, W=56, oC=64,   stride=1, pad=0, dilates=1, g=1 | 198.64 | 70.6 | 64.46%
kernel=1, N=1, iC=256, H=56, W=56,   oC=256, stride=1, pad=0, dilates=1, g=1 | 281.35 | 155.8 | 44.62%
kernel=1, N=1, iC=512, H=28, W=28,   oC=512, stride=1, pad=0, dilates=1, g=1 | 262.15 | 162.94 | 37.84%

## UT
```
test/test_mkldnn_fusion.py
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77157
Approved by: https://github.com/ZolotukhinM
2022-08-10 21:46:51 +00:00