### Description
This PR enables TF32 as the fp32 internal precision for matmul/linear/conv in the `mkldnn` backend. Since we refined the fp32 precision API in https://github.com/pytorch/pytorch/pull/125888, we can easily extend that API to support TF32 for the `mkldnn` backend.
```
torch.backends.mkldnn.matmul.fp32_precision = 'tf32'
torch.backends.mkldnn.conv.fp32_precision = "tf32"
```
The related kernel and unit-test updates are done. The wrapper `bf32_on_and_off` is renamed to `reduced_f32_on_and_off`; it can run tests 3 times: once with reduced_f32 OFF and twice with reduced_f32 ON (`bf32` ON and `tf32` ON).
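A minimal usage sketch (assuming a build with oneDNN and hardware support for the reduced-precision path); inputs and outputs stay fp32, only the internal compute precision changes:
```python
import torch

# Allow TF32 as the internal fp32 precision for oneDNN matmul on CPU.
torch.backends.mkldnn.matmul.fp32_precision = "tf32"

a = torch.randn(256, 256)   # fp32 inputs
b = torch.randn(256, 256)
y = a @ b                   # fp32 output; oneDNN may compute internally in TF32
```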
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157520
Approved by: https://github.com/mingfeima, https://github.com/jansel
Based on the [conversation](https://github.com/pytorch/pytorch/issues/121791), we plan to drop "highest, high, medium" as the way to represent fp32 internal computation data types. Instead, we will directly use the algorithm name to represent it.
### Design Choice: Directly use algorithm names like "TF32", "BF16".
#### Pros
- The names are more informative: 'tf32' says more than a plain "high".
- Easier to extend to new algorithms like `tf32x3`.
#### Cons
- "HIGHEST, HIGH, MEDIUM" indicated the relative precision between different algorithms. However, we can have more documents to discuss them.
### We provide a layered structure for backends/operators.
('f32' is short for 'fp32_precision')

### We provide 3 fp32 compute precisions that can be set:
- **"ieee"**: Not allowed to use any other internal computation data type.
- **"tf32"**: Allowed to use tf32 as the internal computation data type.
- **"bf16"**: Allowed to use bf16 as the internal computation data type.
- **"none"**: Precision is not set; the value is inherited from (overridden by) the parent node.
### Overriding Precision Settings
A child node is overridden by its parent node if it is set to the default ("none").
For current default settings:
```
backend = generic, op = all, precision setting = none
backend = cuda, op = all, precision setting = none
backend = cuda, op = conv, precision setting = tf32
backend = cuda, op = rnn, precision setting = tf32
backend = cuda, op = matmul, precision setting = none
backend = mkldnn, op = all, precision setting = none
backend = mkldnn, op = conv, precision setting = none
backend = mkldnn, op = rnn, precision setting = none
backend = mkldnn, op = matmul, precision setting = none
```
- If the user sets `torch.backends.mkldnn.fp32_precision="bf16"`, its child nodes `torch.backends.mkldnn.matmul.fp32_precision` / `torch.backends.mkldnn.conv.fp32_precision` / `torch.backends.mkldnn.rnn.fp32_precision` will also be overridden to "bf16".
- If the user sets `torch.backends.fp32_precision="bf16"`, `torch.backends.mkldnn.fp32_precision` and its child nodes will also be overridden to "bf16" (see the sketch below).
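A minimal sketch of the override behavior described above:
```python
import torch

# Setting the backend-level precision overrides child nodes that are still "none":
# matmul / conv / rnn under mkldnn now effectively use bf16 as the internal
# fp32 compute precision.
torch.backends.mkldnn.fp32_precision = "bf16"

# A child node that is set explicitly keeps its own value and is not overridden.
torch.backends.mkldnn.conv.fp32_precision = "ieee"
```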
### Backward Compatibility
Since the new API allows more fine-grained control, there can be conflicts with the old API. For example, the previous `torch.backends.cudnn.allow_tf32` flag is not enough to represent a state where `torch.backends.cudnn.rnn.fp32_precision="ieee"` and `torch.backends.cudnn.conv.fp32_precision="tf32"`. Therefore, our goals for backward compatibility are:
- If the user only uses the previous APIs, they work as before.
- If the user uses the **new** API to reach a state that is **un-representable** by the old API and then accesses that state through the **old** API, we raise a RuntimeError and point the user to the documentation (see the sketch below).
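A sketch of the un-representable case (the exact error message is not specified here):
```python
import torch

# The legacy cudnn.allow_tf32 flag cannot express this mixed configuration:
torch.backends.cudnn.rnn.fp32_precision = "ieee"
torch.backends.cudnn.conv.fp32_precision = "tf32"

try:
    _ = torch.backends.cudnn.allow_tf32   # reading the state through the old API
except RuntimeError as e:
    print(e)                              # error points the user to the documentation
```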
### Test Plan
```
python test/test_cuda.py -k test_fp32_precision_with_tf32
python test/test_cuda.py -k test_fp32_precision_with_float32_matmul_precision
python test/test_cuda.py -k test_invalid_status_for_legacy_api
python test/test_mkldnn.py -k test_mlkdnn_get_set
python test/test_mkldnn.py -k test_generic_precision
python test/test_mkldnn.py -k test_invalid
python test/test_mkldnn.py -k test_default_use_parent
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125888
Approved by: https://github.com/jgong5, https://github.com/albanD
Co-authored-by: Jiang, Yanbing <yanbing.jiang@intel.com>
Changes:
1. Bump `ruff` from 0.7.4 to 0.8.4
2. Change `%`-formatted strings to f-strings
3. Change arguments with the `__` prefix to positional-only arguments using the `/` separator in function signatures (see the example below)
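A small illustration of items 2 and 3 (the function and its arguments are hypothetical, not code from this PR):
```python
# Item 2: %-formatting replaced with an f-string.
name, count = "ruff", 3
old_msg = "%s found %d issues" % (name, count)
new_msg = f"{name} found {count} issues"

# Item 3: a double-underscore prefix only marks an argument as positional-only
# by convention; the `/` separator enforces it in the signature.
def clip_old(__value, low, high):
    return max(low, min(high, __value))

def clip_new(value, /, low, high):
    return max(low, min(high, value))

assert old_msg == new_msg
assert clip_old(5, 0, 3) == clip_new(5, 0, 3) == 3
```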
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143753
Approved by: https://github.com/Skylion007
* Enable PERF402. Makes code more efficient and succinct by removing useless list copies that can instead be expressed with a list constructor or an `extend` call (illustrated below). All test cases have `noqa` added since performance is not as sensitive in that folder.
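For reference, a small illustration of the pattern PERF402 flags (hypothetical variable names):
```python
items = ["a", "b", "c"]

# Flagged by PERF402: element-by-element copy in a loop.
copied = []
for item in items:
    copied.append(item)

# Preferred: a list constructor (or copied.extend(items)).
copied = list(items)
```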
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115505
Approved by: https://github.com/malfet
Given the following case:
```
import torch
a = torch.empty_strided([64, 1, 33], [33, 3, 1], dtype=torch.bfloat16).fill_(1)
b = torch.randn(64, 33, 256).to(dtype=torch.bfloat16)
y = torch.ops.aten.bmm(a, b)
```
`a` is a contiguous tensor, but its strides are not the default contiguous strides ([33, 33, 1]), so the oneDNN matmul always runs a non-optimized path:
```
onednn_verbose,exec,cpu,matmul,gemm:jit,undef,src_bf16::blocked:abc:f0 wei_bf16::blocked:abc:f0 dst_bf16::blocked:abc:f0,attr-scratchpad:user ,,64x1x33:64x33x256:64x1x256,7.28711
```
This PR converts the inputs' strides to the default contiguous strides before calling oneDNN, so that it runs an optimized path:
```
onednn_verbose,exec,cpu,matmul,brg:avx512_core_amx_bf16,undef,src_bf16::blocked:abc:f0 wei_bf16::blocked:abc:f0 dst_bf16::blocked:abc:f0,attr-scratchpad:user ,,64x1x33:64x33x256:64x1x256,3.06396
```
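A minimal sketch of the idea behind the fix (the actual change lives in the ATen oneDNN matmul path; `as_strided` is used here only to illustrate the stride normalization):
```python
import torch

a = torch.empty_strided([64, 1, 33], [33, 3, 1], dtype=torch.bfloat16).fill_(1)
assert a.is_contiguous()                      # contiguous, but strides are [33, 3, 1]

# Re-express the tensor with the default contiguous strides [33, 33, 1];
# the size-1 dimension means the underlying layout is unchanged.
a_fixed = a.as_strided(a.size(), [33, 33, 1])

b = torch.randn(64, 33, 256).to(dtype=torch.bfloat16)
y = torch.bmm(a_fixed, b)                     # oneDNN can now pick the optimized kernel
```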
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99511
Approved by: https://github.com/mingfeima, https://github.com/jgong5
## Description
Currently, both inference and training use `forward_training` in the RNN primitive, which degrades inference performance (the drop comes from the RNN primitive itself and from the unnecessary creation of `pd` and `workspace`). This PR splits them into `forward_inference` and `forward_training` separately.
## Performance
With this fix, RNN-T inference time is reduced by 167 ms, improving E2E time by about `3.7%`.
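For context, a minimal sketch of the two paths (assuming an MKLDNN-enabled CPU build; the choice of primitive happens inside the mkldnn RNN kernel):
```python
import torch

rnn = torch.nn.LSTM(64, 128, batch_first=True)
x = torch.randn(8, 16, 64)

rnn.train()
y_train, _ = rnn(x)        # training: uses forward_training (needs a workspace)

rnn.eval()
with torch.no_grad():
    y_infer, _ = rnn(x)    # inference: uses the cheaper forward_inference after this PR
```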
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96736
Approved by: https://github.com/jgong5
Summary:
Enable Gelu bf16/fp32 in the CPU path using the MKLDNN implementation. The user doesn't need to call `to_mkldnn()` explicitly. The new Gelu fp32 kernel performs better than the original one.
Add Gelu backward for https://github.com/pytorch/pytorch/pull/53615.
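A minimal usage sketch (per the description above, a plain dense CPU tensor hits the MKLDNN Gelu kernel without an explicit layout conversion):
```python
import torch
import torch.nn.functional as F

x = torch.randn(1024, 1024)        # fp32, dense CPU tensor
y = F.gelu(x)                      # no to_mkldnn()/to_dense() round-trip needed

x_bf16 = x.to(torch.bfloat16)      # the bf16 path is covered as well
y_bf16 = F.gelu(x_bf16)
```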
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58525
Reviewed By: ejguan
Differential Revision: D29940369
Pulled By: ezyang
fbshipit-source-id: df9598262ec50e5d7f6e96490562aa1b116948bf
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43039. When tracing an MKLDNN model with **check_trace=True**, there is an error: **RuntimeError: unsupported memory format option Preserve**. This PR solves that problem.
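A hedged repro sketch of the scenario described above (the model and shapes are illustrative):
```python
import torch
from torch.utils import mkldnn as mkldnn_utils

model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, kernel_size=3)).eval()
mkldnn_model = mkldnn_utils.to_mkldnn(model)
x = torch.randn(1, 3, 32, 32).to_mkldnn()

# Before this fix, check_trace=True failed with
# "RuntimeError: unsupported memory format option Preserve".
traced = torch.jit.trace(mkldnn_model, x, check_trace=True)
```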
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61241
Reviewed By: anjali411
Differential Revision: D29737365
Pulled By: suo
fbshipit-source-id: e8f7f124bc6256f10b9d29969e0c65d332514625
Summary:
- Use `functools.lru_cache` to avoid calling this function multiple times.
- Check that we are running on a Linux platform before trying to open `/proc/cpuinfo`.
- Do not spawn a new process; simply `open("/proc/cpuinfo").read()` and search the output for the keywords (see the sketch below).
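A minimal sketch of this approach (the helper name and the searched keyword are illustrative, not the actual torch internals):
```python
import functools
import platform

@functools.lru_cache(maxsize=None)
def cpu_has_flag(flag: str) -> bool:
    # Only Linux exposes /proc/cpuinfo; on other platforms report False.
    if platform.system() != "Linux":
        return False
    with open("/proc/cpuinfo") as f:
        return flag in f.read()

print(cpu_has_flag("avx2"))   # repeated calls hit the lru_cache
```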
Fixes https://github.com/pytorch/pytorch/issues/57360
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57408
Reviewed By: driazati
Differential Revision: D28136769
Pulled By: malfet
fbshipit-source-id: ab476774c3be2913cb576d98d47a2f7ec03c19aa
Summary:
## 🚀 Feature
Add Mkl-Layout kernel for tanh.
## Motivation
We want to add an Mkl-Layout kernel for tanh to improve tanh's performance when the input Tensor uses the Mkl layout. PyTorch currently has no Mkl-Layout kernel for tanh, so it cannot execute tanh on an Mkl-Layout Tensor.
Of course, you can temporarily avoid this problem by calling to_dense/to_mkldnn, but performance is significantly reduced due to the copy overhead (1.6-4.3 times slower than the CPU kernel).
## Performance results
### Environment
- CPU: Intel(R) Core(TM) i7-8086K CPU @ 4.00GHz
- OS: Ubuntu 18.04.1 LTS
- compiler: gcc 7.5.0
- branch: master
- commit ID: fe2c126
- build Environment variable: USE_CUDA=0
- Python: 3.6.9
- Intel MKL(Math Kernel Library): 2020.2-254
- Intel oneDNN: 1.8.1
### Benchmark script
```python
import torch
import torch.nn as nn

torch.manual_seed(1)
x = torch.randn(2048, 2048)
x_mkl = x.to_mkldnn()

print("### CPU tanh")
with torch.autograd.profiler.profile(record_shapes=True) as prof:
    for i in range(100):
        output = x.tanh()
print(prof.key_averages().table(sort_by="self_cpu_time_total"))

print("\n### CPU tanh_")
with torch.autograd.profiler.profile(record_shapes=True) as prof:
    for i in range(100):
        x.tanh_()
print(prof.key_averages().table(sort_by="self_cpu_time_total"))

print("\n### to_dense/to_mkldnn + tanh")
with torch.autograd.profiler.profile(record_shapes=True) as prof:
    for i in range(100):
        output = x_mkl.to_dense().tanh().to_mkldnn()
print(prof.key_averages().table(sort_by="self_cpu_time_total"))

print("\n### to_dense/to_mkldnn + tanh_")
with torch.autograd.profiler.profile(record_shapes=True) as prof:
    for i in range(100):
        x_mkl.to_dense().tanh_().to_mkldnn()
print(prof.key_averages().table(sort_by="self_cpu_time_total"))

print("\n### Mkl-Layout tanh")
with torch.autograd.profiler.profile(record_shapes=True) as prof:
    for i in range(100):
        output = x_mkl.tanh()
print(prof.key_averages().table(sort_by="self_cpu_time_total"))

print("\n### Mkl-Layout tanh_")
with torch.autograd.profiler.profile(record_shapes=True) as prof:
    for i in range(100):
        x_mkl.tanh_()
print(prof.key_averages().table(sort_by="self_cpu_time_total"))
```
### Results
#### OMP_NUM_THREADS=1 results (self CPU time total, ms)
| Operation | CPU kernel | to_dense/to_mkldnn + CPU kernel | Mkl-Layout kernel (this PR) |
| --------- | ---------- | ------------------------------- | --------------------------- |
| tanh | 579.662 | 1658.000 | 617.565 |
| tanh_ | 554.477 | 881.997 | 589.426 |
#### OMP_NUM_THREADS=6 results (self CPU time total, ms)
| Operation | CPU kernel | to_dense/to_mkldnn + CPU kernel | Mkl-Layout kernel (this PR) |
| --------- | ---------- | ------------------------------- | --------------------------- |
| tanh | 182.387 | 421.336 | 136.226 |
| tanh_ | 94.331 | 404.931 | 99.254 |
## Modification policy for the code
oneDNN already supports the tanh operation:
[oneDNN: Elementwise](https://spec.oneapi.com/versions/latest/elements/oneDNN/source/primitives/eltwise.html)
A sigmoid implementation that uses the same Elementwise API as tanh already exists, so we created the code in this PR with reference to that sigmoid implementation.
527c1e0e37/aten/src/ATen/native/mkldnn/UnaryOps.cpp (L28-L42)
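For completeness, a short usage sketch of the new kernel (assuming a build with MKLDNN enabled):
```python
import torch

x = torch.randn(2048, 2048)
x_mkl = x.to_mkldnn()

y_mkl = x_mkl.tanh()   # out-of-place Mkl-Layout tanh (added in this PR)
x_mkl.tanh_()          # in-place variant
```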
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54656
Test Plan:
A test for sigmoid has already been created, as shown below.
So, I added a new test for tanh modeled on the sigmoid test.
527c1e0e37/test/test_mkldnn.py (L944-L954)
### mkldnn tanh test result
```
$ python3 test/test_mkldnn.py TestMkldnn.test_tanh
Couldn't download test skip set, leaving all tests enabled...
.
----------------------------------------------------------------------
Ran 1 test in 0.004s
OK
```
Reviewed By: gchanan
Differential Revision: D27395827
Pulled By: ezyang
fbshipit-source-id: d4481332de187e2dea095f9b6aabc73a497960fe
Summary:
There are the following two patterns for calling add in-place.
```python
torch.add(a, b, out=a) # (1) a in-placed
torch.add(a, b, out=b) # (2) b in-placed
```
If `a` and `b` are mkldnn Tensors, the result differs from the expected value in case (2).
**Sample code to reproduce the behavior:**
```python
import torch
torch.manual_seed(4)
a = torch.randn(4, 4)
b = torch.randn(4, 4)
b.fill_(1.0)
a_mkl = a.to_mkldnn()
b_mkl = b.to_mkldnn()
torch.add(b, a, alpha=1.0, out=a)
torch.add(b_mkl, a_mkl, alpha=1.0, out=a_mkl)
print(a)
print(a_mkl)
```
**Results:**
Actual:
```python
tensor([[ 0.0586, 2.2632, 0.8162, 1.1505],
        [ 1.1075, 0.7220, -1.6021, 1.6245],
        [ 0.1316, 0.7949, 1.3976, 1.6699],
        [ 0.9463, 1.0467, -0.7671, -1.1205]])
tensor([[2., 2., 2., 2.],
        [2., 2., 2., 2.],
        [2., 2., 2., 2.],
        [2., 2., 2., 2.]], layout=torch._mkldnn)
```
Expected:
```python
tensor([[ 0.0586, 2.2632, 0.8162, 1.1505],
        [ 1.1075, 0.7220, -1.6021, 1.6245],
        [ 0.1316, 0.7949, 1.3976, 1.6699],
        [ 0.9463, 1.0467, -0.7671, -1.1205]])
tensor([[ 0.0586, 2.2632, 0.8162, 1.1505],
        [ 1.1075, 0.7220, -1.6021, 1.6245],
        [ 0.1316, 0.7949, 1.3976, 1.6699],
        [ 0.9463, 1.0467, -0.7671, -1.1205]], layout=torch._mkldnn)
```
This is because `dnnl::sum` called in `mkldnn_add` has the following specifications:
[oneDNN doc : Sum](https://oneapi-src.github.io/oneDNN/dev_guide_sum.html)
> The sum primitive supports in-place operation, meaning that the src0 tensor can be used as both input and output.
> In-place operation overwrites the original data. Using in-place operation requires the memory footprint of the
> output tensor to be either bigger than or equal to the size of the dst memory descriptor used for primitive creation.
However, in case (2) the tensor being modified in place is passed as the second source rather than the first.
So, we modified the code to swap `a` and `b` before passing them to `sum` in case (2).
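A quick behavioral check of case (2) after the fix (assuming a build with MKLDNN enabled):
```python
import torch

torch.manual_seed(4)
a = torch.randn(4, 4)
b = torch.full((4, 4), 1.0)
a_mkl = a.to_mkldnn()
b_mkl = b.to_mkldnn()

torch.add(b, a, alpha=1.0, out=a)               # dense reference
torch.add(b_mkl, a_mkl, alpha=1.0, out=a_mkl)   # case (2): out aliases the second input

assert torch.allclose(a, a_mkl.to_dense())      # results now match
```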
**Environment**
- CPU: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
- Build: USE_MKLDNN=1
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51687
Reviewed By: jbschlosser
Differential Revision: D27062172
Pulled By: VitalyFedyunin
fbshipit-source-id: bf76d36f9fdb1b4337d71d87bcdbaf4edb11f12f