35 Commits

Author SHA1 Message Date
f2f25a5444 Upgrade submodule oneDNN to v3.7.1 (#148293)
This PR upgrades the oneDNN submodule to v3.7.1.

## Improvements

- Improved performance of convolution and matmul primitives on Intel Xeon processors with Intel AMX instruction set support (formerly Sapphire Rapids and Granite Rapids).
- Improved performance of int8 and fp32 forward convolution primitive on processors with Intel AVX2 instruction set support.
- Improved performance of fp8 matmul primitives with bf16 and fp16 bias data type on Intel Xeon processors with Intel AMX instruction set support (formerly Sapphire Rapids and Granite Rapids).
- Introduced initial optimizations for Intel GPUs based on Xe3 architecture.
- Added bfloat16 support for SDPA and implemented fp16 and bf16 GEMM kernels in SDPA (see the sketch after this list).
- Fixed f16 matmul accuracy, an issue where SDPA could not be dispatched to the ukernel, bf16/fp16/fp32 convolution performance, an int8 kernel page fault, a deconvolution precision issue with complex128 and fp64, and a GEMM correctness issue with float16.
- Improved bf16 matmul performance with fp32 destination with Arm Compute Library (ACL).
- Improved bf16 to fp32 reorder performance.
- Improved bf16 reorder performance.
- Improved bf16 convolution with ACL.
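
As a quick smoke test of the bf16 SDPA path from ATen, here is a minimal C++ sketch; the shapes are illustrative assumptions, and whether the oneDNN kernels are actually picked depends on the build and device:

```cpp
#include <ATen/ATen.h>

int main() {
  // Illustrative shapes: batch=1, heads=8, seq_len=128, head_dim=64.
  auto q = at::randn({1, 8, 128, 64}, at::kBFloat16);
  auto k = at::randn({1, 8, 128, 64}, at::kBFloat16);
  auto v = at::randn({1, 8, 128, 64}, at::kBFloat16);

  // With oneDNN v3.7.1, bf16 SDPA can be served by the new fp16/bf16
  // GEMM kernels when the oneDNN backend is selected for this op.
  auto out = at::scaled_dot_product_attention(q, k, v);
  return out.defined() ? 0 : 1;
}
```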

Fixes https://github.com/pytorch/pytorch/issues/136348.

## Validation results on CPU

1. NLP models accuracy/inference/training
![image](https://github.com/user-attachments/assets/859279b8-1631-4268-b226-7de9ac5870d8)

![image](https://github.com/user-attachments/assets/30ec7151-41ca-482a-9d2d-0c4850e75bab)

2. Torchbench cpu userbenchmark inference & training

![image](https://github.com/user-attachments/assets/71c9807c-caf9-4385-9990-d2ab637031cd)

3. Inductor quantization

![image](https://github.com/user-attachments/assets/3d2a3bd3-82fa-4566-8050-7ea5d6b61675)

4. Dynamo benchmarks
![image](https://github.com/user-attachments/assets/554ecce3-c85c-4a0e-88f1-2e73983c5dcd)
![image](https://github.com/user-attachments/assets/148c88f8-4367-4428-bb54-ce8a4deefd1b)
![image](https://github.com/user-attachments/assets/f2e744f4-d710-4699-acf4-1f130ecfadf1)
![image](https://github.com/user-attachments/assets/97128b80-4d0e-495a-aeda-dde3e70c96fd)
![image](https://github.com/user-attachments/assets/a9afce37-684c-45c0-b938-6dd7e0383805)
![image](https://github.com/user-attachments/assets/b8714236-9681-4fbe-8d98-be93deedab88)
![image](https://github.com/user-attachments/assets/4423061f-d133-45ba-98bd-d2f739e50431)
![image](https://github.com/user-attachments/assets/7955da10-3d23-493e-99fa-658f7f40035b)

## Validation results on XPU
Accuracy is the same as the baseline. Performance is shown below.
![image](https://github.com/user-attachments/assets/7645304d-5b1d-43f9-b840-9f846ed380a0)

## Validation results on ARM
![image](https://github.com/user-attachments/assets/080f7c02-0238-436f-ad20-5a9e3f6aafbb)
![image](https://github.com/user-attachments/assets/443742aa-ca61-41de-ae80-5d4c65cd0c87)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148293
Approved by: https://github.com/mingfeima, https://github.com/atalman
2025-03-04 13:56:45 +00:00
e72b4c61bf Revert "Upgrade submodule oneDNN to v3.7 (#147498)"
This reverts commit 576ed1e400d069ec2fff6162f82a71ff0bd81f7c.

Reverted https://github.com/pytorch/pytorch/pull/147498 on behalf of https://github.com/wdvr due to failing some tests on trunk - see below ([comment](https://github.com/pytorch/pytorch/pull/147498#issuecomment-2679867286))
2025-02-24 22:57:39 +00:00
576ed1e400 Upgrade submodule oneDNN to v3.7 (#147498)
This PR upgrades the oneDNN submodule to v3.7.

## Improvements

- Improved performance of convolution and matmul primitives on Intel Xeon processors with Intel AMX instruction set support (formerly Sapphire Rapids and Granite Rapids).
- Improved performance of int8 and fp32 forward convolution primitive on processors with Intel AVX2 instruction set support.
- Improved performance of fp8 matmul primitives with bf16 and fp16 bias data type on Intel Xeon processors with Intel AMX instruction set support (formerly Sapphire Rapids and Granite Rapids).
- Introduced initial optimizations for Intel GPUs based on Xe3 architecture.
- Added bfloat16 support for SDPA and implemented fp16 and bf16 GEMM kernels in SDPA.
- Fixed f16 matmul accuracy, an issue where SDPA could not be dispatched to the ukernel, bf16/fp16/fp32 convolution performance, an int8 kernel page fault, a deconvolution precision issue with complex128 and fp64, and a GEMM correctness issue with float16.
- Improved bf16 matmul performance with fp32 destination with Arm Compute Library (ACL).
- Improved bf16 to fp32 reorder performance.
- Improved bf16 reorder performance.
- Improved bf16 convolution with ACL.

Fixes https://github.com/pytorch/pytorch/issues/136348.

## Validation results on CPU

1. NLP models accuracy/inference/training
![image](https://github.com/user-attachments/assets/859279b8-1631-4268-b226-7de9ac5870d8)

![image](https://github.com/user-attachments/assets/30ec7151-41ca-482a-9d2d-0c4850e75bab)

2. Torchbench cpu userbenchmark inference & training

![image](https://github.com/user-attachments/assets/71c9807c-caf9-4385-9990-d2ab637031cd)

3. Inductor quantization

![image](https://github.com/user-attachments/assets/3d2a3bd3-82fa-4566-8050-7ea5d6b61675)

4. Dynamo benchmarks
![image](https://github.com/user-attachments/assets/554ecce3-c85c-4a0e-88f1-2e73983c5dcd)
![image](https://github.com/user-attachments/assets/148c88f8-4367-4428-bb54-ce8a4deefd1b)
![image](https://github.com/user-attachments/assets/f2e744f4-d710-4699-acf4-1f130ecfadf1)
![image](https://github.com/user-attachments/assets/97128b80-4d0e-495a-aeda-dde3e70c96fd)
![image](https://github.com/user-attachments/assets/a9afce37-684c-45c0-b938-6dd7e0383805)
![image](https://github.com/user-attachments/assets/b8714236-9681-4fbe-8d98-be93deedab88)
![image](https://github.com/user-attachments/assets/4423061f-d133-45ba-98bd-d2f739e50431)
![image](https://github.com/user-attachments/assets/7955da10-3d23-493e-99fa-658f7f40035b)

## Validation results on XPU
Accuracy is the same as the baseline. Performance is shown below.
![image](https://github.com/user-attachments/assets/7645304d-5b1d-43f9-b840-9f846ed380a0)

## Validation results on ARM
![image](https://github.com/user-attachments/assets/080f7c02-0238-436f-ad20-5a9e3f6aafbb)
![image](https://github.com/user-attachments/assets/443742aa-ca61-41de-ae80-5d4c65cd0c87)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147498
Approved by: https://github.com/fadara01, https://github.com/mingfeima, https://github.com/atalman
2025-02-24 14:32:51 +00:00
f7c0c06692 Add oneDNN BRGEMM support on CPU (#131878)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131878
Approved by: https://github.com/jgong5, https://github.com/peterbell10
2024-09-07 13:22:30 +00:00
2a73ba298c Upgrade submodule oneDNN to v3.5.3 (#131620)
This PR upgrades the oneDNN submodule to v3.5.3.

## Improvements

- [experimental] Introduced [microkernel API](https://oneapi-src.github.io/oneDNN/ukernels.html) for Intel Architecture Processors. This API exposes internal mechanisms used in matmul and convolution implementation to expert users.
- Improved performance of matmul primitive with sum post-op for batched cases on processors with Intel AMX instruction set support.
- Introduced fp64 matmul support. This functionality is currently implemented on Intel GPUs with hardware acceleration for fp64 math only.
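
For reference, the batched matmul-with-sum-post-op pattern called out above looks roughly like this in the oneDNN C++ API; a minimal sketch with assumed fp32 shapes, error handling omitted:

```cpp
#include <oneapi/dnnl/dnnl.hpp>

int main() {
  using namespace dnnl;
  using dt = memory::data_type;
  using tag = memory::format_tag;

  engine eng(engine::kind::cpu, 0);
  stream s(eng);

  // Assumed batched shapes: C[MB,M,N] += A[MB,M,K] * B[MB,K,N].
  const memory::dim MB = 4, M = 32, K = 64, N = 32;
  auto a_md = memory::desc({MB, M, K}, dt::f32, tag::abc);
  auto b_md = memory::desc({MB, K, N}, dt::f32, tag::abc);
  auto c_md = memory::desc({MB, M, N}, dt::f32, tag::abc);

  // The sum post-op accumulates into the existing destination values.
  post_ops po;
  po.append_sum(1.0f);
  primitive_attr attr;
  attr.set_post_ops(po);

  auto pd = matmul::primitive_desc(eng, a_md, b_md, c_md, attr);
  auto mm = matmul(pd);

  memory a_mem(a_md, eng), b_mem(b_md, eng), c_mem(c_md, eng);
  mm.execute(s, {{DNNL_ARG_SRC, a_mem},
                 {DNNL_ARG_WEIGHTS, b_mem},
                 {DNNL_ARG_DST, c_mem}});
  s.wait();
  return 0;
}
```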

## Validation results on CPU
No regression was found.

1. NLP models accuracy/inference/training

Model Name | Mode Name | Precision | OneDNN | Baseline | OneDNN/Baseline
-- | -- | -- | -- | -- | --
bert-large | realtime | bf16 | 192.498 | 189.664 | 1.014942214
bert-large | throughput | bf16 | 202.424 | 202.156 | 1.001325709
bert-large | train_phase2 | bf16 | 15.955 | 16.029 | 0.995383368
LCM | throughput | bf16 | 1.01983 | 1.06632 | 0.956401455
stable-diffusion | throughput | bf16 | 0.10313 | 0.10184 | 1.012666929
ViT | realtime | bf16 | 1086.48 | 928.43 | 1.17023362
ViT | throughput | bf16 | 1419.07 | 1393.81 | 1.018122987
yolov7 | realtime | bf16 | 413.468682 | 415.16503 | 0.995914039
yolov7 | throughput | bf16 | 369.697 | 366.789 | 1.007928264
bert-large | realtime | fp32 | 46.685 | 46.652 | 1.000707365
bert-large | throughput | fp32 | 47.766 | 48.007 | 0.994979899
bert-large | train_phase2 | fp32 | 7.101 | 7.104 | 0.999577703
LCM | throughput | fp32 | 0.5501 | 0.55023 | 0.999763735
stable-diffusion | throughput | fp32 | 0.04012 | 0.04002 | 1.002498751
ViT | realtime | fp32 | 337.27 | 335.19 | 1.006205436
ViT | throughput | fp32 | 346.52 | 350.08 | 0.989830896
yolov7 | realtime | fp32 | 107.138054 | 107.242747 | 0.999023775
yolov7 | throughput | fp32 | 103.383 | 104.301 | 0.99119855
bert-large | realtime | int8 | 283.541 | 289.569 | 0.979182855
LCM | throughput | int8 | 1.09864 | 1.08998 | 1.0079451
stable-diffusion | throughput | int8 | 0.10617 | 0.10604 | 1.001225952
ViT | realtime | int8 | 1562.11 | 1554.68 | 1.004779119
ViT | throughput | int8 | 1904.38 | 1903.39 | 1.000520125
yolov7 | realtime | int8 | 540.489493 | 539.902488 | 1.001087243
yolov7 | throughput | int8 | 499.999 | 500.757 | 0.998486292

Device | Dtype | Geomean (higher is better)
-- | -- | --
All | all | 101.17%
All | fp32 | 99.83%
All | bf16 | 102.24%
All | int8 | 99.91%
All | fp16 | 103.61%
SPR | all | 100.54%
SPR | fp32 | 99.82%
SPR | bf16 | 101.78%
SPR | int8 | 99.90%
GNR | all | 101.58%
GNR | fp32 | 99.85%
GNR | bf16 | 102.66%
GNR | int8 | 99.93%
GNR | fp16 | 103.61%

2. Torchbench cpu userbenchmark inference & training

Perf_Geomean | Ratio (oneDNN/baseline)
-- | --
eager_throughtput_bf16_infer | 1.00x
eager_throughtput_fp32_infer | 1.00x
jit_llga_throughtput_amp_bf16 | 1.00x
jit_llga_throughtput_fp32 | 1.00x
eager_throughtput_fx_int8 | 0.99x
eager_throughtput_bf16_train | 1.01x
eager_throughtput_fp32_train | 1.00x

3. Inductor quantization

Static quant:
Perf_Geomean | Ratio (oneDNN/baseline)
-- | --
PTQ | 1.00x
PTQ_CPP_WRAPPER | 1.00x
QAT | 1.00x

ACC_Geomean | Ratio (oneDNN/baseline)
-- | --
PTQ | 1.00x
PTQ_CPP_WRAPPER | 1.00x
QAT | 1.00x

Dynamic quant:

Metric | Ratio (oneDNN/baseline)
-- | --
Performance | 1.04x
Accuracy | 1.00x

4. Dynamo benchmarks
GEOMEAN summary
![image](https://github.com/user-attachments/assets/82fc4b76-50f6-4f06-9ba9-034b932f1158)

FP32 Static shape, default wrapper
![image](https://github.com/user-attachments/assets/9335268e-3e99-426b-91f8-f9df90a2007c)

FP32 Dynamic shape, default wrapper
![image](https://github.com/user-attachments/assets/e7cf3f4f-2a62-4b58-9461-5e5ba254d822)

AMP Static shape, default wrapper
![image](https://github.com/user-attachments/assets/12392c88-e44f-4c95-904a-4fa5fc9f34a2)

AMP Dynamic shape, default wrapper
![image](https://github.com/user-attachments/assets/13930b0d-9bb2-46de-9ecb-5d2585d5c2f6)

## Validation results on XPU
Category | Eager | Inductor
-- | -- | --
huggingface_amp_fp16_training | 1.002456 | 0.999998
huggingface_bfloat16_inference | 1.005386 | 1.003511
huggingface_float32_training | 1.002533 | 1.003098
torchbench_amp_fp16_training | 1.009065 | 1.01323
torchbench_bfloat16_inference | 1.003371 | 1.001534
torchbench_float32_training | 1.012102 | 1.011596
timm_models_amp_fp16_training | 1.005511 | 1.010329
timm_models_bfloat16_inference | 1.000935 | 1.000538
timm_models_float32_training | 0.991873 | 0.99721

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131620
Approved by: https://github.com/jgong5, https://github.com/malfet
2024-08-21 23:40:02 +00:00
cyy
d44daebdbc [Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127051
Approved by: https://github.com/cpuhrsch, https://github.com/malfet
2024-05-31 01:20:45 +00:00
67739d8c6f Revert "[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051)"
This reverts commit 699db7988d84d163ebb6919f78885e4630182a7a.

Reverted https://github.com/pytorch/pytorch/pull/127051 on behalf of https://github.com/PaliC due to This PR needs to be synced using the import button as there is a bug in our diff train ([comment](https://github.com/pytorch/pytorch/pull/127051#issuecomment-2138496995))
2024-05-30 01:16:57 +00:00
cyy
699db7988d [Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127051
Approved by: https://github.com/cpuhrsch, https://github.com/malfet
2024-05-29 11:58:03 +00:00
cdbb2c9acc Revert "[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051)"
This reverts commit 4fdbaa794f9d5af2f171f772a51cb710c51c925f.

Reverted https://github.com/pytorch/pytorch/pull/127051 on behalf of https://github.com/PaliC due to This PR needs to be synced using the import button as there is a bug in our diff train ([comment](https://github.com/pytorch/pytorch/pull/127051#issuecomment-2136428735))
2024-05-29 03:02:35 +00:00
cyy
4fdbaa794f [Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127051
Approved by: https://github.com/cpuhrsch, https://github.com/malfet
2024-05-27 03:54:03 +00:00
c2f8c75129 [Reopen] Upgrade submodule oneDNN to v3.4.2 (#126137)
Reopen of https://github.com/pytorch/pytorch/pull/122472

## Improvements
This upgrade fixes the following issues:
- https://github.com/pytorch/pytorch/issues/120982

This upgrade brings the following new features:
- Introduced memory descriptor serialization API. This API is needed to support freezing on CPU in AOTInductor (https://github.com/pytorch/pytorch/issues/114450)
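
A rough sketch of how a blob-based serialization API of this kind can be used; the `get_blob()` method and the blob constructor below follow the oneDNN v3.4 release notes as I read them, and should be treated as assumptions rather than confirmed PyTorch usage:

```cpp
#include <oneapi/dnnl/dnnl.hpp>

#include <cstdint>
#include <vector>

int main() {
  using namespace dnnl;

  // A plain fp32 NCHW descriptor; the shape is illustrative.
  memory::desc md({8, 3, 224, 224}, memory::data_type::f32,
                  memory::format_tag::nchw);

  // Serialize the descriptor into an opaque byte blob, e.g. so an
  // AOTInductor artifact can embed it for CPU freezing...
  std::vector<uint8_t> blob = md.get_blob();

  // ...then reconstruct an equivalent descriptor from the blob later.
  memory::desc restored(blob);
  return (md == restored) ? 0 : 1;
}
```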

## Validation results on CPU
Original results with oneDNN v3.4.1 are here: https://github.com/pytorch/pytorch/pull/122472#issue-2201602846

Need to rerun validation and update results.

Co-authored-by: Sunita Nadampalli <nadampal@amazon.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126137
Approved by: https://github.com/jgong5, https://github.com/snadampal, https://github.com/atalman
2024-05-16 12:00:16 +00:00
ee0c47349c Revert "Upgrade submodule oneDNN to v3.4 (#122472)"
This reverts commit dbcf123105a3f11d02f04067ca0cb377ed09e88c.

Reverted https://github.com/pytorch/pytorch/pull/122472 on behalf of https://github.com/atalman due to broke aarch64 builds and tests ([comment](https://github.com/pytorch/pytorch/pull/122472#issuecomment-2096750000))
2024-05-06 19:28:20 +00:00
dbcf123105 Upgrade submodule oneDNN to v3.4 (#122472)
## Improvements
This upgrade fixes the following issues:
- https://github.com/pytorch/pytorch/issues/120982

This upgrade brings the following new features:
- Introduced memory descriptor serialization API. This API is needed to support freezing on CPU in AOTInductor (https://github.com/pytorch/pytorch/issues/114450)

## Validation results on CPU
No regression was found.

1. NLP models accuracy/inference/training

Model Name | Mode | Precision | New | Baseline | New/Baseline
-- | -- | -- | -- | -- | --
bert-large | accuracy | fp32 | 93.15325 | 93.15325 | 100.00%
bert-large | accuracy | bf16 | 93.20125 | 93.20125 | 100.00%
bert-large | accuracy | int8 | 92.66641 | 92.66641 | 100.00%
LCM | accuracy | fp32 | 44.11152 | 44.11154 | 100.00%
LCM | accuracy | bf16 | 43.57667 | 43.65096 | 100.17%
ViT | accuracy | fp32 | 0.8033 | 0.8033 | 100.00%
ViT | accuracy | bf16 | 0.8031 | 0.8031 | 100.00%
ViT | accuracy | int8 | 0.7985 | 0.7985 | 100.00%
yolov7 | accuracy | fp32 | 0.512 | 0.512 | 100.00%
yolov7 | accuracy | bf16 | 0.504 | 0.504 | 100.00%
yolov7 | accuracy | int8 | 0.507 | 0.507 | 100.00%
bert-large | realtime | fp32 | 37.433 | 39.136 | 95.65%
bert-large | realtime | bf16 | 166.592 | 160.134 | 104.03%
bert-large | realtime | int8 | 230.876 | 222.594 | 103.72%
ViT | realtime | fp32 | 288.19 | 282.05 | 102.18%
ViT | realtime | bf16 | 755.42 | 741.1 | 101.93%
ViT | realtime | int8 | 1060.94 | 1092.47 | 97.11%
yolov7 | realtime | fp32 | 17.06927 | 16.47995 | 103.58%
yolov7 | realtime | bf16 | 54.68561 | 54.00723 | 101.26%
yolov7 | realtime | int8 | 78.38271 | 77.63214 | 100.97%
bert-large | throughput | fp32 | 47.142 | 47.341 | 99.58%
bert-large | throughput | bf16 | 200.365 | 200.806 | 99.78%
bert-large | throughput | int8 | 144.999 | 145.295 | 99.80%
LCM | throughput | fp32 | 0.54913 | 0.54897 | 100.03%
LCM | throughput | bf16 | 1.062417 | 1.07772 | 98.58%
stable-diffusion | throughput | fp32 | 0.03301 | 0.0331 | 99.73%
stable-diffusion | throughput | bf16 | 0.08773 | 0.08849 | 99.14%
stable-diffusion | throughput | int8 | 0.0491 | 0.05024 | 97.73%
ViT | throughput | fp32 | 342.55 | 346.47 | 98.87%
ViT | throughput | bf16 | 1263.4 | 1268.32 | 99.61%
ViT | throughput | int8 | 1331.3 | 1345.32 | 98.96%
yolov7 | throughput | fp32 | 115.313 | 115.612 | 99.74%
yolov7 | throughput | bf16 | 323.364 | 323.747 | 99.88%
yolov7 | throughput | int8 | 388.137 | 384.236 | 101.02%
bert-large | train_phase1 | fp32 | 34.223 | 34.309 | 99.75%
bert-large | train_phase1 | bf16 | 90.372 | 88.453 | 102.17%
bert-large | train_phase2 | fp32 | 7.307 | 7.318 | 99.85%

Data Type | Geomean
-- | --
fp32 | 99.88%
bf16 | 100.70%
int8 | 99.88%
all | 100.16%

2. Torchbench cpu userbenchmark inference & training

Test suite | Geomean Ratio (New/baseline)
-- | --
eager_throughtput_bf16_infer | 1.00x
eager_throughtput_fp32_infer | 1.00x
jit_llga_throughtput_amp_bf16 | 0.99x
jit_llga_throughtput_fp32 | 1.01x
eager_throughtput_fx_int8 | 1.00x
eager_throughtput_bf16_train | 1.00x
eager_throughtput_fp32_train | 1.00x

3. Inductor quantization (static & dynamic) accuracy & performance

Config | Performance geomean ratio (New/baseline) | Accuracy ratio (New/baseline)
-- | -- | --
Static quant PTQ | 0.99x | 1.00x
Static quant PTQ_CPP_WRAPPER | 0.98x | 1.00x
Static quant QAT | 0.99x | 1.00x
Dynamic quant PTQ | 1.00x | 1.00x

4. Dynamo benchmarks

Precision | Shape | Wrapper | Thread | Eager old/new GEOMEAN | Inductor old/new GEOMEAN
-- | -- | -- | -- | -- | --
Float32 | Static | Default | Multiple | 0.998776 | 1.002091
Float32 | Static | Default | Single | 1.014086 | 1.01054
Float32 | Dynamic | Default | Multiple | 1.00386 | 1.005975
Float32 | Dynamic | Default | Single | 1.011036 | 1.008317
AMP | Static | Default | Multiple | 0.996965 | 1.005117
AMP | Static | Default | Single | 1.00092 | 0.995666
AMP | Dynamic | Default | Multiple | 0.9959 | 0.995048
AMP | Dynamic | Default | Single | 1.002569 | 0.994085

---

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122472
Approved by: https://github.com/jgong5, https://github.com/EikanWang, https://github.com/atalman
2024-05-01 20:59:17 +00:00
481c9bb1fc Upgrade submodule oneDNN to v3.3.6 (#122164)
As the title says, this upgrades submodule oneDNN to v3.3.6, including the following aarch64 issue fixes:
- https://github.com/oneapi-src/oneDNN/pull/1831
- https://github.com/oneapi-src/oneDNN/pull/1834

---

## Validation results
(on Intel CPU + Linux)
**Static quantization with Inductor on CV models**

Quant method | Geomean throughput ratio (v3.3.6/baseline)
-- | --
ptq | 0.982937
ptq (cpp wrapper) | 0.978384
qat | 0.978828

**Torchbench cpu userbenchmark with Inductor**

Items | Perf Geomean Ratio (v3.3.6/baseline)
-- | --
eager_throughtput_bf16_infer | 1.00x
eager_throughtput_fp32_infer | 1.00x
jit_llga_throughtput_amp_bf16 | 1.01x
jit_llga_throughtput_fp32 | 1.00x
eager_throughtput_fx_int8 | 1.00x
eager_throughtput_bf16_train | 1.46x
eager_throughtput_fp32_train | 1.41x

**Dynamo benchmarks tests**
Precision | Shape | Wrapper | Thread | Eager old/new GEOMEAN | Inductor old/new GEOMEAN
-- | -- | -- | -- | -- | --
Float32 | Static | Default | Multiple | 1.003836812 | 1.003425
Float32 | Static | Default | Single | 1.000181451 | 0.999611
Float32 | Dynamic | Default | Multiple | 1.003980183 | 1.006563
Float32 | Dynamic | Default | Single | 1.000076939 | 0.999969
AMP | Static | Default | Multiple | 0.996824772 | 0.998715
AMP | Static | Default | Single | 0.996402574 | 1.001483
AMP | Dynamic | Default | Multiple | 0.994919866 | 1.000467
AMP | Dynamic | Default | Single | 0.9962054 | 1.000767

(on Aarch64)
https://github.com/pytorch/pytorch/pull/122164#issuecomment-2007912919

---

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122164
Approved by: https://github.com/snadampal, https://github.com/malfet, https://github.com/atalman
2024-03-28 21:36:27 +00:00
daf89b4101 Update oneDNN submodule to v3.3.2 (#112700)
Update oneDNN submodule to v3.3.2.
Add a macro to check the version of `third_party/ideep`.
Since we have versioning now, the changes won't break any pipeline even if `third_party/ideep` is not updated at the same time.
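
oneDNN already ships compile-time version macros (`DNNL_VERSION_MAJOR` and friends); a version gate for ideep in the same spirit might look like the sketch below, where the `IDEEP_VERSION_*` and `IDEEP_PREREQ` names are hypothetical stand-ins rather than the exact macro this PR adds:

```cpp
// Hypothetical sketch; the real macro and version values live in
// third_party/ideep, and the names here are illustrative only.
#define IDEEP_VERSION_MAJOR 3   // would normally come from ideep's headers
#define IDEEP_VERSION_MINOR 3

#define IDEEP_PREREQ(major, minor)                                   \
  ((IDEEP_VERSION_MAJOR > (major)) ||                                \
   (IDEEP_VERSION_MAJOR == (major) && IDEEP_VERSION_MINOR >= (minor)))

#if IDEEP_PREREQ(3, 3)
// Use the ideep API that matches oneDNN v3.3.x.
#else
// Fall back to the older code path, so an out-of-sync submodule still builds.
#endif
```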

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112700
Approved by: https://github.com/leslie-fang-intel, https://github.com/atalman
2023-12-05 17:51:55 +00:00
62df4f3428 Revert "Update oneDNN submodule to v3.3.2 (#112700)"
This reverts commit afbaa0c1650cf15100fb5dc579ceeba24fb8665a.

Reverted https://github.com/pytorch/pytorch/pull/112700 on behalf of https://github.com/atalman due to Diff broke internal tests ([comment](https://github.com/pytorch/pytorch/pull/112700#issuecomment-1839350284))
2023-12-04 19:41:12 +00:00
afbaa0c165 Update oneDNN submodule to v3.3.2 (#112700)
Update oneDNN submodule to v3.3.2.
Add a macro to check the version of `third_party/ideep`.
Since we have versioning now, the changes won't break any pipeline even if `third_party/ideep` is not updated at the same time.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112700
Approved by: https://github.com/leslie-fang-intel, https://github.com/atalman
2023-12-01 18:40:07 +00:00
97a291f6bd [ONEDNN][BC-breaking] update onednn from v2.7.3 to v3.1.1 (#97957)
**Summary**
Update onednn from v2.7.3 to v3.1.1.
It is BC-breaking because some APIs changed on the oneDNN side. Changes include:
- PyTorch code where oneDNN is directly called
- Submodule `third_party/ideep` to adapt to oneDNN's new API.
- CMake files to fix build issues.

**Test plan**
Building issues and correctness are covered by CI checks.
For performance, we have run TorchBench models to ensure there is no regression. Below is the comparison before and after the oneDNN update.
![image](https://github.com/pytorch/pytorch/assets/12522207/415a4ff0-7566-40c6-aed0-24997a475b0e)

Note:
- Base commit of PyTorch: da322ea
- CPU: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Ice Lake)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97957
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-08-25 12:13:18 +00:00
85edb58179 Fix oneDNN double checkout issue and Upgrade oneDNN to v2.7.3 (#92239)
### Description

This PR fixes the oneDNN double-checkout issue mentioned in https://github.com/pytorch/pytorch/pull/87061#issuecomment-1284384276 and upgrades oneDNN to v2.7.3 to fix #92138.

### Performance test

TorchBench tests were run on ICX with 40 cores.
Intel OpenMP & jemalloc were preloaded.
![image](https://user-images.githubusercontent.com/61222868/212634378-b91c20b5-0e85-474f-861c-c1d2f6962de1.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92239
Approved by: https://github.com/jgong5, https://github.com/malfet
2023-01-17 01:54:21 +00:00
39306c1dfb Use @pytorch// in bazel build files (#89660)
This change aims to make the bazel build more embedding-friendly.
Namely, when PyTorch is included as an external repo in another project, it is usually included like this
```
        native.local_repository(
            name = "pytorch",
            path = ...,
            repo_mapping = repo_mapping,
        )
```
Or
```
        http_archive(
            name = "pytorch",
            urls = ...
            repo_mapping = repo_mapping,
        )
```
In this case, references to `@//` would resolve to the top-level WORKSPACE that includes PyTorch.
That makes upgrades harder because we need to carry around this patch.
Note that under some edge-case circumstances even `//` resolves to the top-level `WORKSPACE`.

This change makes the embedding of the bazel build easier without compromising anything for the main repo, since the `@pytorch//` still refers to the same thing.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89660
Approved by: https://github.com/kit1980
2022-12-22 05:14:55 +00:00
dc40b6d043 Upgrade oneDNN to v2.7.2 (#90051)
This PR upgrades oneDNN to v2.7.2.

### oneDNN v2.7.1 & 2.7.2 changes:
- Fixes #89104
- Updated ITT API version to 3.23.0

### Performance Benchmark
TorchBench tests were run on ICX with 40 cores.
Intel OpenMP & tcmalloc were preloaded.
![image](https://user-images.githubusercontent.com/61222868/205240855-04e2d50f-8b3a-4097-9038-fdd0c0fc93b9.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90051
Approved by: https://github.com/XiaobingSuper, https://github.com/jgong5
2022-12-08 09:41:02 +00:00
c56be31d2e Upgrade oneDNN to v2.7 (#87061)
This PR upgrades oneDNN to v2.7.

### oneDNN v2.7 changes:

**Performance Optimizations**
- Improved performance for future Intel Xeon Scalable processors (code name Sapphire Rapids).
- Introduced performance optimizations for [bf16 floating point math mode](http://oneapi-src.github.io/oneDNN/group_dnnl_api_mathmode.html) on Intel Xeon Scalable processors (code name Sapphire Rapids). The bf16 math mode allows oneDNN to use bf16 arithmetic and Intel AMX instructions in computations on fp32 data.
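
The bf16 math mode can be requested globally or scoped to a single primitive via its attributes; a minimal sketch (illustrative, not code from this PR):

```cpp
#include <oneapi/dnnl/dnnl.hpp>

int main() {
  // Allow implicit fp32 -> bf16 down-conversion process-wide, letting
  // compute-heavy primitives use bf16 arithmetic / Intel AMX on fp32 data.
  dnnl::set_default_fpmath_mode(dnnl::fpmath_mode::bf16);

  // Or scope the relaxation to one primitive through its attributes.
  dnnl::primitive_attr attr;
  attr.set_fpmath_mode(dnnl::fpmath_mode::bf16);
  return 0;
}
```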

Please go to https://github.com/oneapi-src/oneDNN/releases/tag/v2.7 for more detailed changes.

### oneDNN v2.6.1 & 2.6.2 changes:

**Functionality**

- Updated ITT API to 3.22.5
- Fixed correctness issue in fp32 convolution implementation for cases with large spatial size (https://github.com/pytorch/pytorch/issues/84488)

### Performance Benchmark
TorchBench tests were run on ICX with 40 cores.
Intel OpenMP & tcmalloc were preloaded.
![image](https://user-images.githubusercontent.com/61222868/196121957-656faebc-9f4a-49f0-9ef0-0784416c3a47.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87061
Approved by: https://github.com/jgong5, https://github.com/XiaobingSuper, https://github.com/weiwangmeta
2022-10-18 19:07:58 +00:00
58d773ad29 Upgrade oneDNN to v2.6.0 (#75398)
Summary:
This PR upgrades oneDNN to v2.6.0.

v2.6.0 changes:

- Improved performance for future Intel Xeon® Scalable processors (code name Sapphire Rapids). The functionality requires Linux kernel 5.16 or later.
- Improved performance of matmul primitive for processors with Intel AVX-512 support.
- Introduced bfloat16 destination support for int8 convolution, matmul and inner product primitives for processors with Intel AVX-512 support and for future Intel Xeon® Scalable processors (code name Sapphire Rapids).
- Extended RNN primitive with support for [AUGRU cell](https://oneapi-src.github.io/oneDNN/dev_guide_rnn.html#augru).
- Added support for non-zero negative slope in ReLU post-op for batch normalization primitive.
- Introduced support for mixed source and destination data types in softmax primitive.
- Introduced [persistent cache API](https://oneapi-src.github.io/oneDNN/dev_guide_persistent_cache.html). This functionality makes it possible to serialize and reuse JIT kernels.
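
A rough sketch of the persistent-cache flow; the `get_cache_blob()` method and the blob-taking constructor follow the oneDNN documentation as I recall it, so treat the exact names as assumptions:

```cpp
#include <oneapi/dnnl/dnnl.hpp>

#include <cstdint>
#include <vector>

// Build a primitive once and extract its compiled-kernel blob, which the
// caller can persist to disk...
std::vector<uint8_t> save_kernel(
    const dnnl::convolution_forward::primitive_desc &pd) {
  dnnl::convolution_forward conv(pd);
  return conv.get_cache_blob();
}

// ...and later rebuild the primitive from that blob, skipping JIT generation.
dnnl::convolution_forward load_kernel(
    const dnnl::convolution_forward::primitive_desc &pd,
    const std::vector<uint8_t> &blob) {
  return dnnl::convolution_forward(pd, blob);
}
```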

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75398

Reviewed By: dagitses, bigfootjon

Differential Revision: D35976827

Pulled By: frank-wei

fbshipit-source-id: a77ae15cd77fc7c114ab9722453c28dc64aac679
(cherry picked from commit e376698d3c772aaa2dfbe51a4d1a75c8d17d0eee)
2022-05-02 22:07:42 +00:00
4567d5ded4 Upgrade oneDNN to v2.5.2 (#71546)
Summary:
This PR upgrades oneDNN to v2.5.2 and includes the build changes needed for it.

v2.4 changes:
- Improved performance for future Intel Xeon Scalable processors (code name Sapphire Rapids). The functionality is disabled by default and should be enabled via CPU dispatcher control.
- Improved binary primitive performance for cases when one of the tensors is broadcasted.
- Improved performance of reduction primitive, reorder, shuffle primitives.
- Improved performance of depthwise convolution forward propagation for processors with Intel AVX-512 support.
- Improved performance of forward inner product primitive for shapes with minibatch equal to 1 for processors with Intel AVX-512 support.
- Improved performance of int8 matmul and inner product primitives for processors with Intel AVX2 and Intel DL Boost support.

v2.5 changes:
- Improved performance for future Intel Xeon Scalable processors (code name Sapphire Rapids). The functionality is now enabled by default and requires Linux kernel 5.16.
- Improved performance of matmul primitive for processors with Intel AVX-512 support.

v2.5.2 changes:
- Fixed performance regression in binary primitive with broadcast.
- Fixed segmentation fault in depthwise convolution primitive for shapes with huge spatial size for processors with Intel AVX-512 support.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71546

Reviewed By: george-qi

Differential Revision: D33827108

Pulled By: VitalyFedyunin

fbshipit-source-id: 8f5a19b331c82af5b0783f081e061e1034a93952
(cherry picked from commit 9705212fe9b7b0838cc010d040c37d1175be83ce)
2022-02-01 18:34:58 +00:00
847dbb8684 CMake: Clean up unused definitions (#69216)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69216

This cleans up 4 pre-processor defines not used by any code:
- HAVE_GCC_GET_CPUID
- USE_GCC_GET_CPUID
- USE_AVX
- USE_AVX2

`cpuid` isn't used in PyTorch any more; we only use `cpuinfo`.
`USE_AVX*` is also not used; instead, `HAVE_*_CPU_DEFINITIONS` tells
you which `CPU_CAPABILITY` flags are being compiled.
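
For context, a sketch of how such definitions are typically consumed in C++, assuming the `HAVE_*_CPU_DEFINITION` spelling used in ATen's build:

```cpp
// Each HAVE_*_CPU_DEFINITION macro records that kernels for the matching
// CPU_CAPABILITY were compiled in; runtime dispatch then selects among them.
#ifdef HAVE_AVX2_CPU_DEFINITION
// AVX2 kernel variants are available to the dispatcher.
#endif
#ifdef HAVE_AVX512_CPU_DEFINITION
// AVX-512 kernel variants are available to the dispatcher.
#endif
```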

There is also `fbgemm`'s code path adding `third_party` as an include
path, despite `fbgemm` having a dedicated include directory and a
CMake setup that properly includes it.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33794424

Pulled By: malfet

fbshipit-source-id: 99d504af088818d4a26c2f6ce67ec0d59a5eb703
(cherry picked from commit 2e099d41f0e2f7d96c6013ac83223a75f4e4f862)
2022-01-31 22:49:11 +00:00
304efd8e9a Change TH_BLAS_MKL into AT_MKL_ENABLED() (#70219)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70219

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69419

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D33246758

Pulled By: ngimel

fbshipit-source-id: aedef4c9ef97b6aa9f574313c94f774b77df2748
2021-12-21 10:36:55 -08:00
9ad05f2c3a Upgrade oneDNN to v2.3.3 and package oneDNN Graph API together (#63748)
Summary:
This PR upgrades oneDNN to [v2.3.3](https://github.com/oneapi-src/oneDNN/releases/tag/v2.3.3) and includes [Graph API preview release](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.2) in one package.

- oneDNN will be located at `pytorch/third_party/ideep/mkl-dnn/third_party/oneDNN`
- The version of oneDNN will be [v2.3.3](https://github.com/oneapi-src/oneDNN/releases/tag/v2.3.3)
  The main changes on CPU:

  - v2.3
    - Extended primitive cache to improve primitive descriptor creation performance.
    - Improved primitive cache performance in multithreaded configurations.
    - Introduced initial optimizations for bfloat16 compute functionality for future Intel Xeon Scalable processor (code name Sapphire Rapids).
    - Improved performance of binary primitive and binary post-op for cases with broadcast and mixed source and destination formats.
    - Improved performance of reduction primitive
    - Improved performance of depthwise convolution primitive with NHWC activations for training cases
  - v2.3.1
    -  Improved int8 GEMM performance for processors with Intel AVX2 and Intel DL Boost support
    - Fixed integer overflow for inner product implementation on CPUs
    - Fixed out of bounds access in GEMM implementation for Intel SSE 4.1
  - v2.3.2
    - Fixed performance regression in fp32 inner product primitive for processors with Intel AVX512 support
  - v2.3.3
    - Reverted check for memory descriptor stride validity for unit dimensions
    - Fixed memory leak in CPU GEMM implementation

  More changes can be found in https://github.com/oneapi-src/oneDNN/releases.
- The Graph API provides a flexible API for aggressive fusion, and preview2 supports fusion for FP32 inference. See the [Graph API release branch](https://github.com/oneapi-src/oneDNN/tree/dev-graph-preview2) and [spec](https://spec.oneapi.io/onednn-graph/latest/introduction.html) for more details. A separate PR will be submitted to integrate the oneDNN Graph API into the TorchScript graph.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63748

Reviewed By: albanD

Differential Revision: D32153889

Pulled By: malfet

fbshipit-source-id: 536071168ffe312d452f75d54f34c336ca3778c1
2021-12-09 13:42:40 -08:00
ef13341a8d upgrade onednn to v2.2.3 (#57928)
Summary:
This PR upgrades oneDNN to v2.2.3 (including the v2.2 and v2.2.3 changes), which brings the following main CPU changes:

v2.2 changes:
- Improved performance of compute functionality for future Intel Core processors with Intel AVX2 and Intel DL Boost instructions support (code name Alder Lake).
- Improved fp32 inner product forward propagation performance for processors with Intel AVX-512 support.
- Improved dnnl_gemm performance for cases with n=1 on all supported processors.

v2.2.3 changes:
- Fixed a bug in int8 depthwise convolution primitive with groups and 1d spatial size for processors with Intel AVX-512 and Intel AVX2 support.
- Fixed correctness issue for PReLU primitive on Intel Processor Graphics.
- Fixed correctness issue in reorder for blocked layouts with zero padding.
- Improved performance of weights reorders used by BRGEMM-based convolution primitive for processors with Intel AVX-512 support.

More changes can be found in https://github.com/oneapi-src/oneDNN/releases.

The ideep version used is pytorch-rls-v2.2.3.
The oneDNN version used is v2.2.3.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57928

Reviewed By: bdhirsh

Differential Revision: D29037857

Pulled By: VitalyFedyunin

fbshipit-source-id: db74534858bdcf5d6c7dcf58e224fc756188bc31
2021-06-14 11:57:45 -07:00
f8788d5188 Upgrade onednn to v2.1.2 (#54956)
Summary:
This PR upgrades oneDNN to v2.1.2, which includes the following main CPU changes:

- Improved performance of forward convolution with plain activations for processors with Intel AVX-512 support

- Improved performance of fp32 depthwise convolution with plain activations on CPU.

More changes can be found in https://github.com/oneapi-src/oneDNN/releases.

The ideep version used is [pytorch-rls-v2.1.2](https://github.com/intel/ideep/tree/pytorch-rls-v2.1.2).
The oneDNN version used is [v2.1.2](https://github.com/oneapi-src/oneDNN/tree/v2.1.2).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54956

Reviewed By: ejguan

Differential Revision: D27466741

Pulled By: VitalyFedyunin

fbshipit-source-id: ff96e2cbda4b6bf04d299b9978e9125a013ce32f
2021-04-06 10:51:57 -07:00
1eed54d17a Upgrade oneDNN (mkl-dnn) to v1.7 (#47853)
Summary:
Bump oneDNN (mkl-dnn) to 1.7 for bug fixes and performance optimizations
- Fixes https://github.com/pytorch/pytorch/issues/42115. Fixed a build issue on Windows for the case when oneDNN is built as a submodule.
- Fixes https://github.com/pytorch/pytorch/issues/45746. Fixed a segmentation fault for convolution weight gradient on systems with Intel AVX-512 support.

This PR also contains a few changes in ideep for a follow-up update (not enabled in the current PR yet):
- Performance improvements for the CPU path of Convolution
- Channel-last support

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47853

Reviewed By: bdhirsh

Differential Revision: D25275268

Pulled By: VitalyFedyunin

fbshipit-source-id: 75a589d57e3d19a7f23272a67045ad7494f1bdbe
2020-12-03 11:54:31 -08:00
63e5a53b8c DNNL: fix build error when DNNL using TBB threading pool (#40699)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40699

Differential Revision: D22286334

Pulled By: albanD

fbshipit-source-id: 0635a0a5e4bf80d44d90c86945d92e98e26ef480
2020-06-29 13:53:18 -07:00
7f270233fb Upgrade DNNL to 1.5 (#40088)
Summary:
- Bump DNNL to 1.5
- Bug fixes and improvements in ideep
  - suppress g++ Wreorder warning
  - avoid rebuilding `libmkldnn.so` https://github.com/oneapi-src/oneDNN/issues/743
  - enable conv3d (integration code was checked in by Xiaobing https://github.com/pytorch/pytorch/pull/35662)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40088

Differential Revision: D22071530

Pulled By: albanD

fbshipit-source-id: e7a53d7421e8a7a03e36a7dfb68edc565a2f00df
2020-06-16 11:42:30 -07:00
447bcd341d Bazel build of pytorch with gating CI (#36011)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36011

Differential Revision: D20873430

Pulled By: malfet

fbshipit-source-id: 8ffffd10ca0ff8bdab578a70a9b2b777aed985d0
2020-04-06 22:50:33 -07:00
6be9c77998 Revert D20783179: [pytorch][PR] Bazel build of pytorch
Test Plan: revert-hammer

Differential Revision:
D20783179

Original commit changeset: b160908a3e10

fbshipit-source-id: 5b7b36305525e7ccc49540b48991149cf0a759f4
2020-04-03 17:59:10 -07:00
585f153d00 Bazel build of pytorch (#35220)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35220

Reviewed By: seemethere

Differential Revision: D20783179

Pulled By: malfet

fbshipit-source-id: b160908a3e107790fa06057a77de9d6d23493bbc
2020-04-03 17:13:58 -07:00