11 Commits

Author SHA1 Message Date
e06a08059a Add device guard for xpu conv on multi device (#153067)
# Motivation
fixes https://github.com/pytorch/pytorch/issues/153022
The root cause is that the XPU backend registers the convolution op using `m.impl`, which bypasses the device guard logic typically added by the code generation system. This can lead to unexpected behavior if the current device isn't explicitly set.

# Additional Context
run the following script
```python
import torch
import torchvision.models as models

torch.manual_seed(0)

model = models.resnet50(weights="ResNet50_Weights.DEFAULT")
model.eval()
data = torch.rand(1, 3, 224, 224)

device = torch.device('xpu:1')  # 'xpu:0'
model = model.to(device=device, dtype=torch.float16)
data = data.to(device, dtype=torch.float16)

with torch.no_grad():
    ret = model(data)
    print(ret)

print("Execution finished")
```
The output is
```bash
         -9.2102e-02, -7.7588e-01, -1.4111e+00, -9.2383e-01,  6.4551e-01,
         -6.0730e-03, -7.8271e-01, -1.1904e+00, -4.1602e-01,  3.2715e-02,
         -4.9854e-01, -6.3623e-01, -8.5107e-01, -6.8555e-01, -9.4434e-01,
         -8.8672e-01, -6.7969e-01, -6.9824e-01, -2.8882e-01,  2.0312e+00]],
       device='xpu:1', dtype=torch.float16)
Execution finished

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153067
Approved by: https://github.com/albanD, https://github.com/EikanWang
2025-05-09 09:41:51 +00:00
ae351d4d0e [Intel GPU] allow_tf32 for oneDNN backend - XPU part (#137570)
# Motivation
Add context variable `torch.bachend.mkldnn.allow_tf32` to control tf32 computation in convolution kernels at XPU side.  The tf32 data type is beneficial to improve the performance of deep learning workloads during training/inference. Current PR uses the [oneDNN API fpmath_mode](https://oneapi-src.github.io/oneDNN/dev_guide_attributes_fpmath_mode.html#the-floating-point-math-mode-attribute) to trigger the tf32 acceleration in convolution kernels.

# Valiadation
* ut to test context variable
`python test/xpu/test_conv.py -k test_mkldnn_allow_tf32_get_set`

* Runtime exemplification
```
onednn_verbose,primitive,exec,gpu:0,convolution,jit:ir,forward_training,src_f32::blocked:abcd::f0 wei_f32::blocked:abcd::f0 bia_f32::blocked:a::f0 dst_f32::blocked:abcd::f0,attr-scratchpad:user attr-fpmath:tf32,alg:convolution_direct,mb20_ic16oc33_ih50oh24kh3sh2dh0ph0_iw100ow49kw3sw2dw0pw0,0.649902
onednn_verbose,primitive,exec,gpu:0,convolution,jit:ir,forward_training,src_f32::blocked:abcd::f0 wei_f32::blocked:abcd::f0 bia_f32::blocked:a::f0 dst_f32::blocked:abcd::f0,attr-scratchpad:user attr-fpmath:tf32,alg:convolution_direct,mb20_ic33oc33_ih24oh24kh3sh1dh0ph1_iw49ow49kw3sw1dw0pw1,0.151855
onednn_verbose,primitive,exec,gpu:0,convolution,jit:ir,backward_data,src_f32::blocked:abcd::f0 wei_f32::blocked:abcd::f0 bia_undef::undef::: dst_f32::blocked:abcd::f0,attr-scratchpad:user attr-fpmath:tf32,alg:convolution_direct,mb20_ic33oc33_ih24oh24kh3sh1dh0ph1_iw49ow49kw3sw1dw0pw1,0.167969
onednn_verbose,primitive,exec,gpu:0,convolution,jit:ir,backward_weights,src_f32::blocked:abcd::f0 wei_f32::blocked:abcd::f0 bia_f32::blocked:a::f0 dst_f32::blocked:abcd::f0,attr-scratchpad:user attr-fpmath:tf32,alg:convolution_direct,mb20_ic33oc33_ih24oh24kh3sh1dh0ph1_iw49ow49kw3sw1dw0pw1,0.26709
onednn_verbose,primitive,exec,gpu:0,convolution,jit:ir,backward_weights,src_f32::blocked:abcd::f0 wei_f32::blocked:abcd::f0 bia_f32::blocked:a::f0 dst_f32::blocked:abcd::f0,attr-scratchpad:user attr-fpmath:tf32,alg:convolution_direct,mb20_ic16oc33_ih50oh24kh3sh2dh0ph0_iw100ow49kw3sw2dw0pw0,0.219971

```
According to the field `fpmath:tf32` in verbose, we could see that, current context setting utils could successfully trigger tf32 computation in conv forward/backward_data/backward_weights kernels.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137570
Approved by: https://github.com/guangyey, https://github.com/EikanWang, https://github.com/atalman, https://github.com/malfet

Co-authored-by: Yu, Guangye <guangye.yu@intel.com>
2025-02-17 01:46:43 +00:00
ade5af9430 [XPU] Align XPU convolution_backward output layout between fake tensor and real output tensor. (#146880)
Fix #146879

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146880
Approved by: https://github.com/EikanWang, https://github.com/jansel
ghstack dependencies: #146762, #145248
2025-02-14 01:39:18 +00:00
5aa5a5763e [inductor triton] Disable incorrect TF32 usage on CUDA capability < 8 (#145684)
Triton 2.2 and greater have a bug where allowing TF32 generation for a GPU that does not support TF32 will cause code generation errors. Patch around this problem by:

1. Adding a function to `torch.cuda` that determines whether CUDA hardware is capable of using the TF32 format.
2. Using that function to explicitly disable TF32 generation when calling Triton, where needed.

To demonstrate that this fix works, try running `test/inductor/test_max_autotune.py` on a GPU with CUDA compute capability < 8 (e.g. any NVIDIA consumer GPU) without this fix.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145684
Approved by: https://github.com/eqy
2025-01-28 22:01:08 +00:00
24dee99cb7 Populate submodules of torch._C to sys.modules recursively (#132216)
See comment:

e9d1c26275/torch/__init__.py (L938-L950)

This PR recursively sets the submodules in the C extension to `sys.modules` (e.g., `_C._dynamo.eval_frame`).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132216
Approved by: https://github.com/ezyang
2024-08-08 10:20:25 +00:00
ff81ca8e0c Revert "Populate submodules of torch._C to sys.modules recursively (#132216)"
This reverts commit 672ce4610e41386da9763e07375b0879dc351905.

Reverted https://github.com/pytorch/pytorch/pull/132216 on behalf of https://github.com/PaliC due to was breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/132216#issuecomment-2274112397))
2024-08-07 18:45:00 +00:00
672ce4610e Populate submodules of torch._C to sys.modules recursively (#132216)
See comment:

e9d1c26275/torch/__init__.py (L938-L950)

This PR recursively sets the submodules in the C extension to `sys.modules` (e.g., `_C._dynamo.eval_frame`).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132216
Approved by: https://github.com/ezyang
2024-08-01 12:04:59 +00:00
548c460bf1 [BE][Easy][7/19] enforce style for empty lines in import segments in test/[a-c]*/ and test/[q-z]*/ (#129758)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129758
Approved by: https://github.com/ezyang
2024-07-31 10:54:03 +00:00
c6aa03bd4e Add allow_xpu to enable XPU UTs (#130312)
# Motivation
enable UTs under folder test/xpu/

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130312
Approved by: https://github.com/EikanWang, https://github.com/gujinghui, https://github.com/albanD
2024-07-17 02:40:28 +00:00
36af9c0d7d [Aten] Fix XPU convolution_overrideable input memory format. (#124841)
[Aten] Fix convolution_overrideable input memory format.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124841
Approved by: https://github.com/EikanWang, https://github.com/jgong5, https://github.com/albanD
2024-04-26 15:55:01 +00:00
ad7b5d32b6 Intel GPU oneDNN Upstreaming: Convolution operators support (#117529)
# Motivation

This PR is a part of RFC #114848,. This PR would depend on oneDNN compilation in #117098 and basic integration support in #117112 and Conv integration code in #117512.  Some runtime support is needed in #116019.

This PR implements the convolution and deconvolution operators for XPU that should be defined in `aten` libraries. Also, the backward support is also supported.

With this PR, the conv-related operators should be functionality ready.

Co-authored-by: xiaolil1 <xiaoli.liu@intel.com>
Co-authored-by: lei,zhenyuan <zhenyuan.lei@intel.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117529
Approved by: https://github.com/EikanWang, https://github.com/malfet
ghstack dependencies: #117512
2024-04-22 13:22:36 +00:00