230 Commits

Author SHA1 Message Date
fdab48a7c1 Enable all PIE rules on ruff (#165814)
This PR enables all PIE rules on ruff. Some rules from this family were already enabled; the newly added rules are:
```
PIE796  Enum contains duplicate value: {value}
PIE808  Unnecessary start argument in range
```
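
For illustration, a minimal sketch (not from the PR) of the patterns these two rules flag:

```python
import enum

class Status(enum.Enum):      # hypothetical example
    OK = 0
    DONE = 0                  # PIE796: enum contains duplicate value (DONE aliases OK)

for i in range(0, 10):        # PIE808: unnecessary start argument
    print(i)                  # `range(10)` is equivalent
```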

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165814
Approved by: https://github.com/ezyang
2025-10-18 07:36:18 +00:00
24520b8386 Revert "Enable all PIE rules on ruff (#165814)"
This reverts commit c79dfdc6550e872783aa5cb5fc9e86589bf18872.

Reverted https://github.com/pytorch/pytorch/pull/165814 on behalf of https://github.com/cyyever due to Need to cover more files ([comment](https://github.com/pytorch/pytorch/pull/165814#issuecomment-3417931863))
2025-10-18 07:21:08 +00:00
c79dfdc655 Enable all PIE rules on ruff (#165814)
This PR enables all PIE rules on ruff. Some rules from this family were already enabled; the newly added rules are:
```
PIE796  Enum contains duplicate value: {value}
PIE808  Unnecessary start argument in range
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165814
Approved by: https://github.com/ezyang
2025-10-18 06:40:12 +00:00
9c12651417 Improve error message for non-positive groups in convolution (#165669)
Prevents a segmentation fault caused by an invalid (non-positive) groups value in convolution.

Fixes #142835
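
A rough repro sketch of the failure mode, assuming the functional convolution path from the linked issue (shapes are illustrative):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
w = torch.randn(6, 3, 3, 3)
try:
    # Before this fix, a non-positive groups value could segfault;
    # it should now raise a clear error instead.
    F.conv2d(x, w, groups=0)
except (RuntimeError, ValueError) as e:
    print(e)
```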

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165669
Approved by: https://github.com/mikaylagawarecki
2025-10-17 19:06:05 +00:00
8de85896e0 Enable ruff rule E721 (#165162)
`E721` checks for object type comparisons using `==` and other comparison operators. This is useful because `is` is the recommended way to compare types.
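
A minimal illustration of what the rule flags versus the preferred forms:

```python
x = 1.0

if type(x) == float:      # flagged by E721
    pass

if type(x) is float:      # preferred: compare types by identity
    pass

if isinstance(x, float):  # or use isinstance for subclass-aware checks
    pass
```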

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165162
Approved by: https://github.com/Skylion007
2025-10-13 01:48:55 +00:00
816fb7f48d Revert "Enable ruff rule E721 (#165162)"
This reverts commit 9e7c19f72b6d0690915c307409c0c0a76b5a3bf0.

Reverted https://github.com/pytorch/pytorch/pull/165162 on behalf of https://github.com/pytorch-auto-revert due to Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable ([comment](https://github.com/pytorch/pytorch/pull/165162#issuecomment-3393328271))
2025-10-11 13:25:40 +00:00
9e7c19f72b Enable ruff rule E721 (#165162)
`E721` checks for object type comparisons using `==` and other comparison operators. This is useful because `is` is the recommended way to compare types.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165162
Approved by: https://github.com/Skylion007
2025-10-11 06:43:53 +00:00
94e634942a Fix int32 overflow in embedding_dense_backward (#165095)
If `max_partial_segment` is large we can overflow `gid` and cause illegal memory accesses (IMA).
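
A back-of-the-envelope sketch of why the index math can overflow (the values are hypothetical, not taken from the kernel):

```python
num_segments = 100_000
max_partial_segment = 50_000
gid = num_segments * max_partial_segment  # 5e9
print(gid > 2**31 - 1)  # True: a 32-bit gid silently wraps in C++,
                        # yielding out-of-bounds indices (IMA);
                        # the fix is to do this index math in 64-bit.
```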
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165095
Approved by: https://github.com/ngimel, https://github.com/eqy
2025-10-10 19:47:38 +00:00
4a6abba0d9 [ROCm][CI] test_convolution.py uses miopen immediate mode (#164598)
This should help stabilize some flaky test behavior where miopen would pick different solutions for different parts of the same test, even though the test expects bitwise-identical results.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164598
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-10-06 17:48:50 +00:00
6b7970192f [ROCm][CI] fix test_cudnn_convolution_relu_cuda (#164466)
Fixes #162816.
The test was comparing the output of conv vs. fused conv, but the inputs had different memory formats. Also fixes test_cudnn_convolution_add_relu.
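
For context, a small sketch of the memory-format pitfall (hypothetical shapes; the actual test uses its own inputs):

```python
import torch

x = torch.randn(2, 8, 16, 16)
x_cl = x.to(memory_format=torch.channels_last)  # same values, different strides
# Comparing conv vs. fused conv is only meaningful when both paths
# receive inputs in the same memory format.
print(x.stride(), x_cl.stride())
torch.testing.assert_close(x, x_cl)  # values match even though layouts differ
```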

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164466
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-10-02 20:36:54 +00:00
95be302889 Skip test_conv3d_cudnn_broken on ROCM (#164138)
Follow-up to https://github.com/pytorch/pytorch/pull/163903. Fixes https://github.com/pytorch/pytorch/issues/164137

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164138
Approved by: https://github.com/Camyll
2025-09-29 16:56:51 +00:00
e2817ac204 [cuDNN][Convolution] Disable cuDNN for 3D convolutions with kernel size != 1 for cuDNN 9.8+ (#163581)
To workaround #163539

Still confirming whether 9.10 is affected. The original test states that the convolution is "large," but note that the input size does not appear to require 64-bit indexing.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163581
Approved by: https://github.com/ngimel, https://github.com/malfet

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2025-09-26 23:47:29 +00:00
0ebfa3d7d2 Avoid fast path mask left-align check in compiled TransformerEncoder (#163773)
Fixes #163640

This PR avoids a mask left align check in the case that we're operating under torch.compile / torch.export. Originally, I planned to make a more invasive change to auto-disable the fast path entirely underneath torch.compile / torch.export, but I realized during testing that the fast path wasn't actually causing compile issues outside of the narrow issue identified here.
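
A rough sketch of the compiled scenario (shapes and mask values are illustrative):

```python
import torch
import torch.nn as nn

layer = nn.TransformerEncoderLayer(d_model=16, nhead=4, batch_first=True)
enc = nn.TransformerEncoder(layer, num_layers=1)
x = torch.randn(2, 5, 16)
# Boolean padding mask; the fast path's left-align check on it was the
# source of the compile breakage.
mask = torch.tensor([[False, False, True, True, True],
                     [False, True, True, True, True]])
out = torch.compile(enc)(x, src_key_padding_mask=mask)
```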
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163773
Approved by: https://github.com/mikaylagawarecki
2025-09-26 22:29:37 +00:00
eqy
0ea10f9912 [cuDNN][conv][64-bit] Disable cuDNN for 64-bit depthwise convs again (#163171)
The test is breaking; will check whether there's an older version we can enable on, to avoid completely dropping support.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163171
Approved by: https://github.com/ngimel, https://github.com/malfet
2025-09-26 22:12:17 +00:00
d4e4f70768 Fix overflow in slow_conv3d when kernel size is too large. (#162718)
Also adds a check for padding to avoid a segmentation fault caused by overflow.

Fixes #141846

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162718
Approved by: https://github.com/jgong5, https://github.com/Skylion007
2025-09-26 13:39:29 +00:00
3cbfbbd691 [ROCm] Transformer/SDPA unit test parity (#163745)
## Major Changes

* Efficient Attention on ROCm requires the last dimensions of input tensors to be aligned to 16 bytes (see the sketch after the fixes list below).
  - Unlike FA, ME does not pad input tensors in `scaled_dot_product_attention`, hence this requirement.
* Fix `atomic_counter` handling in the varlen FA API
* Unskips a few unit tests.

Fixes #157120
Fixes #157121
Fixes #157122
Fixes #157167
Fixes #155217
Fixes #157043
Fixes #157060
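
A hedged sketch of the 16-byte alignment requirement on the ME path (shapes and the padding arithmetic are illustrative, not from the PR):

```python
import torch
import torch.nn.functional as F

q = torch.randn(2, 4, 128, 61, dtype=torch.float16)  # 61 * 2 bytes: not 16-byte aligned
pad = (-q.shape[-1]) % 8        # fp16: 8 elements == 16 bytes
q_padded = F.pad(q, (0, pad))   # caller pads the last dim, since ME does not
print(q_padded.shape[-1])       # 64
```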

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163745
Approved by: https://github.com/jeffdaily
2025-09-25 17:14:19 +00:00
0256f91558 [BUG] MaxUnpool2d/3d should check output dim before accessing its elements (#163507)
Fixes #163409
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163507
Approved by: https://github.com/malfet, https://github.com/Skylion007
2025-09-22 21:36:48 +00:00
281bb56cc5 Enable half precision types on test_conv_cudnn_nhwc_support (#163444)
This PR adds float16 and bfloat16 cases to `test_conv_cudnn_nhwc_support` and removes outdated comments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163444
Approved by: https://github.com/Skylion007
2025-09-22 04:11:20 +00:00
ce5637be29 Fix invalid indices bug for max_unpool2d/3d on MPS (#163036)
Fixes #163035
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163036
Approved by: https://github.com/kulinseth, https://github.com/malfet

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2025-09-19 05:13:21 +00:00
29ea6254a0 [Bug] Add more boundary check for FractionalMaxPool3d (#161876)
This PR aims to fix the bug mentioned at [#161853](https://github.com/pytorch/pytorch/issues/161853#issuecomment-3240695121)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161876
Approved by: https://github.com/malfet
2025-09-16 06:59:02 +00:00
0def79fdd9 [ROCm] fix conv relu fusion (#162856)
Fixes #162816.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162856
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-09-15 22:49:32 +00:00
429052f151 fix: raise value error on init ParametrizationList if original.device != new.device (#162717)
Raise a ValueError on init of `ParametrizationList` if `original.device != new.device`.
Currently `_maybe_set` throws the error below in such situations, which is not convenient to debug.

```
[rank1]: RuntimeError: Attempted to set the storage of a tensor on device "cuda:1" to a storage on different device "cpu".  This is no longer allowed; the devices must match.
```
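
A hedged sketch of the new failure mode (the parametrization module here is made up for illustration; requires CUDA):

```python
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

class ToCPU(nn.Module):  # hypothetical parametrization that changes device
    def forward(self, x):
        return x.cpu()

linear = nn.Linear(4, 4).cuda()
try:
    # With this PR, the device mismatch is reported up front as a
    # ValueError rather than surfacing later from `_maybe_set`.
    parametrize.register_parametrization(linear, "weight", ToCPU())
except ValueError as e:
    print(e)
```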
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162717
Approved by: https://github.com/lezcano
2025-09-11 23:07:58 +00:00
468c1f9e9d Revert "[nn] Assert parsed iterable arguments are an appropriate length (#162340)"
This reverts commit b5e6e58050bd2a15f4173cfffa00c7e32e382b49.

Reverted https://github.com/pytorch/pytorch/pull/162340 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it seems to break an MPS tests on ExecuTorch ([comment](https://github.com/pytorch/pytorch/pull/162340#issuecomment-3282676242))
2025-09-11 21:22:57 +00:00
d65ffdef3d [ROCm] fix miopen batchnorm changing output format (#162112)
It was found that the miopen batchnorm integration was causing the output to always be in the default contiguous memory format even when the input was channels-last. This also unskips a number of related unit tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162112
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
Co-authored-by: Dmitry Nikolaev <dmitry.nikolaev@amd.com>
Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>
2025-09-11 19:37:48 +00:00
eqy
5dbee5691c [cuDNN][Convolution][TF32][64bit] Add tf32_on_and_off decorator to conv3d 64bit test (#161004)
cuDNN has new generated kernels that can use TF32.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161004
Approved by: https://github.com/janeyx99, https://github.com/Skylion007
2025-09-10 21:39:35 +00:00
b5e6e58050 [nn] Assert parsed iterable arguments are an appropriate length (#162340)
Fixes #162327
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162340
Approved by: https://github.com/Skylion007
2025-09-10 15:15:49 +00:00
3a20a20e70 Fix largeTensorTest malfunction on XPU (#161988)
# Motivation
https://github.com/pytorch/pytorch/pull/143553/files#diff-6492991193449e118ff0c8d42ca544cc38a73604e505ff246a3c711aeab91748R1345 makes `largeTensorTest` malfunction on XPU. This PR aims to fix it.
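
For reference, a sketch of how the decorator is used (mirroring its usage elsewhere in this log; the test body is hypothetical):

```python
from torch.testing._internal.common_device_type import largeTensorTest

class TestBig:
    @largeTensorTest("40GB", "xpu")  # skip unless the XPU has ~40GB available
    def test_something_big(self, device):
        ...
```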

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161988
Approved by: https://github.com/EikanWang, https://github.com/albanD
2025-09-04 16:10:03 +00:00
99f356fa58 [ROCm] revamp miopen integration (#161687)
Update sources under ATen/miopen and ATen/native/miopen to align with best practices. Avoid reshape_ calls inside backward operations.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161687
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-09-03 22:28:09 +00:00
f391afe9bf [cuDNN][convolution] remove redundant conv3d 64bit test (#161177)
It turns out to be the same as:
```
    @onlyCUDA
    @largeTensorTest("40GB")
    @largeTensorTest("24GB", "cpu")
    @tf32_on_and_off(0.005)
    def test_conv3d_64bit_indexing(self, device):
        x = torch.rand(1, 32, 512, 512, 256)
        m = torch.nn.Conv3d(32, 1, kernel_size=1, padding=0, stride=1, bias=False)
        yref = m(x)
        y = m.to(device=device)(x.to(device=device))
        self.assertEqual(yref, y)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161177
Approved by: https://github.com/Skylion007
2025-08-25 15:01:05 +00:00
eqy
9903ca4f70 [cuDNN][64-bit indexing] update conv depthwise 64bit indexing dispatch condition to match native kernel (#156140)
The native kernel doesn't support batch splitting, so the previous check wasn't aggressive enough in dispatching to cuDNN.

https://github.com/pytorch/pytorch/issues/155225

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156140
Approved by: https://github.com/ngimel, https://github.com/atalman
2025-08-12 18:07:41 +00:00
e06b110f73 [Testing] Add MPS to NATIVE_DEVICES (#153835)
This would allow me to enable more OpInfo tests against the MPS device eventually. It was supposed to be a very simple change, but actually required minor adjustments to lots of test files, namely:
- Introduce `all_mps_types_and`, which is very similar to `all_types_and` but skips `float64`
- Decorate lots of tests with `@dtypesIfMPS(*all_mps_types())`
- Skip `test_from_dlpack_noncontinguous` as it currently crashes (needs to be fixed)
- Add lots of `expectedFailureIfMPS`
- Delete all `@onlyNativeDeviceTypesAnd("mps")`

<sarcasm> I love how well documented this variable is </sarcasm>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153835
Approved by: https://github.com/Skylion007
2025-08-05 18:57:35 +00:00
52b9af163c Add avg_pool3d for MPS (#158877)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158877
Approved by: https://github.com/malfet
2025-07-29 15:22:22 +00:00
eqy
c89fa88acb [conv][cuDNN][64-bit indexing] reduce memory usage of depthwise conv 64-bit indexing test (#158981)
Use half instead for reduced memory usage

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158981
Approved by: https://github.com/soulitzer, https://github.com/Skylion007
2025-07-25 23:58:45 +00:00
2b19d85d70 FractionalMaxPool3d add kernel_size check (#155549)
Fixes #96316

## Test Result

```python
>>> import torch
>>> from torch.func import jacrev, grad, vmap
>>>
>>> torch.manual_seed(420)
<torch._C.Generator object at 0x7fe4767810d0>
>>>
>>> input = torch.randn(1, 1, 5, 5, 5, requires_grad=True)
>>>
>>> def func(input):
...     model = torch.nn.FractionalMaxPool3d(kernel_size=0, output_size=(1, 1, 1))
...     output = model(input)
...     return output
...
>>>
>>> func(input).sum().backward()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in func
  File "/home/zong/code/pytorch/torch/nn/modules/pooling.py", line 1054, in __init__
    raise ValueError(f"kernel_size must greater than 0, but got {kernel_size}")
ValueError: kernel_size must greater than 0, but got 0

```

![image](https://github.com/user-attachments/assets/52780ce7-3951-4d1c-95a4-5ce2bf65c727)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155549
Approved by: https://github.com/albanD
2025-07-10 04:55:06 +00:00
510c398a4f Add max_pool3d backward pass for MPS (#157498)
Note on backward precision over fp16:

A float16 number has 10 bits of mantissa, 5 bits of exponent, and 1 bit for the sign. If the sign bit is positive, then with a mantissa $m$ and exponent $e$ represented in base 10, the number that the float16 format represents is $(1 + m/1024) \cdot 2^e$. ([source](https://en.wikipedia.org/wiki/Half-precision_floating-point_format))

Consider adding two numbers $a$ and $b$ which have arbitrary mantissas, and say their exponents are $e_a = 1$ (so $2 \le a < 4$) and $e_b = -3$ (so $0.125 \le b < 0.25$). Assume that the result has the same exponent as $a$. Since the exponents differ by 4, we'll effectively need to truncate the 4 rightmost bits of $b$'s mantissa, which would introduce a maximum error on the order of $(2^4 / 1024) \cdot 2^{-3} \approx 0.002$.

The error is nearly the same if $e_b = -2$ (so $0.25 \le b < 0.5$), where the 3 rightmost bits are truncated, giving a maximum error on the order of $(2^3 / 1024) \cdot 2^{-2} \approx 0.002$. Same for $e_b = -1$.

So if we're adding up nine different numbers that all have exponents $-3$, $-2$, or $-1$, and they sum to a number with exponent 1, then we would expect a maximum error several times greater than 0.002. In my comments above, summing those particular nine numbers in different ways gave results that ranged between 3.1816 and 3.1758, a difference of $0.0058 \approx 2.9 \times 0.002$.

That's within the acceptable bounds, and we can safely just increase the error tolerance used in test_output_grad_match for the case of max_pool3d_backward with float16.
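
A quick numeric sketch of the effect (the nine values below are made up, chosen to sum near 3.18):

```python
import torch

vals = torch.tensor([0.33, 0.41, 0.19, 0.47, 0.29, 0.36, 0.44, 0.31, 0.38],
                    dtype=torch.float16)
total = vals.sum()                    # library reduction order
seq = torch.zeros((), dtype=torch.float16)
for v in vals:                        # strictly sequential accumulation
    seq = seq + v
print(total.item(), seq.item())       # orders may differ by a few ~0.002 steps
```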

Pull Request resolved: https://github.com/pytorch/pytorch/pull/157498
Approved by: https://github.com/malfet
2025-07-07 19:46:44 +00:00
71a650ad56 Fix typo: 'Intializing' → 'Initializing' in test_parametrization.py (#157362)
This pull request fixes a minor typo in the doc comments of `test/nn/test_parametrization.py`.

- Replaced `'Intializing'` with `'Initializing'` in two docstring comments to improve clarity and maintain consistency across the codebase.

This is a non-functional change and does not impact behavior or test outcomes.

Thank you for maintaining such a high-quality codebase. Please let me know if any adjustments are needed. I'd be happy to help!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/157362
Approved by: https://github.com/ezyang
2025-07-05 12:21:15 +00:00
317af4c87b Revert "[cuDNN][64-bit indexing] update conv depthwise 64bit indexing dispatch condition to match native kernel (#156140)"
This reverts commit a5f59cc2eab3a5201712c52fe48c268357ba4f3c.

Reverted https://github.com/pytorch/pytorch/pull/156140 on behalf of https://github.com/atalman due to breaks internal builds ([comment](https://github.com/pytorch/pytorch/pull/156140#issuecomment-2988441548))
2025-06-19 15:09:29 +00:00
eqy
a5f59cc2ea [cuDNN][64-bit indexing] update conv depthwise 64bit indexing dispatch condition to match native kernel (#156140)
The native kernel doesn't support batch splitting, so the previous check wasn't aggressive enough in dispatching to cuDNN.

https://github.com/pytorch/pytorch/issues/155225

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156140
Approved by: https://github.com/ngimel
2025-06-18 17:32:36 +00:00
596b418391 [BE][PYFMT] migrate PYFMT for {torch,test}/{nn,optim}/** to ruff format (#144548)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144548
Approved by: https://github.com/ezyang
2025-06-14 11:27:04 +00:00
297805fd8f Typo fixes for "overridden" in comments and function names (#155944)
This word appears often in class descriptions and is not consistently spelled. Update comments and some function names to use the correct spelling consistently. Facilitates searching the codebase.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155944
Approved by: https://github.com/Skylion007
2025-06-14 03:37:38 +00:00
eqy
bd3c32916c [cuDNN] Enabled dilation for deterministic convolutions in cuDNN (#154292)
Provides order-of-magnitude speedup over fallback impl.

https://github.com/pytorch/pytorch/issues/28777
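
A sketch of the newly accelerated configuration (requires CUDA; parameters are illustrative):

```python
import torch

torch.use_deterministic_algorithms(True)
conv = torch.nn.Conv2d(8, 8, kernel_size=3, dilation=2).cuda()
x = torch.randn(1, 8, 32, 32, device="cuda")
y = conv(x)  # dilated + deterministic: now eligible for cuDNN
             # instead of the order-of-magnitude-slower fallback
```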

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154292
Approved by: https://github.com/Skylion007

Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>
2025-06-11 23:35:52 +00:00
671a9d175b Add warning for module full backward hook when no input requires gradient (#155339)
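A sketch of the case that now warns (hedged; the exact warning text is not specified here):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 4)
model.register_full_backward_hook(lambda mod, gin, gout: None)
x = torch.randn(2, 4)      # requires_grad is False for the input
model(x).sum().backward()  # hook never fires; a warning is expected
```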
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155339
Approved by: https://github.com/Skylion007
2025-06-10 04:42:06 +00:00
981bdb39ca Enable ConvTranspose3D for FP32 and Complex64 (#154696)
Fixes #154615

Enables ConvTranspose3D, since support appears to exist on both macOS 14 and 15.

For the half dtypes, the discrepancy between the CPU and GPU implementations is too large to conclude whether there is a bug in the implementation without a more rigorous study of the bounds on the expected error. So they are left unsupported for now, and an assert is added to notify the user if the op is called with fp16 or bf16 inputs.

Tests for ConvTranspose3D were enabled for the supported data types.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154696
Approved by: https://github.com/malfet
2025-06-02 16:24:03 +00:00
dbad6d71c7 [BE][Ez]: Unskip conv1d MPS test (#154795)
Fixes an issue I noticed where the conv1d test is unconditionally skipped for complex types.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154795
Approved by: https://github.com/jansel
2025-05-31 23:01:19 +00:00
eqy
823a35807c [CUDA][CUDNN] Dispatch to cuDNN for non-batch-splittable 64-bit NCHW convolutions (#153101)
For #152816

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153101
Approved by: https://github.com/Skylion007
2025-05-20 20:19:03 +00:00
bf0fe4f828 Revert "[CUDA][CUDNN] Dispatch to cuDNN for non-batch-splittable 64-bit NCHW convolutions (#153101)"
This reverts commit ced90d23d3dfff42379fa032fe6a83b764d12e9f.

Reverted https://github.com/pytorch/pytorch/pull/153101 on behalf of https://github.com/jeanschmidt due to Seems to have introduced breakages on main, tentative revert: https://github.com/pytorch/pytorch/actions/runs/15024667248/job/42224521705 ([comment](https://github.com/pytorch/pytorch/pull/153101#issuecomment-2881208171))
2025-05-14 18:52:07 +00:00
eqy
ced90d23d3 [CUDA][CUDNN] Dispatch to cuDNN for non-batch-splittable 64-bit NCHW convolutions (#153101)
For #152816

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153101
Approved by: https://github.com/Skylion007
2025-05-14 15:22:47 +00:00
ec68d082a1 [CUDA][TF32] Account for TF32 in test_conv2d_same_padding (#152618)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152618
Approved by: https://github.com/msaroufim, https://github.com/Skylion007
2025-05-02 20:19:00 +00:00
0d99b4e9e2 ROCm: Enable tf32 testing on test_nn (#148945)
Add tf32 support for ROCm tests.
test command: python test/test_nn.py -v

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148945
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-04-28 23:01:04 +00:00
8ce3d4a541 test(Conv3d): use correct class for test_Conv3d_module_same_padding (#152187)
The test for the `Conv3d` class was calling `Conv2d`. This PR just ensures that we are testing the correct module.
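
The gist of the fix, as a sketch:

```python
import torch.nn as nn

# The 3D same-padding test should instantiate the 3D module:
m = nn.Conv3d(3, 8, kernel_size=3, padding="same")  # was nn.Conv2d
```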
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152187
Approved by: https://github.com/Skylion007
2025-04-28 16:59:12 +00:00