292 Commits

Author SHA1 Message Date
fdab48a7c1 Enable all PIE rules on ruff (#165814)
This PR enables all PIE rules on ruff, there are already some enabled rules from this family, the new added rules are
```
PIE796  Enum contains duplicate value: {value}
PIE808  Unnecessary start argument in range
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165814
Approved by: https://github.com/ezyang
2025-10-18 07:36:18 +00:00
24520b8386 Revert "Enable all PIE rules on ruff (#165814)"
This reverts commit c79dfdc6550e872783aa5cb5fc9e86589bf18872.

Reverted https://github.com/pytorch/pytorch/pull/165814 on behalf of https://github.com/cyyever due to Need to cover more files ([comment](https://github.com/pytorch/pytorch/pull/165814#issuecomment-3417931863))
2025-10-18 07:21:08 +00:00
c79dfdc655 Enable all PIE rules on ruff (#165814)
This PR enables all PIE rules on ruff, there are already some enabled rules from this family, the new added rules are
```
PIE796  Enum contains duplicate value: {value}
PIE808  Unnecessary start argument in range
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165814
Approved by: https://github.com/ezyang
2025-10-18 06:40:12 +00:00
e925dfcc6b Enable all SIM rules except disabled ones (#164645)
`SIM` rules are useful for simplifying boolean expressions and enhances code readability.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164645
Approved by: https://github.com/ezyang, https://github.com/mlazos
2025-10-17 07:27:11 +00:00
b63bbe1661 Remove old ROCm version check in tests (#164245)
This PR removes ROCm<6 version checks.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164245
Approved by: https://github.com/jeffdaily
2025-10-06 22:42:01 +00:00
e64dd8c694 [Fix] Adding missing f prefixes to formatted strings [4/N] (#164068)
As stated in the title.

* __->__ #164068
* #164067
* #164066
* #164065

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164068
Approved by: https://github.com/Skylion007
2025-09-29 04:07:07 +00:00
7194d77550 Revert "enable test_sampled_addmm_zero_sized_cuda for rocm (#121940)" (#163848)
This reverts commit 5494b2a8d38c3ddbeb2d96a5ac990e20ec4c48fd.

Need to skip `test_sparse_csr.py::TestSparseCSRCUDA::test_sampled_addmm_zero_sized_cuda_*` again. Tests are failing now with "core dumped" error
```
python test_sparse_csr.py -v -k test_sampled_addmm_zero_sized_cuda_float64

  test_sampled_addmm_zero_sized_cuda_float64 (__main__.TestSparseCSRCUDA) ... /tmp/pytorch/test/test_sparse_csr.py:2503:   c = torch.empty(m, n, dtype=dtype, device=device, layout=torch.sparse_csr)
GPU core dump created: gpucore.186789
:0:rocdevice.cpp            :2992: 4701819131755 us:  Callback: Queue 0x760cdcd00000 aborting with error : HSA_STATUS_ERROR_EXCEPTION: An HSAIL operation resulted in a hardware exception. code: 0x1016
Aborted (core dumped)
```
These failures are linked to `test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSC_int32_cuda_*` due to incorrect test log parsing. We will be able to close these issues also:

- Fixes https://github.com/pytorch/pytorch/issues/163663
- Fixes https://github.com/pytorch/pytorch/issues/160786
- Fixes https://github.com/pytorch/pytorch/issues/160785
- Fixes https://github.com/pytorch/pytorch/issues/160784

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163848
Approved by: https://github.com/jeffdaily
2025-09-25 16:38:00 +00:00
ebddbe787a [ROCm][CI] skip test_sparse_triangular_solve (#163651)
need more time to debug, but also need clean CI signal test was unskipped by #163495, but had been skipp on rocm prior

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163651
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-09-23 15:55:51 +00:00
5d749ceb92 Remove test conditions for CUDA<12 (#163495)
Because it required that CUDA >=12.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163495
Approved by: https://github.com/janeyx99
2025-09-23 07:52:00 +00:00
c0142f5c06 [ROCm] Enabling several UTs (#161715)
All these UTs are working as is, just removing the skip
- test_p2p_ipc
- test_repros.py: working, added fp8 support
- test_activation_checkpointing.py
- test_content_store.py
- test_cuda_multigpu.py
- test_compute_comm_reordering.py
- test_segment_reductions.py
- test_dataloader.py
- test_math_ops.py
- test_loop_ordering.py
- test_control_flow.py
- distributed_test.py
- test_mem_tracker.py
- test_fsdp_optim_state.py
- test_fully_shard_mixed_precision.py: skippped for < ROCm7.0
- test_aot_inductor_custom_ops.py
- test_c10d_ops_nccl.py
- test_eager_transforms.py
- test_sparse_csr.py
- test_inductor_collectives.py
- test_fake_tensor.py
- test_cupy_as_tensor.py
- test_cuda.py: enable UTs that are working
- test_matmul_cuda.py: enable UTs that are working

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161715
Approved by: https://github.com/msaroufim

Co-authored-by: Mark Saroufim <marksaroufim@fb.com>
2025-09-09 15:49:21 +00:00
8235c4f65d Revert "[ROCm] Enabling several UTs (#161715)"
This reverts commit b9ba612f7a968f7b27e121ca8f4d0a4d954f5354.

Reverted https://github.com/pytorch/pytorch/pull/161715 on behalf of https://github.com/jeanschmidt due to Need to revert in order to revert https://github.com/pytorch/pytorch/pull/159473, feel free to merge it back once conflicts are cleared ([comment](https://github.com/pytorch/pytorch/pull/161715#issuecomment-3264040604))
2025-09-07 21:03:17 +00:00
b9ba612f7a [ROCm] Enabling several UTs (#161715)
All these UTs are working as is, just removing the skip
- test_p2p_ipc
- test_repros.py: working, added fp8 support
- test_activation_checkpointing.py
- test_content_store.py
- test_cuda_multigpu.py
- test_compute_comm_reordering.py
- test_segment_reductions.py
- test_dataloader.py
- test_math_ops.py
- test_loop_ordering.py
- test_control_flow.py
- distributed_test.py
- test_mem_tracker.py
- test_fsdp_optim_state.py
- test_fully_shard_mixed_precision.py: skippped for < ROCm7.0
- test_aot_inductor_custom_ops.py
- test_c10d_ops_nccl.py
- test_eager_transforms.py
- test_sparse_csr.py
- test_inductor_collectives.py
- test_fake_tensor.py
- test_cupy_as_tensor.py
- test_cuda.py: enable UTs that are working
- test_matmul_cuda.py: enable UTs that are working

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161715
Approved by: https://github.com/pruthvistony, https://github.com/jeffdaily
2025-09-04 20:43:03 +00:00
6162e650b0 [BE] remove torch deploy - conditionals (#158288)
This PR is part of the work to deprecate torch::deploy in OSS. Effectively it does 3 things to get started.
1. Remove test_deploy_interaction as we no longer need to worry about this
2. Remove all torch._running_with_deploy checks and use the False path always (surfaced 1)
3. Remove `USE_DEPLOY` and switch to the default path always

Note: MyPy does fail on a bunch of things here as a bunch of older files are touched. It may be better to fix these things on a separate PR

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158288
Approved by: https://github.com/albanD
2025-07-29 17:40:49 +00:00
f8fafdc7a6 Revert "[BE] remove torch deploy - conditionals (#158288)"
This reverts commit ab26d4fbeb5bc4b4e6ef1c37fbec9fab6e5a9edd.

Reverted https://github.com/pytorch/pytorch/pull/158288 on behalf of https://github.com/ZainRizvi due to Reverting as per offline discussion to fix internal breaks.  @PaliC will reland this as a codev diff. Instructions here: https://fburl.com/fixing-ghfirst-reverts ([comment](https://github.com/pytorch/pytorch/pull/158288#issuecomment-3119037960))
2025-07-25 16:09:39 +00:00
ab26d4fbeb [BE] remove torch deploy - conditionals (#158288)
This PR is part of the work to deprecate torch::deploy in OSS. Effectively it does 3 things to get started.
1. Remove test_deploy_interaction as we no longer need to worry about this
2. Remove all torch._running_with_deploy checks and use the False path always (surfaced 1)
3. Remove `USE_DEPLOY` and switch to the default path always

Note: MyPy does fail on a bunch of things here as a bunch of older files are touched. It may be better to fix these things on a separate PR

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158288
Approved by: https://github.com/albanD
2025-07-23 20:27:28 +00:00
ee5a434f8c Revert "[BE] remove torch deploy - conditionals (#158288)"
This reverts commit 1a4268b8113d5160d71225bab980f03c2318a0a4.

Reverted https://github.com/pytorch/pytorch/pull/158288 on behalf of https://github.com/ZainRizvi due to Sorry but this is breaking internally, see D78496147 for details. To validate your fixes internally, you can follow the instructions here: https://fburl.com/fixing-ghfirst-reverts ([comment](https://github.com/pytorch/pytorch/pull/158288#issuecomment-3099826158))
2025-07-21 23:17:39 +00:00
1a4268b811 [BE] remove torch deploy - conditionals (#158288)
This PR is part of the work to deprecate torch::deploy in OSS. Effectively it does 3 things to get started.
1. Remove test_deploy_interaction as we no longer need to worry about this
2. Remove all torch._running_with_deploy checks and use the False path always (surfaced 1)
3. Remove `USE_DEPLOY` and switch to the default path always

Note: MyPy does fail on a bunch of things here as a bunch of older files are touched. It may be better to fix these things on a separate PR

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158288
Approved by: https://github.com/albanD
2025-07-17 05:56:07 +00:00
fc0376e8b1 [BE][2/6] fix typos in test/ (test/test_*.py) (#157636)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157636
Approved by: https://github.com/yewentao256, https://github.com/mlazos
ghstack dependencies: #156311, #156609
2025-07-09 11:02:23 +00:00
f419067e50 [ROCm] improve sparse addmm, enable complex (#153262)
PR to:
- enable complex data types for sparse matmul on ROCm
- fix sparse addmm/baddbmm on ROCm
- fix sparse hipification for ROCm
- fix/enable sparse tests on ROCm (~40 tests total):
```
test_sparse_csr.py::TestSparseCSRCUDA::test_bmm_cuda_*
test_sparse.py::TestSparseCUDA::test_sparse_matmul_cuda_*
test_sparse_csr.py::TestSparseCSRCUDA::test_mm_cuda_float64
test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_all_sparse_csr_SparseCS*
test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_*
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153262
Approved by: https://github.com/jeffdaily, https://github.com/pruthvistony
2025-05-19 22:23:18 +00:00
eqy
17bf59340c [cuSPARSE][B200] Bump tolerances for test_sparse_csr matvec (#148721)
Small tolerance bump for blackwell (appears to use same kernel as prev. arches)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148721
Approved by: https://github.com/nWEIdia, https://github.com/ngimel
2025-04-16 18:44:18 +00:00
eqy
7bd7f735d4 [CUDA][SDPA] Compute reference in test_triton_scaled_dot_product_attention_block_size_16_cuda_float32 in float64 (#146461)
Seems to currently fail with mismatches in the 1e-4 range presumably due to sdpa calling into the `MATH` backend here which is less fused than a triton kernel. Doing the ref computation in `float64` appears to fix it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146461
Approved by: https://github.com/drisspg
2025-02-06 23:28:56 +00:00
2e42be0595 Use random64 in Fischer-Yates algorithm for large N (#143682)
Fixes bug in randperm https://nbsanity.com/static/a4774194938414dedcec7d6e99727d31/Shuffling_20in_20torch_20vs_20numpy-public.html

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143682
Approved by: https://github.com/eqy, https://github.com/albanD, https://github.com/malfet
2025-01-07 03:48:56 +00:00
f6801ba4b3 Revert "Use random64 in Fischer-Yates algorithm for large N (#143682)"
This reverts commit 7013be0094e8d3ded2ba2f948082f98d63e622bb.

Reverted https://github.com/pytorch/pytorch/pull/143682 on behalf of https://github.com/wdvr due to failing Meta internal tests that need to be updated ([comment](https://github.com/pytorch/pytorch/pull/143682#issuecomment-2563487675))
2024-12-27 09:09:33 +00:00
7013be0094 Use random64 in Fischer-Yates algorithm for large N (#143682)
Fixes bug in randperm https://nbsanity.com/static/a4774194938414dedcec7d6e99727d31/Shuffling_20in_20torch_20vs_20numpy-public.html

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143682
Approved by: https://github.com/eqy, https://github.com/albanD
2024-12-25 01:19:19 +00:00
d8c8ba2440 Fix unused Python variables in test/[e-z]* (#136964)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136964
Approved by: https://github.com/justinchuby, https://github.com/albanD
2024-12-18 23:02:30 +00:00
8c840fb921 Add out_dtype kw argument to optimize_bsr_dense_addmm (#136626)
As in the title.

Addresses the task in https://github.com/pytorch/ao/pull/821#issuecomment-2373290266

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136626
Approved by: https://github.com/amjames, https://github.com/cpuhrsch
2024-10-22 09:52:25 +00:00
17ed403644 [ROCm] Enable test_triton* in test_sparse_csr suite (#137712)
All test_triton* UTs are now passing on ROCm within test_sparse_csr suite. See logs here: https://ossci-raw-job-status.s3.amazonaws.com/log/31376189926

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137712
Approved by: https://github.com/jithunnair-amd, https://github.com/malfet
2024-10-15 15:41:21 +00:00
b76d1b79e6 Add scaling arguments to bsr_dense_addmm (#136104)
As in the title.

Tackles https://github.com/pytorch/ao/pull/821/files#r1759821413

The PR assumes that the existing tuning parameters are good also when using scaling arguments. This needs to be verified as a follow-up task.

Also, this PR redefines triton-contiguous tensors: the tensor must have strides not larger than 1. This will now allow zero strides that previously triggered `contiguous` call although the underlying memory buffer was contiguous.

Re: "a considerable slow-down occurs because tensor data is copied element-wise rather than chunk-wise" - this note should refer to a code (torch or triton?) that implements the element/chunk-wise copy so that we could verify that allowing zero strides indeed would not trigger element-wise copies. Atm, the performance increase in ViT-H benchmarks (that involve using 0 strides) is an evidence that allowing zero strides does not lead to slow-downs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136104
Approved by: https://github.com/cpuhrsch
2024-09-16 20:26:54 +00:00
90c821814e SparseCsrCUDA: cuDSS backend for linalg.solve (#129856)
This PR switches to cuDSS library and has the same purpose of #127692, which is to add Sparse CSR tensor support to linalg.solve.
Fixes #69538

Minimum example of usage:
```
import torch

if __name__ == '__main__':
    spd = torch.rand(4, 3)
    A = spd.T @ spd
    b = torch.rand(3).to(torch.float64).cuda()
    A = A.to_sparse_csr().to(torch.float64).cuda()

    x = torch.linalg.solve(A, b)
    print((A @ x - b).norm())

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129856
Approved by: https://github.com/amjames, https://github.com/lezcano, https://github.com/huydhn

Co-authored-by: Zihang Fang <zhfang1108@gmail.com>
Co-authored-by: Huy Do <huydhn@gmail.com>
2024-08-22 07:57:30 +00:00
345578afb4 Add int8 support to bsr_dense_addmm and bsr_dense_mm Triton kernels (#133855)
As in the title. In addition, the PR introduces `_int_bsr_dense_addmm` that is equivalent to `bsr_dense_addmm` except for int8 inputs the operation result is int32 tensor (similar to existing `_int_mm`).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133855
Approved by: https://github.com/cpuhrsch
2024-08-21 20:44:40 +00:00
215b14530a Add Half for sparse.mm reduce (#133672)
This PR is to add Half support for sparse.mm reduce in CPU backend.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133672
Approved by: https://github.com/Skylion007
2024-08-17 15:20:39 +00:00
1471473b84 Add tests to bsr_dense_addmm_meta. Tune bsr_dense_addmm kernel for ViT shapes. (#132646)
As in the title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132646
Approved by: https://github.com/cpuhrsch
2024-08-05 20:22:33 +00:00
973037be6a [BE][Easy] apply autofix for ruff rules unnecessary-collection-call (C408): list() / tuple() / dict() (#130199)
This PR changes the empty collection factory call to Python literals:

- `list()` -> `[]`
- `tuple()` -> `()`
- `dict()` -> `{}`

The Python literals are more performant and safer. For example, the bytecode for building an empty dictionary:

```bash
$ python3 -m dis - <<EOS
import collections

d1 = {}
d2 = dict()

dict = collections.OrderedDict
d3 = dict()
EOS
```

```text
  0           0 RESUME                   0

  1           2 LOAD_CONST               0 (0)
              4 LOAD_CONST               1 (None)
              6 IMPORT_NAME              0 (collections)
              8 STORE_NAME               0 (collections)

  3          10 BUILD_MAP                0
             12 STORE_NAME               1 (d1)

  4          14 PUSH_NULL
             16 LOAD_NAME                2 (dict)
             18 CALL                     0
             26 STORE_NAME               3 (d2)

  6          28 LOAD_NAME                0 (collections)
             30 LOAD_ATTR                8 (OrderedDict)
             50 STORE_NAME               2 (dict)

  7          52 PUSH_NULL
             54 LOAD_NAME                2 (dict)
             56 CALL                     0
             64 STORE_NAME               5 (d3)
             66 RETURN_CONST             1 (None)
```

The dict literal `{}` only has one bytecode `BUILD_MAP`, while the factory call `dict()` has three `PUSH_NULL + LOAD_NAME + CALL`. Also, the factory call is not safe if users override the `dict` name in `locals` or `globals` (see the example of replacing with `OrderedDict` above).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130199
Approved by: https://github.com/malfet
2024-07-11 17:30:28 +00:00
5a1216bb2e [BE]: Update ruff to 0.4.1 (#124549)
Update ruff to 0.4.1 .
This version fixes a lot false negatives/false positives, is 20-40% faster, and has various other bug fixes.

Below is a before and after table showing the execution time of ruff lint and ruff format in milliseconds courtesy of https://astral.sh/blog/ruff-v0.4.0

| Repository                                         | Linter (v0.3) | Linter (v0.4) | Formatter (v0.3) | Formatter (v0.4) |
|----------------------------------------------------|---------------|---------------|------------------|------------------|
| [pytorch/pytorch](https://github.com/pytorch/pytorch) | 328.7         | 251.8         | 351.1            | 274.9            |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124549
Approved by: https://github.com/ezyang
2024-04-21 14:06:23 +00:00
5494b2a8d3 enable test_sampled_addmm_zero_sized_cuda for rocm (#121940)
Enable test_sampled_addmm_zero_sized_cuda_* only for ROCm and CUDA issue is currently active.
Passed since ROCm 5.6

test_sampled_addmm_zero_sized_cuda_float32
test_sampled_addmm_zero_sized_cuda_float64
test_sampled_addmm_zero_sized_cuda_complex64
test_sampled_addmm_zero_sized_cuda_complex128

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121940
Approved by: https://github.com/pruthvistony, https://github.com/jeffdaily, https://github.com/malfet
2024-04-04 22:38:29 +00:00
72662bf05b [BE] Add torch.ops.aten._sparse_compressed_tensor_with_dims (#123083)
Used in https://github.com/pytorch/pytorch/pull/123084 and allows simplifying `empty_like` implementation for sparse compressed tensors (see https://github.com/pytorch/pytorch/pull/121900#issuecomment-2029835473).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123083
Approved by: https://github.com/cpuhrsch
2024-04-02 10:12:21 +00:00
656134c38f [ROCm] enable complex128 in test_addmm_sizes_all_sparse_csr for rocm for trivial (k,n,m) cases (#120504)
This PR enables `test_addmm_sizes_all_sparse_csr_k_*_n_*_m_*_cuda_complex128` for ROCm for trivial cases  (m or n or k = 0)

CUSPARSE_SPMM_COMPLEX128_SUPPORTED also used for `test_addmm_all_sparse_csr` and ` test_sparse_matmul` and both of them are skipped for ROCm by `@skipIfRocm` or `@skipCUDAIf(not _check_cusparse_spgemm_available())`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120504
Approved by: https://github.com/jithunnair-amd, https://github.com/ezyang
2024-03-12 07:29:57 +00:00
c7328602ed [ROCm] enable tests test_sampled_addmm_autograd_cuda_*, test_sample… (#117501)
These tests PASS on ROCM 5.6+ now:

- test_sampled_addmm_autograd_cuda_complex128
- test_sampled_addmm_autograd_cuda_complex64
- test_sampled_addmm_autograd_cuda_float32
- test_sampled_addmm_autograd_cuda_float64
- test_sampled_addmm_cuda_complex128
- test_sampled_addmm_cuda_complex64
- test_sampled_addmm_cuda_float32
- test_sampled_addmm_cuda_float64
- test_autograd_dense_output_addmm_cuda_float64
- test_autograd_dense_output_addmv_cuda_float64
- test_autograd_dense_output_mv_cuda_float64

@pruthvistony @jithunnair-amd

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117501
Approved by: https://github.com/pruthvistony, https://github.com/jeffdaily, https://github.com/malfet
2024-02-22 17:24:25 +00:00
3a8bf25fdd [SparseCsr] Remove triton sdpa skip after triton pin update (#109601)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109601
Approved by: https://github.com/desertfire, https://github.com/amjames
2024-02-08 16:40:25 +00:00
c6c851102f Fix test_compressed_layout_conversions_coverage to check BSC format (#117951)
test_compressed_layout_conversions_coverage verifies torch's conversions between different memory layouts using numpy as a reference. Since numpy doesn't support BSC format it just skipped that. Instead fake it by using a transposed BSR format.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117951
Approved by: https://github.com/zou3519
2024-02-03 08:10:15 +00:00
a27a6e8cf1 [ROCm] skip test_sparse_csr test_triton_bsr_softmax_cuda (#118006)
The tests were taking too long and leading to CI timeouts.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118006
Approved by: https://github.com/huydhn
2024-01-23 00:09:42 +00:00
9dbe4eae82 [codemod] markDynamoStrictTest batch 14 (#117133)
[codemod] markDynamoStrictTest test_utils
[codemod] markDynamoStrictTest test_unary_ufuncs
[codemod] markDynamoStrictTest test_sparse_semi_structured
[codemod] markDynamoStrictTest test_sparse_csr
[codemod] markDynamoStrictTest test_sparse
[codemod] markDynamoStrictTest test_reductions
[codemod] markDynamoStrictTest test_proxy_tensor
[codemod] markDynamoStrictTest test_prims
[codemod] markDynamoStrictTest test_maskedtensor
[codemod] markDynamoStrictTest test_masked
[codemod] markDynamoStrictTest test_legacy_vmap
[codemod] markDynamoStrictTest test_binary_ufuncs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117133
Approved by: https://github.com/voznesenskym
ghstack dependencies: #117114, #117127, #117128, #117129
2024-01-11 04:28:57 +00:00
db79ceb110 [ROCm] Enabling additional UTs on ROCm (#115738)
Unskips mostly for dynamo/inductor UT.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115738
Approved by: https://github.com/jithunnair-amd, https://github.com/malfet
2024-01-09 08:36:07 +00:00
4bfaa6bc25 [MPS] Fix addmm (#116547)
Remove weird logic for designating matrices as transposed if sizes match(which always true if square matrices are multiplied with each other), which resulted in `torch.addmm` returns transposed matrix compared to `torch.mm`, see below:
```
% python -c "import torch;torch.set_default_device('mps');a=torch.eye(2);b=torch.arange(4.0).reshape(2, 2);print(a@b);print(torch.addmm(torch.zeros(2, 2), a,b))"
tensor([[0., 1.],
        [2., 3.]], device='mps:0')
tensor([[0., 2.],
        [1., 3.]], device='mps:0')
```

Fixes introduced to `torch.mm` in https://github.com/pytorch/pytorch/pull/77462 suggests that this is not needed

Modify `sample_inputs_addmm` to test `torch.addmm` with square matrices, but skip this config for `test_autograd_dense_output_addmm`, see https://github.com/pytorch/pytorch/issues/116565

TODO: probably tweak tolerances, as `test_output_match_addmm_cpu_float16` fails with 2x2 matrices, but passes using 3x3 ones with errors slightly exceeding the tolerance

Fixes https://github.com/pytorch/pytorch/issues/116331
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116547
Approved by: https://github.com/albanD, https://github.com/Skylion007
2023-12-31 02:28:59 +00:00
4b97ed2ed8 [SparseCompressed] support csc layout for add sparse/dense. (#115433)
`add` when passed one sparse and one dense argument  will error if the
sparse argument does not have  csr layout. This PR modifies the
underlying algorithm to be generic on the compressed dimension handling
both csr and csc. The functions are renamed to use the
`sparse_compressed` qualifier rather than `sparse_csr`

Fixes: #114807

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115433
Approved by: https://github.com/cpuhrsch, https://github.com/pearu
ghstack dependencies: #115432
2023-12-22 01:47:55 +00:00
910baa3a03 [SparseCompressed] Support add(sparse_compressed, dense) (#115432)
Addition involving sparse compressed and dense arguments is implemented
requiring that the dense tensor be on the LHS. This change adds support
for the other pattern `sparse + dense by permuting arguments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115432
Approved by: https://github.com/cpuhrsch, https://github.com/pearu
2023-12-22 01:47:55 +00:00
6de28e92d2 [BE]: Apply FURB118 (prev): replaces unnecessary lambdas with operator. (#116027)
This replaces a bunch of unnecessary lambdas with the operator package. This is semantically equivalent, but the operator package is faster, and arguably more readable. When the FURB rules are taken out of preview, I will enable it as a ruff check.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116027
Approved by: https://github.com/malfet
2023-12-20 19:35:08 +00:00
d72d99e591 Fix sparse compressed tensor invariants checks when nnz==0 (#115826)
Fixes https://github.com/pytorch/pytorch/issues/115755

This PR is a step toward deprecating `torch.empty(..., layout=<sparse compressed tensor layout>)` that usage should be minimized as it will produce invalid tensors, see also https://github.com/pytorch/pytorch/issues/90695 .

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115826
Approved by: https://github.com/cpuhrsch, https://github.com/amjames
2023-12-20 12:16:07 +00:00
419f2ca3e3 Fix a crash in sparse compressed tensor invariants check when nnz == 0 (#115825)
Fixes python crash example from https://github.com/pytorch/pytorch/issues/115755

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115825
Approved by: https://github.com/cpuhrsch
2023-12-17 17:36:15 +00:00
32286512cc Add tune_bsr_dense_addmm as an API to find optimal triton kernel parameters for bsr_dense_addmm (#115499)
As in the title.

In addition:
- improve the algorithm for finding a minima of operation timings: break the inner loop early when a next minima candidate is found
- add tests and fix bugs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115499
Approved by: https://github.com/cpuhrsch
2023-12-12 16:44:51 +00:00