pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 12:54:11 +08:00

Author	SHA1	Message	Date
Yuanyuan Chen	fdab48a7c1	Enable all PIE rules on ruff (#165814 ) This PR enables all PIE rules on ruff. Some rules from this family were already enabled; the newly added rules are ``` PIE796 Enum contains duplicate value: {value} PIE808 Unnecessary start argument in range ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/165814 Approved by: https://github.com/ezyang	2025-10-18 07:36:18 +00:00
PyTorch MergeBot	24520b8386	Revert "Enable all PIE rules on ruff (#165814 )" This reverts commit c79dfdc6550e872783aa5cb5fc9e86589bf18872. Reverted https://github.com/pytorch/pytorch/pull/165814 on behalf of https://github.com/cyyever due to Need to cover more files ([comment](https://github.com/pytorch/pytorch/pull/165814#issuecomment-3417931863))	2025-10-18 07:21:08 +00:00
Yuanyuan Chen	c79dfdc655	Enable all PIE rules on ruff (#165814 ) This PR enables all PIE rules on ruff. Some rules from this family were already enabled; the newly added rules are ``` PIE796 Enum contains duplicate value: {value} PIE808 Unnecessary start argument in range ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/165814 Approved by: https://github.com/ezyang	2025-10-18 06:40:12 +00:00
Yuanyuan Chen	e925dfcc6b	Enable all SIM rules except disabled ones (#164645 ) `SIM` rules are useful for simplifying boolean expressions and enhancing code readability. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164645 Approved by: https://github.com/ezyang, https://github.com/mlazos	2025-10-17 07:27:11 +00:00
Yuanyuan Chen	b63bbe1661	Remove old ROCm version check in tests (#164245 ) This PR removes ROCm<6 version checks. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164245 Approved by: https://github.com/jeffdaily	2025-10-06 22:42:01 +00:00
can-gaa-hou	e64dd8c694	[Fix] Adding missing `f` prefixes to formatted strings [4/N] (#164068 ) As stated in the title. * __->__ #164068 * #164067 * #164066 * #164065 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164068 Approved by: https://github.com/Skylion007	2025-09-29 04:07:07 +00:00
Dmitry Nikolaev	7194d77550	Revert "enable test_sampled_addmm_zero_sized_cuda for rocm (#121940 )" (#163848 ) This reverts commit 5494b2a8d38c3ddbeb2d96a5ac990e20ec4c48fd. Need to skip `test_sparse_csr.py::TestSparseCSRCUDA::test_sampled_addmm_zero_sized_cuda_` again. Tests are failing now with "core dumped" error ``` python test_sparse_csr.py -v -k test_sampled_addmm_zero_sized_cuda_float64 test_sampled_addmm_zero_sized_cuda_float64 (__main__.TestSparseCSRCUDA) ... /tmp/pytorch/test/test_sparse_csr.py:2503: c = torch.empty(m, n, dtype=dtype, device=device, layout=torch.sparse_csr) GPU core dump created: gpucore.186789 :0:rocdevice.cpp :2992: 4701819131755 us: Callback: Queue 0x760cdcd00000 aborting with error : HSA_STATUS_ERROR_EXCEPTION: An HSAIL operation resulted in a hardware exception. code: 0x1016 Aborted (core dumped) ``` These failures are linked to `test_sparse_csr.py::TestSparseCSRCUDA::test_select_SparseBSC_int32_cuda_` due to incorrect test log parsing. We will be able to close these issues also: - Fixes https://github.com/pytorch/pytorch/issues/163663 - Fixes https://github.com/pytorch/pytorch/issues/160786 - Fixes https://github.com/pytorch/pytorch/issues/160785 - Fixes https://github.com/pytorch/pytorch/issues/160784 Pull Request resolved: https://github.com/pytorch/pytorch/pull/163848 Approved by: https://github.com/jeffdaily	2025-09-25 16:38:00 +00:00
Jeff Daily	ebddbe787a	[ROCm][CI] skip test_sparse_triangular_solve (#163651 ) Need more time to debug, but also need a clean CI signal. The test was unskipped by #163495, but had been skipped on ROCm prior. Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/163651 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-09-23 15:55:51 +00:00
Yuanyuan Chen	5d749ceb92	Remove test conditions for CUDA<12 (#163495 ) Because CUDA >= 12 is now required. Pull Request resolved: https://github.com/pytorch/pytorch/pull/163495 Approved by: https://github.com/janeyx99	2025-09-23 07:52:00 +00:00
Prachi Gupta	c0142f5c06	[ROCm] Enabling several UTs (#161715 ) All these UTs are working as is, just removing the skip - test_p2p_ipc - test_repros.py: working, added fp8 support - test_activation_checkpointing.py - test_content_store.py - test_cuda_multigpu.py - test_compute_comm_reordering.py - test_segment_reductions.py - test_dataloader.py - test_math_ops.py - test_loop_ordering.py - test_control_flow.py - distributed_test.py - test_mem_tracker.py - test_fsdp_optim_state.py - test_fully_shard_mixed_precision.py: skipped for < ROCm7.0 - test_aot_inductor_custom_ops.py - test_c10d_ops_nccl.py - test_eager_transforms.py - test_sparse_csr.py - test_inductor_collectives.py - test_fake_tensor.py - test_cupy_as_tensor.py - test_cuda.py: enable UTs that are working - test_matmul_cuda.py: enable UTs that are working Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/161715 Approved by: https://github.com/msaroufim Co-authored-by: Mark Saroufim <marksaroufim@fb.com>	2025-09-09 15:49:21 +00:00
PyTorch MergeBot	8235c4f65d	Revert "[ROCm] Enabling several UTs (#161715 )" This reverts commit b9ba612f7a968f7b27e121ca8f4d0a4d954f5354. Reverted https://github.com/pytorch/pytorch/pull/161715 on behalf of https://github.com/jeanschmidt due to Need to revert in order to revert https://github.com/pytorch/pytorch/pull/159473, feel free to merge it back once conflicts are cleared ([comment](https://github.com/pytorch/pytorch/pull/161715#issuecomment-3264040604))	2025-09-07 21:03:17 +00:00
Prachi Gupta	b9ba612f7a	[ROCm] Enabling several UTs (#161715 ) All these UTs are working as is, just removing the skip - test_p2p_ipc - test_repros.py: working, added fp8 support - test_activation_checkpointing.py - test_content_store.py - test_cuda_multigpu.py - test_compute_comm_reordering.py - test_segment_reductions.py - test_dataloader.py - test_math_ops.py - test_loop_ordering.py - test_control_flow.py - distributed_test.py - test_mem_tracker.py - test_fsdp_optim_state.py - test_fully_shard_mixed_precision.py: skipped for < ROCm7.0 - test_aot_inductor_custom_ops.py - test_c10d_ops_nccl.py - test_eager_transforms.py - test_sparse_csr.py - test_inductor_collectives.py - test_fake_tensor.py - test_cupy_as_tensor.py - test_cuda.py: enable UTs that are working - test_matmul_cuda.py: enable UTs that are working Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/161715 Approved by: https://github.com/pruthvistony, https://github.com/jeffdaily	2025-09-04 20:43:03 +00:00
PaliC	6162e650b0	[BE] remove torch deploy - conditionals (#158288 ) This PR is part of the work to deprecate torch::deploy in OSS. Effectively it does 3 things to get started. 1. Remove test_deploy_interaction as we no longer need to worry about this 2. Remove all torch._running_with_deploy checks and use the False path always (surfaced 1) 3. Remove `USE_DEPLOY` and switch to the default path always Note: MyPy does fail on a bunch of things here as a bunch of older files are touched. It may be better to fix these things on a separate PR Pull Request resolved: https://github.com/pytorch/pytorch/pull/158288 Approved by: https://github.com/albanD	2025-07-29 17:40:49 +00:00
PyTorch MergeBot	f8fafdc7a6	Revert "[BE] remove torch deploy - conditionals (#158288 )" This reverts commit ab26d4fbeb5bc4b4e6ef1c37fbec9fab6e5a9edd. Reverted https://github.com/pytorch/pytorch/pull/158288 on behalf of https://github.com/ZainRizvi due to Reverting as per offline discussion to fix internal breaks. @PaliC will reland this as a codev diff. Instructions here: https://fburl.com/fixing-ghfirst-reverts ([comment](https://github.com/pytorch/pytorch/pull/158288#issuecomment-3119037960))	2025-07-25 16:09:39 +00:00
PaliC	ab26d4fbeb	[BE] remove torch deploy - conditionals (#158288 ) This PR is part of the work to deprecate torch::deploy in OSS. Effectively it does 3 things to get started. 1. Remove test_deploy_interaction as we no longer need to worry about this 2. Remove all torch._running_with_deploy checks and use the False path always (surfaced 1) 3. Remove `USE_DEPLOY` and switch to the default path always Note: MyPy does fail on a bunch of things here as a bunch of older files are touched. It may be better to fix these things on a separate PR Pull Request resolved: https://github.com/pytorch/pytorch/pull/158288 Approved by: https://github.com/albanD	2025-07-23 20:27:28 +00:00
PyTorch MergeBot	ee5a434f8c	Revert "[BE] remove torch deploy - conditionals (#158288 )" This reverts commit 1a4268b8113d5160d71225bab980f03c2318a0a4. Reverted https://github.com/pytorch/pytorch/pull/158288 on behalf of https://github.com/ZainRizvi due to Sorry but this is breaking internally, see D78496147 for details. To validate your fixes internally, you can follow the instructions here: https://fburl.com/fixing-ghfirst-reverts ([comment](https://github.com/pytorch/pytorch/pull/158288#issuecomment-3099826158))	2025-07-21 23:17:39 +00:00
PaliC	1a4268b811	[BE] remove torch deploy - conditionals (#158288 ) This PR is part of the work to deprecate torch::deploy in OSS. Effectively it does 3 things to get started. 1. Remove test_deploy_interaction as we no longer need to worry about this 2. Remove all torch._running_with_deploy checks and use the False path always (surfaced 1) 3. Remove `USE_DEPLOY` and switch to the default path always Note: MyPy does fail on a bunch of things here as a bunch of older files are touched. It may be better to fix these things on a separate PR Pull Request resolved: https://github.com/pytorch/pytorch/pull/158288 Approved by: https://github.com/albanD	2025-07-17 05:56:07 +00:00
Xuehai Pan	fc0376e8b1	[BE][2/6] fix typos in test/ (test/test_*.py) (#157636 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/157636 Approved by: https://github.com/yewentao256, https://github.com/mlazos ghstack dependencies: #156311, #156609	2025-07-09 11:02:23 +00:00
Dmitry Nikolaev	f419067e50	[ROCm] improve sparse addmm, enable complex (#153262 ) PR to: - enable complex data types for sparse matmul on ROCm - fix sparse addmm/baddbmm on ROCm - fix sparse hipification for ROCm - fix/enable sparse tests on ROCm (~40 tests total): ``` test_sparse_csr.py::TestSparseCSRCUDA::test_bmm_cuda_* test_sparse.py::TestSparseCUDA::test_sparse_matmul_cuda_* test_sparse_csr.py::TestSparseCSRCUDA::test_mm_cuda_float64 test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_all_sparse_csr_SparseCS* test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_* ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/153262 Approved by: https://github.com/jeffdaily, https://github.com/pruthvistony	2025-05-19 22:23:18 +00:00
eqy	17bf59340c	[cuSPARSE][B200] Bump tolerances for test_sparse_csr matvec (#148721 ) Small tolerance bump for blackwell (appears to use same kernel as prev. arches) Pull Request resolved: https://github.com/pytorch/pytorch/pull/148721 Approved by: https://github.com/nWEIdia, https://github.com/ngimel	2025-04-16 18:44:18 +00:00
eqy	7bd7f735d4	[CUDA][SDPA] Compute reference in `test_triton_scaled_dot_product_attention_block_size_16_cuda_float32` in `float64` (#146461 ) Seems to currently fail with mismatches in the 1e-4 range presumably due to sdpa calling into the `MATH` backend here which is less fused than a triton kernel. Doing the ref computation in `float64` appears to fix it. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146461 Approved by: https://github.com/drisspg	2025-02-06 23:28:56 +00:00
Natalia Gimelshein	2e42be0595	Use random64 in Fisher-Yates algorithm for large N (#143682 ) Fixes bug in randperm https://nbsanity.com/static/a4774194938414dedcec7d6e99727d31/Shuffling_20in_20torch_20vs_20numpy-public.html Pull Request resolved: https://github.com/pytorch/pytorch/pull/143682 Approved by: https://github.com/eqy, https://github.com/albanD, https://github.com/malfet	2025-01-07 03:48:56 +00:00
PyTorch MergeBot	f6801ba4b3	Revert "Use random64 in Fisher-Yates algorithm for large N (#143682 )" This reverts commit 7013be0094e8d3ded2ba2f948082f98d63e622bb. Reverted https://github.com/pytorch/pytorch/pull/143682 on behalf of https://github.com/wdvr due to failing Meta internal tests that need to be updated ([comment](https://github.com/pytorch/pytorch/pull/143682#issuecomment-2563487675))	2024-12-27 09:09:33 +00:00
Natalia Gimelshein	7013be0094	Use random64 in Fisher-Yates algorithm for large N (#143682 ) Fixes bug in randperm https://nbsanity.com/static/a4774194938414dedcec7d6e99727d31/Shuffling_20in_20torch_20vs_20numpy-public.html Pull Request resolved: https://github.com/pytorch/pytorch/pull/143682 Approved by: https://github.com/eqy, https://github.com/albanD	2024-12-25 01:19:19 +00:00
Tom Ritchford	d8c8ba2440	Fix unused Python variables in test/[e-z]* (#136964 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/136964 Approved by: https://github.com/justinchuby, https://github.com/albanD	2024-12-18 23:02:30 +00:00
Pearu Peterson	8c840fb921	Add out_dtype kw argument to optimize_bsr_dense_addmm (#136626 ) As in the title. Addresses the task in https://github.com/pytorch/ao/pull/821#issuecomment-2373290266 Pull Request resolved: https://github.com/pytorch/pytorch/pull/136626 Approved by: https://github.com/amjames, https://github.com/cpuhrsch	2024-10-22 09:52:25 +00:00
Prachi Gupta	17ed403644	[ROCm] Enable test_triton* in test_sparse_csr suite (#137712 ) All test_triton* UTs are now passing on ROCm within test_sparse_csr suite. See logs here: https://ossci-raw-job-status.s3.amazonaws.com/log/31376189926 Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/137712 Approved by: https://github.com/jithunnair-amd, https://github.com/malfet	2024-10-15 15:41:21 +00:00
Pearu Peterson	b76d1b79e6	Add scaling arguments to bsr_dense_addmm (#136104 ) As in the title. Tackles https://github.com/pytorch/ao/pull/821/files#r1759821413 The PR assumes that the existing tuning parameters are also good when using scaling arguments. This needs to be verified as a follow-up task. Also, this PR redefines triton-contiguous tensors: the tensor must have strides not larger than 1. This will now allow zero strides that previously triggered a `contiguous` call although the underlying memory buffer was contiguous. Re: "a considerable slow-down occurs because tensor data is copied element-wise rather than chunk-wise" - this note should refer to the code (torch or triton?) that implements the element/chunk-wise copy so that we could verify that allowing zero strides indeed would not trigger element-wise copies. At the moment, the performance increase in ViT-H benchmarks (that involve using 0 strides) is evidence that allowing zero strides does not lead to slow-downs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/136104 Approved by: https://github.com/cpuhrsch	2024-09-16 20:26:54 +00:00
Zitong Zhan	90c821814e	SparseCsrCUDA: cuDSS backend for linalg.solve (#129856 ) This PR switches to cuDSS library and has the same purpose of #127692, which is to add Sparse CSR tensor support to linalg.solve. Fixes #69538 Minimum example of usage: ``` import torch if __name__ == '__main__': spd = torch.rand(4, 3) A = spd.T @ spd b = torch.rand(3).to(torch.float64).cuda() A = A.to_sparse_csr().to(torch.float64).cuda() x = torch.linalg.solve(A, b) print((A @ x - b).norm()) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129856 Approved by: https://github.com/amjames, https://github.com/lezcano, https://github.com/huydhn Co-authored-by: Zihang Fang <zhfang1108@gmail.com> Co-authored-by: Huy Do <huydhn@gmail.com>	2024-08-22 07:57:30 +00:00
Pearu Peterson	345578afb4	Add int8 support to bsr_dense_addmm and bsr_dense_mm Triton kernels (#133855 ) As in the title. In addition, the PR introduces `_int_bsr_dense_addmm` that is equivalent to `bsr_dense_addmm` except for int8 inputs the operation result is int32 tensor (similar to existing `_int_mm`). Pull Request resolved: https://github.com/pytorch/pytorch/pull/133855 Approved by: https://github.com/cpuhrsch	2024-08-21 20:44:40 +00:00
Jiang, Yanbing	215b14530a	Add Half for sparse.mm reduce (#133672 ) This PR is to add Half support for sparse.mm reduce in CPU backend. Pull Request resolved: https://github.com/pytorch/pytorch/pull/133672 Approved by: https://github.com/Skylion007	2024-08-17 15:20:39 +00:00
Pearu Peterson	1471473b84	Add tests to bsr_dense_addmm_meta. Tune bsr_dense_addmm kernel for ViT shapes. (#132646 ) As in the title. Pull Request resolved: https://github.com/pytorch/pytorch/pull/132646 Approved by: https://github.com/cpuhrsch	2024-08-05 20:22:33 +00:00
Xuehai Pan	973037be6a	[BE][Easy] apply autofix for ruff rules unnecessary-collection-call (C408): `list()` / `tuple()` / `dict()` (#130199 ) This PR changes the empty collection factory call to Python literals: - `list()` -> `[]` - `tuple()` -> `()` - `dict()` -> `{}` The Python literals are more performant and safer. For example, the bytecode for building an empty dictionary: ```bash $ python3 -m dis - <<EOS import collections d1 = {} d2 = dict() dict = collections.OrderedDict d3 = dict() EOS ``` ```text 0 0 RESUME 0 1 2 LOAD_CONST 0 (0) 4 LOAD_CONST 1 (None) 6 IMPORT_NAME 0 (collections) 8 STORE_NAME 0 (collections) 3 10 BUILD_MAP 0 12 STORE_NAME 1 (d1) 4 14 PUSH_NULL 16 LOAD_NAME 2 (dict) 18 CALL 0 26 STORE_NAME 3 (d2) 6 28 LOAD_NAME 0 (collections) 30 LOAD_ATTR 8 (OrderedDict) 50 STORE_NAME 2 (dict) 7 52 PUSH_NULL 54 LOAD_NAME 2 (dict) 56 CALL 0 64 STORE_NAME 5 (d3) 66 RETURN_CONST 1 (None) ``` The dict literal `{}` only has one bytecode `BUILD_MAP`, while the factory call `dict()` has three `PUSH_NULL + LOAD_NAME + CALL`. Also, the factory call is not safe if users override the `dict` name in `locals` or `globals` (see the example of replacing with `OrderedDict` above). Pull Request resolved: https://github.com/pytorch/pytorch/pull/130199 Approved by: https://github.com/malfet	2024-07-11 17:30:28 +00:00
Aaron Gokaslan	5a1216bb2e	[BE]: Update ruff to 0.4.1 (#124549 ) Update ruff to 0.4.1. This version fixes a lot of false negatives/false positives, is 20-40% faster, and has various other bug fixes. Below is a before and after table showing the execution time of ruff lint and ruff format in milliseconds courtesy of https://astral.sh/blog/ruff-v0.4.0 \| Repository \| Linter (v0.3) \| Linter (v0.4) \| Formatter (v0.3) \| Formatter (v0.4) \| \|----------------------------------------------------\|---------------\|---------------\|------------------\|------------------\| \| [pytorch/pytorch](https://github.com/pytorch/pytorch) \| 328.7 \| 251.8 \| 351.1 \| 274.9 \| Pull Request resolved: https://github.com/pytorch/pytorch/pull/124549 Approved by: https://github.com/ezyang	2024-04-21 14:06:23 +00:00
Dmitry Nikolaev	5494b2a8d3	enable test_sampled_addmm_zero_sized_cuda for rocm (#121940 ) Enable test_sampled_addmm_zero_sized_cuda_* for ROCm only, since the CUDA issue is still active. The following tests have passed since ROCm 5.6: test_sampled_addmm_zero_sized_cuda_float32, test_sampled_addmm_zero_sized_cuda_float64, test_sampled_addmm_zero_sized_cuda_complex64, test_sampled_addmm_zero_sized_cuda_complex128 Pull Request resolved: https://github.com/pytorch/pytorch/pull/121940 Approved by: https://github.com/pruthvistony, https://github.com/jeffdaily, https://github.com/malfet	2024-04-04 22:38:29 +00:00
Pearu Peterson	72662bf05b	[BE] Add torch.ops.aten._sparse_compressed_tensor_with_dims (#123083 ) Used in https://github.com/pytorch/pytorch/pull/123084 and allows simplifying `empty_like` implementation for sparse compressed tensors (see https://github.com/pytorch/pytorch/pull/121900#issuecomment-2029835473). Pull Request resolved: https://github.com/pytorch/pytorch/pull/123083 Approved by: https://github.com/cpuhrsch	2024-04-02 10:12:21 +00:00
Dmitry Nikolaev	656134c38f	[ROCm] enable complex128 in test_addmm_sizes_all_sparse_csr for rocm for trivial (k,n,m) cases (#120504 ) This PR enables `test_addmm_sizes_all_sparse_csr_k__n__m_*_cuda_complex128` for ROCm for trivial cases (m or n or k = 0). CUSPARSE_SPMM_COMPLEX128_SUPPORTED is also used for `test_addmm_all_sparse_csr` and `test_sparse_matmul`, and both of them are skipped for ROCm by `@skipIfRocm` or `@skipCUDAIf(not _check_cusparse_spgemm_available())` Pull Request resolved: https://github.com/pytorch/pytorch/pull/120504 Approved by: https://github.com/jithunnair-amd, https://github.com/ezyang	2024-03-12 07:29:57 +00:00
Dmitry Nikolaev	c7328602ed	[ROCm] enable tests test_sampled_addmm_autograd_cuda_*, test_sample… (#117501 ) These tests PASS on ROCM 5.6+ now: - test_sampled_addmm_autograd_cuda_complex128 - test_sampled_addmm_autograd_cuda_complex64 - test_sampled_addmm_autograd_cuda_float32 - test_sampled_addmm_autograd_cuda_float64 - test_sampled_addmm_cuda_complex128 - test_sampled_addmm_cuda_complex64 - test_sampled_addmm_cuda_float32 - test_sampled_addmm_cuda_float64 - test_autograd_dense_output_addmm_cuda_float64 - test_autograd_dense_output_addmv_cuda_float64 - test_autograd_dense_output_mv_cuda_float64 @pruthvistony @jithunnair-amd Pull Request resolved: https://github.com/pytorch/pytorch/pull/117501 Approved by: https://github.com/pruthvistony, https://github.com/jeffdaily, https://github.com/malfet	2024-02-22 17:24:25 +00:00
Peter Bell	3a8bf25fdd	[SparseCsr] Remove triton sdpa skip after triton pin update (#109601 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/109601 Approved by: https://github.com/desertfire, https://github.com/amjames	2024-02-08 16:40:25 +00:00
Aaron Orenstein	c6c851102f	Fix test_compressed_layout_conversions_coverage to check BSC format (#117951 ) test_compressed_layout_conversions_coverage verifies torch's conversions between different memory layouts using numpy as a reference. Since numpy doesn't support BSC format it just skipped that. Instead fake it by using a transposed BSR format. Pull Request resolved: https://github.com/pytorch/pytorch/pull/117951 Approved by: https://github.com/zou3519	2024-02-03 08:10:15 +00:00
Jeff Daily	a27a6e8cf1	[ROCm] skip test_sparse_csr test_triton_bsr_softmax_cuda (#118006 ) The tests were taking too long and leading to CI timeouts. Pull Request resolved: https://github.com/pytorch/pytorch/pull/118006 Approved by: https://github.com/huydhn	2024-01-23 00:09:42 +00:00
rzou	9dbe4eae82	[codemod] markDynamoStrictTest batch 14 (#117133 ) [codemod] markDynamoStrictTest test_utils [codemod] markDynamoStrictTest test_unary_ufuncs [codemod] markDynamoStrictTest test_sparse_semi_structured [codemod] markDynamoStrictTest test_sparse_csr [codemod] markDynamoStrictTest test_sparse [codemod] markDynamoStrictTest test_reductions [codemod] markDynamoStrictTest test_proxy_tensor [codemod] markDynamoStrictTest test_prims [codemod] markDynamoStrictTest test_maskedtensor [codemod] markDynamoStrictTest test_masked [codemod] markDynamoStrictTest test_legacy_vmap [codemod] markDynamoStrictTest test_binary_ufuncs Pull Request resolved: https://github.com/pytorch/pytorch/pull/117133 Approved by: https://github.com/voznesenskym ghstack dependencies: #117114, #117127, #117128, #117129	2024-01-11 04:28:57 +00:00
Jack Taylor	db79ceb110	[ROCm] Enabling additional UTs on ROCm (#115738 ) Unskips mostly for dynamo/inductor UT. Pull Request resolved: https://github.com/pytorch/pytorch/pull/115738 Approved by: https://github.com/jithunnair-amd, https://github.com/malfet	2024-01-09 08:36:07 +00:00
Nikita Shulga	4bfaa6bc25	[MPS] Fix addmm (#116547 ) Remove weird logic for designating matrices as transposed if sizes match (which is always true when square matrices are multiplied with each other), which resulted in `torch.addmm` returning a transposed matrix compared to `torch.mm`, see below: ``` % python -c "import torch;torch.set_default_device('mps');a=torch.eye(2);b=torch.arange(4.0).reshape(2, 2);print(a@b);print(torch.addmm(torch.zeros(2, 2), a,b))" tensor([[0., 1.], [2., 3.]], device='mps:0') tensor([[0., 2.], [1., 3.]], device='mps:0') ``` Fixes introduced to `torch.mm` in https://github.com/pytorch/pytorch/pull/77462 suggest that this is not needed. Modify `sample_inputs_addmm` to test `torch.addmm` with square matrices, but skip this config for `test_autograd_dense_output_addmm`, see https://github.com/pytorch/pytorch/issues/116565 TODO: probably tweak tolerances, as `test_output_match_addmm_cpu_float16` fails with 2x2 matrices, but passes using 3x3 ones with errors slightly exceeding the tolerance Fixes https://github.com/pytorch/pytorch/issues/116331 Pull Request resolved: https://github.com/pytorch/pytorch/pull/116547 Approved by: https://github.com/albanD, https://github.com/Skylion007	2023-12-31 02:28:59 +00:00
Andrew M. James	4b97ed2ed8	[SparseCompressed] support csc layout for add sparse/dense. (#115433 ) `add` when passed one sparse and one dense argument will error if the sparse argument does not have csr layout. This PR modifies the underlying algorithm to be generic on the compressed dimension handling both csr and csc. The functions are renamed to use the `sparse_compressed` qualifier rather than `sparse_csr` Fixes: #114807 Pull Request resolved: https://github.com/pytorch/pytorch/pull/115433 Approved by: https://github.com/cpuhrsch, https://github.com/pearu ghstack dependencies: #115432	2023-12-22 01:47:55 +00:00
Andrew M. James	910baa3a03	[SparseCompressed] Support `add(sparse_compressed, dense)` (#115432 ) Addition involving sparse compressed and dense arguments is implemented requiring that the dense tensor be on the LHS. This change adds support for the other pattern, `sparse + dense`, by permuting arguments. Pull Request resolved: https://github.com/pytorch/pytorch/pull/115432 Approved by: https://github.com/cpuhrsch, https://github.com/pearu	2023-12-22 01:47:55 +00:00
Aaron Gokaslan	6de28e92d2	[BE]: Apply FURB118 (prev): replaces unnecessary lambdas with operator. (#116027 ) This replaces a bunch of unnecessary lambdas with the operator package. This is semantically equivalent, but the operator package is faster, and arguably more readable. When the FURB rules are taken out of preview, I will enable it as a ruff check. Pull Request resolved: https://github.com/pytorch/pytorch/pull/116027 Approved by: https://github.com/malfet	2023-12-20 19:35:08 +00:00
Pearu Peterson	d72d99e591	Fix sparse compressed tensor invariants checks when nnz==0 (#115826 ) Fixes https://github.com/pytorch/pytorch/issues/115755 This PR is a step toward deprecating `torch.empty(..., layout=<sparse compressed tensor layout>)`. This usage should be minimized, as it will produce invalid tensors; see also https://github.com/pytorch/pytorch/issues/90695 . Pull Request resolved: https://github.com/pytorch/pytorch/pull/115826 Approved by: https://github.com/cpuhrsch, https://github.com/amjames	2023-12-20 12:16:07 +00:00
Pearu Peterson	419f2ca3e3	Fix a crash in sparse compressed tensor invariants check when nnz == 0 (#115825 ) Fixes python crash example from https://github.com/pytorch/pytorch/issues/115755 Pull Request resolved: https://github.com/pytorch/pytorch/pull/115825 Approved by: https://github.com/cpuhrsch	2023-12-17 17:36:15 +00:00
Pearu Peterson	32286512cc	Add tune_bsr_dense_addmm as an API to find optimal triton kernel parameters for bsr_dense_addmm (#115499 ) As in the title. In addition: - improve the algorithm for finding a minimum of operation timings: break the inner loop early when the next minimum candidate is found - add tests and fix bugs Pull Request resolved: https://github.com/pytorch/pytorch/pull/115499 Approved by: https://github.com/cpuhrsch	2023-12-12 16:44:51 +00:00
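
Several commits in this log are lint cleanups that name specific ruff rules (PIE796, PIE808, C408, FURB118) or add missing `f` prefixes. The following is a minimal sketch of the patterns those descriptions refer to, not code taken from any of the commits. Every name in it (`Layout`, `nnz`, the `flagged_*` and `preferred_*` variables) is hypothetical and chosen only for illustration.

```python
import enum
import operator
from functools import reduce


# PIE796 ("Enum contains duplicate value"): a repeated value does not create
# a new member; Python silently turns the second name into an alias.
class Layout(enum.Enum):  # hypothetical enum, for illustration only
    CSR = 1
    CSC = 2
    BSR = 1  # duplicate of CSR, so Layout.BSR is just another name for it


# PIE808 ("Unnecessary start argument in range"): the explicit 0 is redundant.
flagged_range = list(range(0, 3))
preferred_range = list(range(3))

# C408 (unnecessary-collection-call): prefer the literal over the factory call.
flagged_dict = dict()
preferred_dict = {}

# FURB118: replace a lambda that only wraps an operator with the operator module.
flagged_product = reduce(lambda x, y: x * y, [2, 3, 4])
preferred_product = reduce(operator.mul, [2, 3, 4])

# Missing `f` prefix: without it the braces are kept literally.
nnz = 5  # hypothetical value
flagged_msg = "expected nnz={nnz}"
preferred_msg = f"expected nnz={nnz}"
```

In each pair the two forms produce the same result except for the string case, where the missing prefix changes the output. That difference is why the `f`-prefix commits are labelled as fixes rather than style changes.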

292 Commits