Fixes #122757
## Test Result
```python
import torch
model_output = torch.randn(10, 5).cuda()
labels = torch.randint(0, 5, (10,)).cuda()
weights = torch.randn(5)  # note: weight stays on the CPU while input and target are on CUDA
loss_fn = torch.nn.CrossEntropyLoss(weight=weights)
loss = loss_fn(input=model_output, target=labels)
print(loss)
```
```
Traceback (most recent call last):
File "/home/zong/code/pytorch/../loss2.py", line 17, in <module>
loss = loss_fn(input=model_output, target=labels)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zong/code/pytorch/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zong/code/pytorch/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zong/code/pytorch/torch/nn/modules/loss.py", line 1297, in forward
return F.cross_entropy(
^^^^^^^^^^^^^^^^
File "/home/zong/code/pytorch/torch/nn/functional.py", line 3494, in cross_entropy
return torch._C._nn.cross_entropy_loss(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Expected all tensors to be on the same device, but got weight is on cpu, different from other tensors on cuda:0 (when checking argument in method wrapper_CUDA_nll_loss_forward)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150750
Approved by: https://github.com/malfet
As per title.
The following implementation removes the usage of `repeat_interleave`, `tile`, and `full_coo_indices` and replaces them with broadcasting. That way we reduce memory traffic (and are likely to hit the cache a lot) and the total number of launched kernels.
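For illustration, here is a minimal standalone sketch of the general idea (not the PR's actual kernels): the same index grid can be produced either with `repeat_interleave`/`tile` copies or with broadcasted views.
```python
import torch

m, n = 3, 4

# Explicit copies: both helpers allocate and fill full-length index buffers.
rows = torch.arange(m).repeat_interleave(n)   # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
cols = torch.arange(n).tile(m)                # [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]

# Broadcasting: `expand` only creates views, so nothing is written to memory
# until the indices are actually consumed by a downstream kernel.
rows_b = torch.arange(m).unsqueeze(1).expand(m, n)
cols_b = torch.arange(n).unsqueeze(0).expand(m, n)

assert torch.equal(rows, rows_b.reshape(-1)) and torch.equal(cols, cols_b.reshape(-1))
```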
Pull Request resolved: https://github.com/pytorch/pytorch/pull/142364
Approved by: https://github.com/amjames, https://github.com/cpuhrsch
This PR fixes a bug in `search_end_matrix_indices_cuda_kernel` causing an illegal memory access when calling `bmm_sparse_cuda` on a sparse matrix with no non-zero values in the first batch dimension. Reproducible example:
```py
import torch
ind = torch.tensor([[1], [0], [0]], device="cuda")  # the only non-zero entry is in batch 1, so batch 0 is empty
val = torch.tensor([1.], device="cuda")
A = torch.sparse_coo_tensor(ind, val, size=(2, 1, 1))
B = torch.zeros((2, 1, 1), device="cuda")
C = torch.bmm(A, B)
```
## Details
In the previous code, we may for example end up with the following situation:
```
i : indices_1D[i]
------------------------------------------
0 : 1 <- start_idx, mid_idx
1 : 1 <- end_idx
...
```
When `target_mat_num = 0`, the next iteration of the while loop will assign `-1` to `end_idx` and thus `(0 + (-1)) >> 1 = -1` to `mid_idx`, causing an access error on line 703. The updated code maintains the invariant `start_idx <= end_idx` and will not go out of bounds.
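For clarity, here is a minimal Python sketch of a search that keeps that invariant; it is only an illustration of the fix, not the actual CUDA kernel code.
```python
# Illustrative only: find the last position whose batch index is <= target_mat_num
# (0 if there is none), keeping start_idx <= end_idx so mid_idx never goes negative.
def search_end_index(indices_1d, target_mat_num):
    start_idx, end_idx = 0, len(indices_1d) - 1
    while start_idx < end_idx:
        mid_idx = (start_idx + end_idx + 1) >> 1  # biased upward so the loop terminates
        if indices_1d[mid_idx] <= target_mat_num:
            start_idx = mid_idx
        else:
            end_idx = mid_idx - 1
    return start_idx

# With indices_1d = [1, 1] and target_mat_num = 0 the old code drove end_idx to -1;
# here start_idx == end_idx == 0 and no out-of-bounds access can happen.
print(search_end_index([1, 1], 0))  # 0
```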
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131977
Approved by: https://github.com/amjames, https://github.com/pearu, https://github.com/nikitaved
Update ruff to 0.4.1.
This version fixes a lot of false negatives/false positives, is 20-40% faster, and has various other bug fixes.
Below is a before-and-after table showing the execution time of ruff lint and ruff format in milliseconds, courtesy of https://astral.sh/blog/ruff-v0.4.0:
| Repository | Linter (v0.3) | Linter (v0.4) | Formatter (v0.3) | Formatter (v0.4) |
|----------------------------------------------------|---------------|---------------|------------------|------------------|
| [pytorch/pytorch](https://github.com/pytorch/pytorch) | 328.7 | 251.8 | 351.1 | 274.9 |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124549
Approved by: https://github.com/ezyang
This PR enables `test_addmm_sizes_all_sparse_csr_k_*_n_*_m_*_cuda_complex128` for ROCm for the trivial cases (m or n or k = 0).
`CUSPARSE_SPMM_COMPLEX128_SUPPORTED` is also used for `test_addmm_all_sparse_csr` and `test_sparse_matmul`, both of which are skipped for ROCm by `@skipIfRocm` or `@skipCUDAIf(not _check_cusparse_spgemm_available())`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120504
Approved by: https://github.com/jithunnair-amd, https://github.com/ezyang
Argument "device" was missed.
So, "test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_*_cuda" was always run on the default device ("cpu") if another default torch device was not configured before.
This fix will probably detect a number of issues on various devices that were previously missed.
It should fix ROCm CI jobs failing with "##[error]The action has timed out." and speed up test execution.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111584
Approved by: https://github.com/soulitzer
Compared to #104848, this PR goes a step further: when the `enable_sparse_support` decorator is applied to `torch.autograd.gradcheck`, the resulting callable is equivalent to `torch.autograd.gradcheck` with the extra feature of supporting functions that can take sparse tensors as inputs and/or return sparse tensors.
At the same time, the underlying call to `torch.autograd.gradcheck` will operate on strided tensors only. This basically means that `torch/autograd/gradcheck.py` can be cleaned up by removing the code that deals with sparse tensors.
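As a rough illustration of the intent (a greatly simplified sketch, not the actual implementation), such a decorator can densify sparse inputs and outputs so the wrapped gradcheck only ever sees strided tensors:
```python
import torch

def enable_sparse_support(gradcheck):
    # Simplified sketch: densify sparse inputs/outputs so the wrapped
    # gradcheck only ever operates on strided tensors.
    def wrapper(func, inputs, **kwargs):
        inputs = inputs if isinstance(inputs, (tuple, list)) else (inputs,)

        def densified_func(*dense_inputs):
            # Re-sparsify before calling the user function so it still sees
            # the layouts it was written for.
            restored = [
                d.to_sparse() if inp.layout == torch.sparse_coo else d
                for d, inp in zip(dense_inputs, inputs)
            ]
            out = func(*restored)
            return out.to_dense() if out.layout == torch.sparse_coo else out

        dense_inputs = tuple(
            i.to_dense().detach().requires_grad_(True)
            if i.layout == torch.sparse_coo else i
            for i in inputs
        )
        return gradcheck(densified_func, dense_inputs, **kwargs)

    return wrapper

# Usage sketch: gradcheck a function that consumes and returns sparse tensors.
gradcheck_sparse = enable_sparse_support(torch.autograd.gradcheck)
a = torch.randn(3, 3, dtype=torch.double).to_sparse().requires_grad_(True)
print(gradcheck_sparse(lambda x: x * 2, a))  # True
```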
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107150
Approved by: https://github.com/albanD, https://github.com/amjames, https://github.com/cpuhrsch
ghstack dependencies: #107638, #107777
Resolves https://github.com/pytorch/pytorch/issues/107097
After this PR, instead of
```python
torch.sparse_coo_tensor(indices, values, size)._coalesced_(is_coalesced)
```
(that does not work in the autograd context, see #107097), use
```python
torch.sparse_coo_tensor(indices, values, size, is_coalesced=is_coalesced)
```
All sparse coo factory functions that take indices as input support the `is_coalesced` argument.
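For example, a minimal usage sketch (hypothetical tensors, not taken from the PR) with pre-coalesced indices:
```python
import torch

# Indices are already sorted and unique, so the result can be marked as
# coalesced at construction time instead of via _coalesced_() afterwards.
indices = torch.tensor([[0, 1], [1, 0]])
values = torch.tensor([1.0, 2.0], requires_grad=True)

t = torch.sparse_coo_tensor(indices, values, (2, 2), is_coalesced=True)
print(t.is_coalesced())        # True, no extra coalesce() pass needed
t.to_dense().sum().backward()  # autograd works through the factory function
print(values.grad)             # tensor([1., 1.])
```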
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107638
Approved by: https://github.com/cpuhrsch
Update the deprecated `distutils` Version classes to `packaging.version`, which silences the following deprecation warnings:
```
/root/Git.d/pytorch/pytorch/torch/testing/_internal/common_methods_invocations.py:17136: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
active_if=TEST_SCIPY and LooseVersion(scipy.__version__) < "1.4.0"),
/root/Git.d/pytorch/pytorch/torch/testing/_internal/common_methods_invocations.py:17138: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
active_if=TEST_SCIPY and LooseVersion(scipy.__version__) < "1.4.0"),
/root/Git.d/pytorch/pytorch/torch/testing/_internal/common_methods_invocations.py:17140: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
active_if=TEST_SCIPY and LooseVersion(scipy.__version__) < "1.4.0"),
```
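The replacement is along these lines (a hypothetical before/after sketch, not the exact diff):
```python
import scipy
from packaging import version

# before (deprecated):
#   from distutils.version import LooseVersion
#   active_if = TEST_SCIPY and LooseVersion(scipy.__version__) < "1.4.0"

# after:
active_if = version.parse(scipy.__version__) < version.parse("1.4.0")
print(active_if)
```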
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107207
Approved by: https://github.com/soulitzer