292 Commits

325b43661e add/add_ for compressed sparse inputs: bypass BLAS in some trivial cases (#95293)
In `add(self, other, out=...)` we can bypass calls to BLAS when `self == other == out` and when `self == other`.
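
A minimal sketch of the trivial in-place case (my own illustration, assuming standard CSR `add_` semantics; not taken from the PR):

```python
import torch

# self, other and out all alias the same CSR tensor, so the result is just
# the values doubled and no BLAS call is needed.
x = torch.eye(3).to_sparse_csr()
x.add_(x)
assert torch.equal(x.to_dense(), 2 * torch.eye(3))
```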

This PR fixes the repro from https://github.com/pytorch/pytorch/issues/94966, but the issue is still present when `x.add_(x)` is replaced, say, with `x = x.clone().add_(x)`.
Could that be a synchronization issue? CC @IvanYashchuk .

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95293
Approved by: https://github.com/cpuhrsch
2023-02-27 16:06:02 +00:00
c620ece726 port sparse_mm.reduce to pytorch and optimize it on CPU (#83727)
### Motivation of this PR

This patch migrates `spmm_reduce` from `torch-sparse` (a 3rd-party dependency for PyG) to `torch`, in response to the initial proposal for fusing **Gather, Apply Scatter** in the message passing step of GNN inference/training: https://github.com/pytorch/pytorch/issues/71300

**GAS** is the major step of message passing. Its behavior can be classified into two kinds, depending on the storage type of `EdgeIndex`, which records the connections between nodes:

* COO: the hotspot is `scatter_reduce`
* CSR: the hotspot is `spmm_reduce`

The reduce type can be chosen from: "sum", "mean", "max", "min".

We extend `torch.sparse.mm` with a `reduce` argument, which maps to `torch.sparse_mm.reduce` internally (a usage sketch follows the list below).
`sparse_mm_reduce` is registered under the TensorTypeId of `SparseCsrCPU`, and this operator requires an internal interface `_sparse_mm_reduce_impl` with dual outputs:
* `out` - the actual output
* `arg_out` - records the output indices of the non-zero elements when the reduce type is "max" or "min"; this is only useful for training, so it is not calculated for inference.
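
A hedged usage sketch of the extended Python API (the `reduce` argument and its strings are taken from the description above; exact accepted values may differ):

```python
import torch

# CSR input on CPU is assumed; the reduce strings follow the PR description.
crow = torch.tensor([0, 2, 3])
col = torch.tensor([0, 2, 1])
val = torch.tensor([1., 2., 3.])
a = torch.sparse_csr_tensor(crow, col, val, size=(2, 3))
b = torch.randn(3, 4)

out_sum = torch.sparse.mm(a, b)                 # existing behaviour: plain sum
out_red = torch.sparse.mm(a, b, reduce="mean")  # new: per-row reduction of the products
```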

### Performance

Benchmark on GCN for ogbn-products on a single-socket Xeon; the workload is improved by `4.3x` with this patch.

The performance benefit for training will be bigger: the original backward impl for `sum|mean` is sequential, and the original backward impl for `max|min` is not fused.

#### before:
```
-----------------------------  ------------  ------------  ------------  ------------  ------------  ------------
                         Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls
-----------------------------  ------------  ------------  ------------  ------------  ------------  ------------
       torch_sparse::spmm_sum        97.09%       56.086s        97.09%       56.088s        6.232s             9
                 aten::linear         0.00%      85.000us         1.38%     795.485ms      88.387ms             9
                 aten::matmul         0.00%      57.000us         1.38%     795.260ms      88.362ms             9
                     aten::mm         1.38%     795.201ms         1.38%     795.203ms      88.356ms             9
                   aten::relu         0.00%      50.000us         0.76%     440.434ms      73.406ms             6
              aten::clamp_min         0.76%     440.384ms         0.76%     440.384ms      73.397ms             6
                   aten::add_         0.57%     327.801ms         0.57%     327.801ms      36.422ms             9
            aten::log_softmax         0.00%      23.000us         0.10%      55.503ms      18.501ms             3
```

#### after
```
-----------------------------  ------------  ------------  ------------  ------------  ------------  ------------
                         Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls
-----------------------------  ------------  ------------  ------------  ------------  ------------  ------------
               aten::spmm_sum        87.35%       11.826s        87.36%       11.827s        1.314s             9
                 aten::linear         0.00%      92.000us         5.87%     794.451ms      88.272ms             9
                 aten::matmul         0.00%      62.000us         5.87%     794.208ms      88.245ms             9
                     aten::mm         5.87%     794.143ms         5.87%     794.146ms      88.238ms             9
                   aten::relu         0.00%      53.000us         3.35%     452.977ms      75.496ms             6
              aten::clamp_min         3.35%     452.924ms         3.35%     452.924ms      75.487ms             6
                   aten::add_         2.58%     348.663ms         2.58%     348.663ms      38.740ms             9
                 aten::argmax         0.42%      57.473ms         0.42%      57.475ms      14.369ms             4
            aten::log_softmax         0.00%      22.000us         0.39%      52.605ms      17.535ms             3
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83727
Approved by: https://github.com/jgong5, https://github.com/cpuhrsch, https://github.com/rusty1s, https://github.com/pearu
2023-02-10 15:56:40 +00:00
e1f17b3530 Add CSR->BSC and CSC->BSR conversions (#93301)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93301
Approved by: https://github.com/cpuhrsch
2023-02-07 19:22:05 +00:00
bb6af061a0 torch.triangular_solve for CSR: materialize diagonal elements when unitriangular=True. (#93352)
Fixes https://github.com/pytorch/pytorch/issues/88890

A temporary fix until MKL is fixed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93352
Approved by: https://github.com/cpuhrsch
2023-01-31 16:33:57 +00:00
53f7fb9a22 Add CSC->BSC conversion (#92307)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92307
Approved by: https://github.com/cpuhrsch
2023-01-30 17:03:36 +00:00
65d6802e2f Improve error messages for sparse methods on tensors with unsupported backends/layouts. (#93149)
Fixes https://github.com/pytorch/pytorch/issues/92790

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93149
Approved by: https://github.com/cpuhrsch
2023-01-27 19:50:23 +00:00
7012d985fa Revert "Improve bsr @ strided performance in baddmm for bfloat16/half with Triton kernels. (#88078)"
This reverts commit 46f16b93636615a81242b0d5cded84c5a57fd2e2.

Reverted https://github.com/pytorch/pytorch/pull/88078 on behalf of https://github.com/ZainRizvi due to Causing a test to fail consistently: test_decomp.py::HasDecompTest::test_has_decomposition
2023-01-26 16:22:29 +00:00
46f16b9363 Improve bsr @ strided performance in baddmm for bfloat16/half with Triton kernels. (#88078)
As per title.

Additionally we also introduce support for:
- Rectangular block sizes which are powers of 2 and at least 16 (triton's `dot` limitation).
- Batch support with broadcasting for either of the arguments.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88078
Approved by: https://github.com/cpuhrsch
2023-01-26 07:58:27 +00:00
0bf7506051 [CUDA] Drop CUDA < 11.0 test flags (#92605)
Follow-up of #89582 to drop flags like `CUDA11OrLater` in tests. Note that in some places `TEST_WITH_ROCM` appears to be _implicitly_ guarded against via the `CUDA11OrLater` version check, based on my best guess of how `torch.version.cuda` would behave in ROCm builds, so I've added `not TEST_WITH_ROCM` in cases where ROCm wasn't previously explicitly allowed.

CC @ptrblck @malfet @ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92605
Approved by: https://github.com/ngimel
2023-01-24 04:34:06 +00:00
0ab4ab9f8d [Dynamo] Fix calling UserDefinedObject.func should pass self object (#92050)
Fixes #90834

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92050
Approved by: https://github.com/jansel
2023-01-21 05:47:01 +00:00
60bf851931 Revert "Improve bsr @ strided performance in baddmm for bfloat16/half with Triton kernels. (#88078)"
This reverts commit 8383b5c488399f2ae295c7c0f993bdd353dfd75c.

Reverted https://github.com/pytorch/pytorch/pull/88078 on behalf of https://github.com/malfet due to This seems to have broke sm_86 testing, see https://hud.pytorch.org/hud/pytorch/pytorch/master/1?per_page=50&name_filter=sm86%20%2F%20test%20(default%2C%203
2023-01-19 23:37:59 +00:00
8383b5c488 Improve bsr @ strided performance in baddmm for bfloat16/half with Triton kernels. (#88078)
As per title.

Additionally we also introduce support for:
- Rectangular block sizes which are powers of 2 and at least 16 (triton's `dot` limitation).
- Batch support with broadcasting for either of the arguments.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88078
Approved by: https://github.com/cpuhrsch
2023-01-19 03:14:54 +00:00
89f1ad08b4 Revert "Improve bsr @ strided performance in baddmm for bfloat16/half with Triton kernels. (#88078)"
This reverts commit 7f256fff77c49729131aa6d092e60e891d0c4948.

Reverted https://github.com/pytorch/pytorch/pull/88078 on behalf of https://github.com/huydhn due to This breaks lint 7f256fff77
2023-01-17 22:14:37 +00:00
7f256fff77 Improve bsr @ strided performance in baddmm for bfloat16/half with Triton kernels. (#88078)
As per title.

Additionally we also introduce support for:
- Rectangular block sizes which are powers of 2 and at least 16 (triton's `dot` limitation).
- Batch support with broadcasting for either of the arguments.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88078
Approved by: https://github.com/cpuhrsch
2023-01-17 21:43:20 +00:00
b3e4f5029b Add check-sparse-tensor-invariants flag to Context - 2nd try. (#92094)
This PR is a copy of https://github.com/pytorch/pytorch/pull/90849 that merge was reverted.

The PR adds a "check sparse tensor invariants" flag to Context that, when enabled, triggers sparse tensor data invariant checks in the unsafe methods for constructing sparse COO/CSR/CSC/BSR/BSC tensors. The feature includes the following changes to the UI:

The `torch.sparse.check_sparse_tensor_invariants` class provides different ways to enable/disable the invariant checking.

`torch.sparse_coo/csr/csc/bsr/bsc/compressed_tensor` functions have a new optional argument `check_invariants` to enable/disable the invariant checks explicitly. When the `check_invariants` argument is specified, the global state of the feature is temporarily overridden.
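
A brief usage sketch of the two mechanisms (a minimal example based on the API names above; the tensor data is purely illustrative):

```python
import torch

crow = torch.tensor([0, 2, 4])
col = torch.tensor([0, 1, 0, 1])
val = torch.tensor([1., 2., 3., 4.])

# 1. Scoped enabling via the helper class (also usable as a decorator):
with torch.sparse.check_sparse_tensor_invariants():
    a = torch.sparse_csr_tensor(crow, col, val, size=(2, 2))

# 2. Per-call enabling via the new keyword argument, overriding the global state:
b = torch.sparse_csr_tensor(crow, col, val, size=(2, 2), check_invariants=True)
```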

The PR fixes https://github.com/pytorch/pytorch/issues/90833

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92094
Approved by: https://github.com/cpuhrsch
2023-01-13 14:50:33 +00:00
3ab58fd5ed optimize sampled_addmm performance on CPU (SparseCSR) (#90978)
### Target and Background
This PR improves the performance of `sampled_addmm` on the CPU device. This is part of the effort to improve PyG performance on CPU for GNN training/inference.

The current implementation is a reference design which converts the `SparseCSR` tensor back to a dense tensor, does the addmm, and then converts back to `SparseCSR` again: this is very slow and won't be able to run most of the datasets under https://github.com/snap-stanford/ogb (converting to dense would trigger `OOM`).
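
For context, a minimal sketch of what `sampled_addmm` computes (semantics as I understand them, not quoted from this PR): `mat1 @ mat2` is evaluated only at the locations present in the sparsity pattern of the CSR `input`, then added to the scaled `input`.

```python
import torch

mask = torch.eye(3)                       # sparsity pattern: the diagonal
inp = mask.to_sparse_csr()
mat1, mat2 = torch.randn(3, 5), torch.randn(5, 3)

out = torch.sparse.sampled_addmm(inp, mat1, mat2)   # CSR result on the same pattern
ref = (mask + mat1 @ mat2) * mask                   # dense reference, masked to the pattern
assert torch.allclose(out.to_dense(), ref)
```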

### Benchmarks

Right now we don't have any hands-on benchmark or workload to test this since this operator is not used in PyG yet. I fetched the dataset from `ogb-products`, where:

* number of nodes: 2.4 * 10^6
* number of edges: 1.26 * 10^8
* number of features: 128

So if we store the **adjacency matrix** as dense, it would take 2.4 * 2.4 * 4 * 10^12 bytes, which will OOM with the current code. I extracted the first 1k rows for comparison; **1100x** speedup:

CPU: Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz, dual socket, 20 cores per socket.
```
### before: run 1000 rows from the whole dataset
sampled_addmm: running dataset ogb-products first 1000 rows: each iter takes 1212.000 ms!

### after: run 1000 rows from the whole dataset
sampled_addmm: running dataset ogb-products first 1000 rows: each iter takes 1.102 ms!

### after: run the whole dataset
sampled_addmm: running dataset ogb-products (the whole dataset) 2449029 rows: each iter takes 873.306 ms!
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90978
Approved by: https://github.com/pearu, https://github.com/cpuhrsch
2023-01-12 12:04:07 +00:00
c7a22bb7c7 Revert "Add check-sparse-tensor-invariants flag to Context. (#90849)"
This reverts commit b9a035c1c58630f3eef5242cb4849881b8376b39.

Reverted https://github.com/pytorch/pytorch/pull/90849 on behalf of https://github.com/DanilBaibak due to Break internal build
2023-01-12 09:58:16 +00:00
8612ec5b90 Implement hybrid sparse to/from dense conversions. (#90177)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90177
Approved by: https://github.com/cpuhrsch, https://github.com/pearu
2023-01-12 03:31:30 +00:00
c5836153f5 Revert "optimize sampled_addmm performance on CPU (SparseCSR) (#90978)"
This reverts commit 645fb217c06348a4f1ccdf68a93bd711f7158c62.

Reverted https://github.com/pytorch/pytorch/pull/90978 on behalf of https://github.com/seemethere due to This broke internal builds for android due to the new file added being missing in build_variables.bzl
2023-01-11 20:12:12 +00:00
b9a035c1c5 Add check-sparse-tensor-invariants flag to Context. (#90849)
This PR adds a "check sparse tensor invariants" flag to Context that, when enabled, triggers sparse tensor data invariant checks in the unsafe methods for constructing sparse COO/CSR/CSC/BSR/BSC tensors. The feature includes the following changes to the UI:

- `torch.enable_check_sparse_tensor_invariants` and `torch.is_check_sparse_tensor_invariants_enabled` functions to globally enable/disable the invariant checks and to retrieve the state of the feature, respectively
- `torch.sparse_coo/csr/csc/bsr/bsc/compressed_tensor` functions have a new optional argument `check_invariants` to enable/disable the invariant checks explicitly. When the `check_invariants` argument is specified, the global state of the feature is temporarily overridden.

The PR also fixes https://github.com/pytorch/pytorch/issues/90833

# Main issue

*The following content is outdated after merging the PRs in this ghstack but kept for the record.*

The importance of this feature is that when enabling the invariants checks by default, say, via

<details>

```
$ git diff
diff --git a/torch/__init__.py b/torch/__init__.py
index c8543057c7..19a91d0482 100644
--- a/torch/__init__.py
+++ b/torch/__init__.py
@@ -1239,3 +1239,8 @@ if 'TORCH_CUDA_SANITIZER' in os.environ:

 # Populate magic methods on SymInt and SymFloat
 import torch.fx.experimental.symbolic_shapes
+
+# temporarily enable sparse tensor arguments validation in unsafe
+# constructors:
+
+torch._C._set_check_sparse_tensor_invariants(True)
```

</details>

a massive number of test failures/errors occur in test_sparse_csr.py tests:
```
$ pytest -sv test/test_sparse_csr.py
<snip>
==== 4293 failed, 1557 passed, 237 skipped, 2744 errors in 69.71s (0:01:09) ====
```
that means that we are silently constructing sparse compressed tensors that do not satisfy the sparse tensor invariants. In particular, the following errors are raised:

```
AssertionError: "resize_as_sparse_compressed_tensor_: self and src must have the same layout" does not match "expected values to be a strided and contiguous tensor"

RuntimeError: CUDA error: device-side assert triggered

RuntimeError: `col_indices[..., crow_indices[..., i - 1]:crow_indices[..., i]] for all i = 1, ..., nrows are sorted and distinct along the last dimension values` is not satisfied.

RuntimeError: expected col_indices to be a strided and contiguous tensor

RuntimeError: expected row_indices to be a strided and contiguous tensor

RuntimeError: expected values to be a strided and contiguous tensor

RuntimeError: for_each: failed to synchronize: cudaErrorAssert: device-side assert triggered

RuntimeError: tensor dimensionality must be sum of batch, base, and dense dimensionalities (=0 + 2 + 0) but got 3
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90849
Approved by: https://github.com/amjames, https://github.com/cpuhrsch
2023-01-11 01:05:14 +00:00
645fb217c0 optimize sampled_addmm performance on CPU (SparseCSR) (#90978)
### Target and Background
This PR improves the performance of `sampled_addmm` on the CPU device. This is part of the effort to improve PyG performance on CPU for GNN training/inference.

The current implementation is a reference design which converts the `SparseCSR` tensor back to a dense tensor, does the addmm, and then converts back to `SparseCSR` again: this is very slow and won't be able to run most of the datasets under https://github.com/snap-stanford/ogb (converting to dense would trigger `OOM`).

### Benchmarks

Right now we don't have any hands-on benchmark or workload to test this since this operator is not used in PyG yet. I fetched the dataset from `ogb-products`, where:

* number of nodes: 2.4 * 10^6
* number of edges: 1.26 * 10^8
* number of features: 128

So if we store the **adjacency matrix** as dense, it would take 2.4 * 2.4 * 4 * 10^12 bytes, which will OOM with the current code. I extracted the first 1k rows for comparison; **1100x** speedup:

CPU: Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz, dual socket, 20 cores per socket.
```
### before: run 1000 rows from the whole dataset
sampled_addmm: running dataset ogb-products first 1000 rows: each iter takes 1212.000 ms!

### after: run 1000 rows from the whole dataset
sampled_addmm: running dataset ogb-products first 1000 rows: each iter takes 1.102 ms!

### after: run the whole dataset
sampled_addmm: running dataset ogb-products (the whole dataset) 2449029 rows: each iter takes 873.306 ms!
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90978
Approved by: https://github.com/pearu, https://github.com/cpuhrsch
2023-01-10 22:13:35 +00:00
cdc30048e5 Fix numel() result after resizing a sparse compressed tensor. (#91831)
Fixes #91830

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91831
Approved by: https://github.com/cpuhrsch
2023-01-10 18:21:07 +00:00
b797a24259 Support indices contiguity per batch and non-contiguous values in sparse compressed tensors (#91243)
Fixes https://github.com/pytorch/pytorch/issues/91062

With this PR, all reported failures in https://github.com/pytorch/pytorch/pull/90849 are resolved (modulo test_bmm that uses an unorthodox way to construct a batch CSR tensor).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91243
Approved by: https://github.com/nikitaved, https://github.com/amjames, https://github.com/lezcano
2023-01-02 18:08:46 +00:00
08a47549af Rename Tensor._storage to Tensor.untyped_storage and update docs (#91414)
Fixes #89224

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91414
Approved by: https://github.com/ezyang
2022-12-28 19:21:34 +00:00
4c5928e387 Fix for mul(compressed, wrapped scalar) (#91239)
Fixes https://github.com/pytorch/pytorch/issues/90819.

The path with `Scalar` should have been picked up by the dispatcher, but the path with a 0-dim wrapped scalar was still broken.
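
An illustrative sketch of the previously broken path (my own construction, not the PR's repro):

```python
import torch

a = torch.eye(3).to_sparse_csr()
out_scalar = a * 2.0                  # Scalar path, handled by the dispatcher
out_wrapped = a * torch.tensor(2.0)   # 0-dim "wrapped scalar" path fixed here
assert torch.equal(out_scalar.to_dense(), out_wrapped.to_dense())
```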

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91239
Approved by: https://github.com/pearu, https://github.com/cpuhrsch
2022-12-22 13:11:13 +00:00
01e7f46215 Ensure sorted indices from the CSR->BSR conversion (#90918)
Fixes https://github.com/pytorch/pytorch/issues/90910

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90918
Approved by: https://github.com/cpuhrsch
2022-12-16 15:49:48 +00:00
c2c14f9597 Sparse compressed mm: fix for orthogonal inputs (#90917)
Fixes https://github.com/pytorch/pytorch/issues/90836
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90917
Approved by: https://github.com/cpuhrsch
2022-12-16 13:08:00 +00:00
4dd3de23dd Sparse compressed mm: fix for empty inputs (#90763)
Fixes [#90693](https://github.com/pytorch/pytorch/issues/90693)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90763
Approved by: https://github.com/cpuhrsch
2022-12-16 12:33:57 +00:00
76c6dfeaa6 Add layout and blocksize arguments to Tensor.to_sparse method (#89502)
This PR extends the `Tensor.to_sparse()` method to `Tensor.to_sparse(layout=None, blocksize=None)` in a BC manner (`layout=None` means `layout=torch.sparse_coo`).
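
A minimal usage sketch of the extended signature (layouts and blocksize chosen purely for illustration):

```python
import torch

x = torch.randn(4, 6)

coo = x.to_sparse()                                            # unchanged default (COO)
csr = x.to_sparse(layout=torch.sparse_csr)                     # compressed sparse row
bsr = x.to_sparse(layout=torch.sparse_bsr, blocksize=(2, 2))   # blocked, 2x2 blocks
```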

In addition, the PR adds support for the following conversions:
- non-hybrid/hybrid COO tensor to CSR or CSC or a COO tensor
- short, bool, byte, char, bfloat16, int, long, half CSR tensor to a BSR tensor

and fixes the following conversions:
- hybrid COO to COO tensor
- non-batch/batch hybrid BSR to BSR or BSC tensor

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89502
Approved by: https://github.com/amjames, https://github.com/cpuhrsch
2022-11-30 20:21:10 +00:00
296e1ba4d0 Row and column select support for block compressed sparse tensors (#88733)
As in the title:

- Support `select` and `select_copy` on block sparse compressed tensors
- Fixes incorrect results when selecting dense dimensions

The PR also improves the performance of indexing sparse compressed tensors considerably:

<details>

Before:

```python
In [3]: a=torch.rand((1000, 1000)).to_sparse_csr()

In [4]: %timeit a.select(0, 0)
606 µs ± 4.27 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [5]: %timeit a.select(1, 0)
527 µs ± 57.7 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [6]: %timeit a[0, 0]
617 µs ± 3.74 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [7]: a = a.cuda()

In [8]: %timeit a.select(0, 0); torch.cuda.synchronize();
1.19 ms ± 137 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [9]: %timeit a.select(1, 0); torch.cuda.synchronize();
1.2 ms ± 119 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [10]: %timeit a[0, 0]; torch.cuda.synchronize();
1.23 ms ± 482 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

This PR:

```python
In [3]: a=torch.rand((1000, 1000)).to_sparse_csr()

In [4]: %timeit a.select(0, 0)
4.75 µs ± 8.94 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [5]: %timeit a.select(1, 0)
565 µs ± 156 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [6]: %timeit a[0, 0]
13.1 µs ± 435 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [7]: a = a.cuda()

In [8]: %timeit a.select(0, 0); torch.cuda.synchronize();
21.6 µs ± 23.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [9]: %timeit a.select(1, 0); torch.cuda.synchronize();
1.15 ms ± 3.13 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [10]: %timeit a[0, 0]; torch.cuda.synchronize();
63.7 µs ± 2.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88733
Approved by: https://github.com/nikitaved, https://github.com/amjames, https://github.com/cpuhrsch
2022-11-30 11:15:56 +00:00
90bed8874f Generator of tensor inputs with variable layout and structure (batch/non-batch, hybrid/non-hybrid, block/non-block) (#88914)
This PR introduces the `TestCase.generate_simple_inputs` method, which is an improved and generalized version of the `TestSparseCompressed._generate_small_inputs` method.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88914
Approved by: https://github.com/cpuhrsch
2022-11-30 02:13:33 +00:00
50e2e4faf3 Sparse CSC/BSR/BSC serialization and pickle support (#89553)
Fixes https://github.com/pytorch/pytorch/issues/89497

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89553
Approved by: https://github.com/cpuhrsch
2022-11-23 20:56:48 +00:00
a41f70603a Round out rad2deg sparse support (#88442)
- Add sparse coo dispatch
- Modify backward to work with sparse compressed layouts
- Enable sparse_compressed autograd testing
- Correct layout support attributes on OpInfo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88442
Approved by: https://github.com/cpuhrsch
2022-11-17 06:00:23 +00:00
8dc3353b0b add to(dtype) support for all sparse compressed formats (#89055)
Fixes [#88419](https://github.com/pytorch/pytorch/issues/88419)
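
A small sketch of the behaviour this enables (my assumption of the semantics, not quoted from the PR): dtype conversion now works for compressed layouts, casting the values while preserving layout and indices.

```python
import torch

a = torch.eye(3).to_sparse_csr()
b = a.to(torch.float16)
assert b.layout == torch.sparse_csr and b.values().dtype == torch.float16
```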

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89055
Approved by: https://github.com/cpuhrsch
2022-11-15 21:16:18 +00:00
03296844aa Fix typos in messages under aten (#88964)
This PR fixes typos in messages and params in C++ source files under the `aten` directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88964
Approved by: https://github.com/lezcano
2022-11-14 09:50:50 +00:00
ff6770a9a1 enable backward for log1p (sparse layouts) (#88155)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88155
Approved by: https://github.com/cpuhrsch
2022-11-04 20:59:26 +00:00
6938dd0b2c Support sparse inputs to deg2rad (#88156)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88156
Approved by: https://github.com/cpuhrsch
2022-11-04 20:59:26 +00:00
1964d8c34f Enable sparse_csr autograd testing for relu (#88154)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88154
Approved by: https://github.com/cpuhrsch
2022-11-04 20:59:23 +00:00
f03302ba49 Add sparse layout support for torch.frac (#88153)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88153
Approved by: https://github.com/cpuhrsch
2022-11-04 20:59:22 +00:00
b2dfd20260 Remove BSC conversion skip from TestSparseCompressed.test_consistency (#88152)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88152
Approved by: https://github.com/cpuhrsch
2022-11-01 22:18:56 +00:00
d044b4cc58 Update torch.abs and torch.positive opinfos to reflect sparse support (#88151)
cc @nikitaved @pearu @cpuhrsch @bhosmer
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88151
Approved by: https://github.com/cpuhrsch
2022-11-01 22:18:56 +00:00
51ea441862 Upcast to fp32 in test_addmm_block ref_half_bfloat16 (#86682)
Fixes https://github.com/pytorch/pytorch/issues/86681
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86682
Approved by: https://github.com/nikitaved
2022-10-11 16:39:57 +00:00
e15a48def7 (bsr/csr) x dense mm (#85551)
As per title. This implementation is not the most optimal and could be improved, albeit with native kernels (i.e., block matching need not be materialized).

Compared to existing kernels it offers:

- Half float support (In fact, any dtype that supports `matmul` will work).
- Arbitrary block sizes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85551
Approved by: https://github.com/amjames, https://github.com/cpuhrsch
2022-09-29 17:12:04 +00:00
8a926b3187 Enable CSC @ CSC addmm (#85379)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85379
Approved by: https://github.com/pearu, https://github.com/cpuhrsch
2022-09-27 19:49:31 +00:00
bb5001ce3d Enable dense x bsc mm/addmm (#85308)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85308
Approved by: https://github.com/pearu
2022-09-27 19:49:31 +00:00
aaef5d8f2c sparse mm/addmm enable dense x csc, csc x dense and simplify layout check logic. (#85307)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85307
Approved by: https://github.com/pearu, https://github.com/cpuhrsch
2022-09-27 16:46:28 +00:00
f64857189d resize_as_sparse support all compressed layouts (#85378)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85378
Approved by: https://github.com/pearu, https://github.com/cpuhrsch
2022-09-27 06:59:18 +00:00
686555b663 [maskedtensor] port torch/_masked into torch/masked (#85515)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85515
Approved by: https://github.com/cpuhrsch
2022-09-26 23:41:13 +00:00
a4c94f0739 Fix cuda issue with sparse.sampled_addmm (#85194)
Fixes https://github.com/pytorch/pytorch/issues/85169

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85194
Approved by: https://github.com/amjames, https://github.com/nikitaved
2022-09-23 20:52:23 +00:00
0278a141fc csr <-> csc, csc <-> csc, bsr <-> bsc, bsc <-> bsc, bsr <-> bsr conversions (#85091)
As per title. Required to enable a wider selection of sparse formats for `nn.functional.linear`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85091
Approved by: https://github.com/amjames, https://github.com/cpuhrsch
2022-09-21 20:10:26 +00:00