Summary:
Both cuSPASRELt and CUTLASS support 1:2 semi-structured sparsity for
fp32, which this PR enables.(thanks @alexsamardzic).
Furthermore, this PR also updates the sparse_config to take into account
the different shape constraints for sparse and dense matrices.
Technically, cuSPARSELt supports smaller sparse matrix constraints as it
seens to pad to the CUTLASS constraints under the hood. However, in
practice small sparse matrices are not commonly used and we care more
about the dense constraints for LLM inference.
For now, we keep the CUTLASS constraints in place for both cuSPARSELt
and CUTLASS tensors
This PR also reconnects the _FUSE_TRANSPOSE flag for cuSPARSELt tensors.
Test Plan:
```
python test/test_sparse_semi_structured.py
```
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115550
Approved by: https://github.com/cpuhrsch
As in the title.
In addition:
- improve the algorithm for finding a minima of operation timings: break the inner loop early when a next minima candidate is found
- add tests and fix bugs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115499
Approved by: https://github.com/cpuhrsch
The `bsr_dense_addmm` triton kernel introduced in https://github.com/pytorch/pytorch/pull/114595 is a generalization of `bsr_dense_mm` triton kernel and a more efficient version of it because it uses an extra kernel parameter `SPLIT_N` that has notable effect to performance for r.h.s operand with a larger number of columns.
This PR eliminates the `bsr_dense_mm` triton kernel in favor of using `bsr_dense_addmm` triton kernel.
The performance increase of `bsr_dense_mm` is as follows (float16, `NVIDIA A100-SXM4-80GB`):
- with 16x16 blocks, the average/maximal speed up is 50/71 %
- with 32x32 blocks, the average/maximal speed up is 30/63 %
- with 64x64 blocks, the average/maximal speed up is 12/26 %
- with 128x128 blocks, the average/maximal speed up is 7/17 %
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115030
Approved by: https://github.com/cpuhrsch
Continuation of #112185, following the design in this [doc](https://docs.google.com/document/d/1ipSxcTzEMMOAPvxP-YJlD5JBZZmIGgh8Q34ixtOUCRo).
Summary:
* Introduce `SubclassSymbolicPolicy` containing separate dynamic dim / constraint policies for the outer and inner tensors
* Expand the automatic dynamic algorithm to recurse into inner tensors and produce one of these for a subclass instance
* Maintain legacy behavior for subclasses by recursively calling `mark_dynamic()` on inner tensors *of the same dim as outer* when `mark_dynamic(outer, ...)` is called
* Addresses this: 6a86cf00ad/torch/_dynamo/variables/builder.py (L1750)
* Add `outer_size` and `outer_stride` arguments to `__tensor_unflatten__()` so that you can find out what symbols were allocated for the outer size / stride (you are expected to return a tensor that compares equal to the outer symbols)
* Signatures now:
```python
# attrs is a list of inner tensor attributes on x; inner_tensor = getattr(x, attr)
# ctx is anything useful for rebuilding the class we want to guard on
attrs, ctx = x.__tensor_flatten__()
...
# inner_tensors is a dict of {attr -> tensor}
# ctx is taken unmodified from flattening and (eventually) guarded on
# outer_size is the expected size of the output; possibly symbolic
# outer_stride is the expected strides of the output; possibly symbolic
y = MySubclass.__tensor_unflatten__(inner_tensors, ctx, outer_size, outer_stride)
# at the __tensor_unflatten__() call-site in PT2, we assert y.shape == outer_size and y.stride() == outer_stride
# the assert simplifies symbols when there are relationships between outer and inner symbols
```
* Size info needed for `NestedTensor` at least, stride info needed for `DTensor` at least
* Punting on `outer_storage_offset` because storage_offset handling is horribly broken in PT2 right now
* ~~Add new `__tensor_mark_dynamic__()` to allow overriding the behavior of mark_dynamic on a per-subclass basis~~ (booted to future work)
* ~~Add guards for tensor subclasses by calling `__tensor_flatten__()` in the guard to test equality on `ctx`~~
* Now handled in #114469
* Next PR: add TENSOR_MATCH guards on inner tensors
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114311
Approved by: https://github.com/ezyang, https://github.com/drisspg, https://github.com/voznesenskym, https://github.com/bdhirsh
Summary:
This PR adds in support for passing in a alpha Tensor, which represents
a tensor of alpha values to fuse into the matmul.
```
cusparselt_sparse_mm = alpha A @ B + bias
```
This operation is necessary for quantization, where we would like to
fuse one of the dequant matmuls into the sparse op.
Test Plan:
```
python test/test_sparse_semi_structured -k alpha
```
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112056
Approved by: https://github.com/cpuhrsch
As in the title.
In addition, this PR fixes a bug in `bsr_dense_mm` and `bsr_dense_addmm` return value handling where computations are performed on `make_triton_contiguous` return value while `bsr_dense_mm`/`bsr_dense_addmm` return a tensor that is an input to `make_triton_contiguous`. If `make_triton_contiguous` makes a copy of the input, the return values of `bsr_dense_mm`/`bsr_dense_addmm` will contain garbage.
The PR increases the performance of nn.linear as follows (float32, `NVIDIA A100-SXM4-80GB`):
- with 16x16 blocks, the average/maximal speed up is 67/78 %
- with 32x32 blocks, the average/maximal speed up is 72/79 %
- with 64x64 blocks, the average/maximal speed up is 71/79 %
- with 128x128 blocks, the average/maximal speed up is 62/76 %
The performance increase is illustrated also by the following sparsity-speedup graphs (before and after this PR):
<img src="https://github.com/pytorch/pytorch/assets/402156/55ce0bf7-8ef2-47ab-99e8-8878f159037d" width="48%"> <img src="https://github.com/pytorch/pytorch/assets/402156/df256175-a594-4bd7-b244-90867fb9a45e" width="48%">
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114757
Approved by: https://github.com/cpuhrsch
As in the title.
The `bsr_dense_addmm` kernel implemented in this PR is a generalization of `bsr_dense_mm` in the following respects (in addition of having input, beta, and alpha parameters):
- it implements `SPLIT_N` kernel parameter that enables efficient kernel launches in the case of wide inputs. For instance, the timing of nn.linear with 256x256 BSR weights having 16x16 blocks and 256x131072 strided input reduced about 16x (this corresponds to the 94 % speed up value listed below).
- it supports rectangular blocks in sparse BSR tensor weights
The performance increase of nn.linear is as follows (float16, `NVIDIA A100-SXM4-80GB`):
- with 16x16 blocks, the average/maximal speed up is 55/94 %
- with 32x32 blocks, the average/maximal speed up is 33/63 %
- with 64x64 blocks, the average/maximal speed up is 23/42 %
- with 128x128 blocks, the average/maximal speed up is 15/39 %
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114595
Approved by: https://github.com/cpuhrsch
This PR:
- updates `torch/sparse/_triton_ops_meta.py` for the API change in `triton.testing.do_bench`
- force `num_stages` to be 1 when blocksize is 128x128 to avoid out of resources exception when `bsr_dense_mm` is called from `nn.linear`.
- as in the title. The performance of `nn.linear` on BSR tensor weights (dtypes `float16` and `bfloat16`) is increased as follows (`NVIDIA A100-SXM4-80GB`):
- for blocksize 16x16, the average/maximum speed up is about 11/20 %
- for blocksize 32x32, the average/maximum speed up is about 15/24 %
- for blocksize 64x64, the average/maximum speed up is about 18/26 %
- for blocksize 128x128, the average/maximum speed up is about 15/28 %
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114026
Approved by: https://github.com/cpuhrsch
As in the title.
This PR is a follow-up to PR https://github.com/pytorch/pytorch/pull/112737 to address bfloat16 and float32 dtype cases. The performance increase is as follows (`NVIDIA A100-SXM4-80GB`):
- bsr_scatter_mm and bfloat16
- for blocksize 16x16, the average/maximum speed up is about 29/75 %.
- for blocksize 32x32, the average/maximum speed up is about 23/58 %.
- for blocksize 64x64, the average/maximum speed up is about 27/66 %.
- for blocksize 128x128, the average/maximum speed up is about 33/72 %.
- bsr_dense_mm and bfloat16
- for blocksize 16x16, the average/maximum speed up is about 47/61 %.
- for blocksize 32x32, the average/maximum speed up is about 29/43 %.
- for blocksize 64x64, the average/maximum speed up is about 21/41 %.
- for blocksize 128x128, the average/maximum speed up is about 12/29 %.
- bsr_dense_mm and float32
- for blocksize 16x16, the average/maximum speed up is about 35/49 %.
- for blocksize 32x32, the average/maximum speed up is about 2/5 %.
- for blocksize 64x64, the average/maximum speed up is about 2/21 %.
- for blocksize 128x128, the average/maximum speed up is about 79/84 %.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113553
Approved by: https://github.com/cpuhrsch
Fixes#113194
docstrings updated.
Here are the outputs with the number before and after:-
1) torch/sparse/__init__.py
Before:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:1 at module level:
D104: Missing docstring in public package
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:183 in public function `sum`:
D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:183 in public function `sum`:
D400: First line should end with a period (not 'n')
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:183 in public function `sum`:
D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:391 in public class `check_sparse_tensor_invariants`:
D207: Docstring is under-indented
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:436 in public method `is_enabled`:
D207: Docstring is under-indented
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:436 in public method `is_enabled`:
D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:448 in public method `enable`:
D207: Docstring is under-indented
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:468 in public method `disable`:
D207: Docstring is under-indented
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:475 in public method `__init__`:
D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:479 in public method `__enter__`:
D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:486 in public method `__exit__`:
D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:492 in public method `__call__`:
D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:502 in public function `as_sparse_gradcheck`:
D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:502 in public function `as_sparse_gradcheck`:
D400: First line should end with a period (not 'l')
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:502 in public function `as_sparse_gradcheck`:
D401: First line should be in imperative mood (perhaps 'Decorate', not 'Decorator')
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:518 in private nested function `gradcheck_with_sparse_support`:
D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:518 in private nested function `gradcheck_with_sparse_support`:
D400: First line should end with a period (not 's')
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:518 in private nested function `gradcheck_with_sparse_support`:
D401: First line should be in imperative mood; try rephrasing (found 'Same')
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:528 in private nested function `convert_to_strided_representation`:
D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:528 in private nested function `convert_to_strided_representation`:
D400: First line should end with a period (not 'n')
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:559 in private nested function `restore_from_strided_representation`:
D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:559 in private nested function `restore_from_strided_representation`:
D400: First line should end with a period (not 'd')
23
```
After:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:1 at module level:
D104: Missing docstring in public package
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:476 in public method `__init__`:
D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:480 in public method `__enter__`:
D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:487 in public method `__exit__`:
D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:493 in public method `__call__`:
D102: Missing docstring in public method
5
```
2) torch/contrib/_tensorboard_vis.py
Before:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/contrib/_tensorboard_vis.py:21 in public function `dump_tensorboard_summary`:
D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/contrib/_tensorboard_vis.py:54 in public function `visualize_graph_executor`:
D401: First line should be in imperative mood (perhaps 'Append', not 'Appends')
2
```
After:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/contrib/_tensorboard_vis.py:21 in public function `dump_tensorboard_summary`:
D103: Missing docstring in public function
1
```
3) torch/jit/_state.py
Before:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_state.py:1 at module level:
D400: First line should end with a period (not 'e')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_state.py:20 in public method `__init__`:
D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_state.py:25 in public method `parse_env`:
D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_state.py:41 in public method `__bool__`:
D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_state.py:48 in public function `disable`:
D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_state.py:52 in public function `enable`:
D103: Missing docstring in public function
6
```
After:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_state.py:20 in public method `__init__`:
D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_state.py:25 in public method `parse_env`:
D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_state.py:41 in public method `__bool__`:
D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_state.py:48 in public function `disable`:
D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_state.py:52 in public function `enable`:
D103: Missing docstring in public function
5
```
4) torch/jit/_monkeytype_config.py
Before:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:27 in public function `is_torch_native_class`:
D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:40 in public function `get_type`:
D200: One-line docstring should fit on one line with quotes (found 3)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:40 in public function `get_type`:
D401: First line should be in imperative mood; try rephrasing (found 'Helper')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:62 in public function `get_optional_of_element_type`:
D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:62 in public function `get_optional_of_element_type`:
D400: First line should end with a period (not 'l')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:62 in public function `get_optional_of_element_type`:
D401: First line should be in imperative mood; try rephrasing (found 'Helper')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:75 in public function `get_qualified_name`:
D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:84 in public method `__init__`:
D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:87 in public method `log`:
D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:90 in public class `JitTypeTraceStore`:
D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:91 in public method `__init__`:
D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:98 in public method `add`:
D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:103 in public method `filter`:
D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:111 in public method `analyze`:
D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:122 in public method `consolidate_types`:
D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:139 in public method `get_args_types`:
D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:142 in public class `JitTypeTraceConfig`:
D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:143 in public method `__init__`:
D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:148 in public method `trace_logger`:
D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:148 in public method `trace_logger`:
D400: First line should end with a period (not 'd')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:148 in public method `trace_logger`:
D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:154 in public method `trace_store`:
D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:157 in public method `code_filter`:
D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:163 in public class `JitTypeTraceStoreLogger`:
D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:164 in public method `__init__`:
D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:167 in public class `JitTypeTraceStore`:
D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:168 in public method `__init__`:
D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:171 in public class `JitTypeTraceConfig`:
D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:172 in public method `__init__`:
D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:179 in public function `jit_code_filter`:
D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:179 in public function `jit_code_filter`:
D401: First line should be in imperative mood; try rephrasing (found 'Custom')
31
```
After:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:27 in public function `is_torch_native_class`:
D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:74 in public function `get_qualified_name`:
D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:83 in public method `__init__`:
D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:86 in public method `log`:
D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:89 in public class `JitTypeTraceStore`:
D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:90 in public method `__init__`:
D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:97 in public method `add`:
D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:102 in public method `filter`:
D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:110 in public method `analyze`:
D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:121 in public method `consolidate_types`:
D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:138 in public method `get_args_types`:
D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:141 in public class `JitTypeTraceConfig`:
D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:142 in public method `__init__`:
D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:150 in public method `trace_store`:
D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:153 in public method `code_filter`:
D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:159 in public class `JitTypeTraceStoreLogger`:
D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:160 in public method `__init__`:
D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:163 in public class `JitTypeTraceStore`:
D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:164 in public method `__init__`:
D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:167 in public class `JitTypeTraceConfig`:
D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:168 in public method `__init__`:
D107: Missing docstring in __init__
21
```
5) torch/jit/_fuser.py
Before:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_fuser.py:9 in public function `optimized_execution`:
D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_fuser.py:9 in public function `optimized_execution`:
D400: First line should end with a period (not 'n')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_fuser.py:9 in public function `optimized_execution`:
D401: First line should be in imperative mood; try rephrasing (found 'A')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_fuser.py:23 in public function `fuser`:
D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_fuser.py:23 in public function `fuser`:
D400: First line should end with a period (not 'n')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_fuser.py:23 in public function `fuser`:
D401: First line should be in imperative mood; try rephrasing (found 'A')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_fuser.py:136 in public function `set_fusion_strategy`:
D401: First line should be in imperative mood (perhaps 'Set', not 'Sets')
7
```
After:
```
0
```
6) torch/jit/_async.py
Before:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_async.py:1 at module level:
D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_async.py:1 at module level:
D400: First line should end with a period (not 'I')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_async.py:20 in public function `fork`:
D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_async.py:20 in public function `fork`:
D400: First line should end with a period (not 'e')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_async.py:20 in public function `fork`:
D401: First line should be in imperative mood (perhaps 'Create', not 'Creates')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_async.py:88 in public function `wait`:
D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_async.py:88 in public function `wait`:
D400: First line should end with a period (not 'e')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_async.py:88 in public function `wait`:
D401: First line should be in imperative mood (perhaps 'Force', not 'Forces')
8
```
After:
```
0
```
7) torch/jit/_await.py
Before:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_await.py:11 in private function `_awaitable`:
D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_await.py:11 in private function `_awaitable`:
D400: First line should end with a period (not ',')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_await.py:11 in private function `_awaitable`:
D401: First line should be in imperative mood (perhaps 'Create', not 'Creates')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_await.py:19 in private function `_awaitable_wait`:
D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_await.py:19 in private function `_awaitable_wait`:
D400: First line should end with a period (not ',')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_await.py:19 in private function `_awaitable_wait`:
D401: First line should be in imperative mood (perhaps 'Request', not 'Requests')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_await.py:27 in private function `_awaitable_nowait`:
D200: One-line docstring should fit on one line with quotes (found 3)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_await.py:27 in private function `_awaitable_nowait`:
D401: First line should be in imperative mood (perhaps 'Create', not 'Creates')
8
```
After:
```
0
```
8) torch/jit/_check.py
Before:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_check.py:10 in public class `AttributeTypeIsSupportedChecker`:
D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_check.py:10 in public class `AttributeTypeIsSupportedChecker`:
D400: First line should end with a period (not 'e')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_check.py:10 in public class `AttributeTypeIsSupportedChecker`:
D412: No blank lines allowed between a section header and its content ('Example')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_check.py:61 in public method `check`:
D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_check.py:110 in public method `visit_Assign`:
D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_check.py:110 in public method `visit_Assign`:
D400: First line should end with a period (not 'n')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_check.py:132 in public method `visit_AnnAssign`:
D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_check.py:132 in public method `visit_AnnAssign`:
D400: First line should end with a period (not '`')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_check.py:187 in public method `visit_Call`:
D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_check.py:187 in public method `visit_Call`:
D400: First line should end with a period (not '`')
10
```
After:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_check.py:58 in public method `check`:
D102: Missing docstring in public method
1
```
9) torch/jit/_freeze.py
Before:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_freeze.py:1 at module level:
D400: First line should end with a period (not 'g')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_freeze.py:16 in public function `freeze`:
D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_freeze.py:16 in public function `freeze`:
D400: First line should end with a period (not 'd')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_freeze.py:127 in public function `run_frozen_optimizations`:
D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_freeze.py:127 in public function `run_frozen_optimizations`:
D401: First line should be in imperative mood (perhaps 'Run', not 'Runs')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_freeze.py:182 in public function `optimize_for_inference`:
D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_freeze.py:182 in public function `optimize_for_inference`:
D400: First line should end with a period (not 'e')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_freeze.py:182 in public function `optimize_for_inference`:
D401: First line should be in imperative mood (perhaps 'Perform', not 'Performs')
8
```
After:
```
0
```
10) torch/jit/_recursive.py
Before:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:69 in public function `make_stub`:
D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:75 in public function `make_stub_from_method`:
D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:90 in public function `make_stubs_from_exported_methods`:
D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:103 in public function `jit_ignored_properties`:
D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:155 in public class `SourceContext`:
D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:156 in public method `__init__`:
D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:160 in public function `get_annotations`:
D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:186 in public function `infer_concrete_type_builder`:
D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:186 in public function `infer_concrete_type_builder`:
D400: First line should end with a period (not 's')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:423 in public class `ConcreteTypeStore`:
D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:427 in public method `__init__`:
D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:434 in public method `get_or_create_concrete_type`:
D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:434 in public method `get_or_create_concrete_type`:
D400: First line should end with a period (not 'T')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:459 in public function `create_methods_and_properties_from_stubs`:
D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:474 in public function `create_hooks_from_stubs`:
D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:485 in public function `get_module_concrete_type`:
D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:485 in public function `get_module_concrete_type`:
D400: First line should end with a period (not 'e')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:485 in public function `get_module_concrete_type`:
D401: First line should be in imperative mood (perhaps 'Get', not 'Gets')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:539 in public function `create_script_module`:
D400: First line should end with a period (not 'e')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:539 in public function `create_script_module`:
D401: First line should be in imperative mood (perhaps 'Create', not 'Creates')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:725 in public function `script_model_defines_attr`:
D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:735 in public function `add_python_attr_to_scripted_model`:
D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:740 in public function `get_overload_annotations`:
D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:772 in public function `get_overload_name_mapping`:
D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:797 in public function `make_stubs_for_overloads`:
D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:816 in public function `check_module_initialized`:
D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:842 in public function `infer_methods_to_compile`:
D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:842 in public function `infer_methods_to_compile`:
D400: First line should end with a period (not 'g')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:842 in public function `infer_methods_to_compile`:
D401: First line should be in imperative mood (perhaps 'Implement', not 'Implements')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:904 in public function `get_hook_stubs`:
D200: One-line docstring should fit on one line with quotes (found 3)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:904 in public function `get_hook_stubs`:
D400: First line should end with a period (not 's')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:904 in public function `get_hook_stubs`:
D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:940 in public function `get_property_stubs`:
D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:940 in public function `get_property_stubs`:
D400: First line should end with a period (not 'd')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:963 in public function `interface_script`:
D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:963 in public function `interface_script`:
D400: First line should end with a period (not 'r')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:963 in public function `interface_script`:
D401: First line should be in imperative mood (perhaps 'Make', not 'Makes')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:977 in private nested function `infer_interface_methods_to_compile`:
D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:977 in private nested function `infer_interface_methods_to_compile`:
D400: First line should end with a period (not 'h')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:989 in public function `try_compile_fn`:
D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:1014 in public function `wrap_cpp_class`:
D200: One-line docstring should fit on one line with quotes (found 3)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:1021 in public function `wrap_cpp_module`:
D200: One-line docstring should fit on one line with quotes (found 3)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:1021 in public function `wrap_cpp_module`:
D400: First line should end with a period (not 's')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:1040 in public function `compile_unbound_method`:
D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:1052 in public function `lazy_bind`:
D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:1052 in public function `lazy_bind`:
D400: First line should end with a period (not 'd')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:1052 in public function `lazy_bind`:
D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
47
```
After:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:69 in public function `make_stub`:
D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:75 in public function `make_stub_from_method`:
D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:90 in public function `make_stubs_from_exported_methods`:
D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:103 in public function `jit_ignored_properties`:
D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:155 in public class `SourceContext`:
D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:156 in public method `__init__`:
D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:160 in public function `get_annotations`:
D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:424 in public class `ConcreteTypeStore`:
D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:428 in public method `__init__`:
D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:457 in public function `create_methods_and_properties_from_stubs`:
D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:472 in public function `create_hooks_from_stubs`:
D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:724 in public function `script_model_defines_attr`:
D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:734 in public function `add_python_attr_to_scripted_model`:
D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:739 in public function `get_overload_annotations`:
D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:771 in public function `get_overload_name_mapping`:
D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:796 in public function `make_stubs_for_overloads`:
D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:815 in public function `check_module_initialized`:
D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:979 in public function `try_compile_fn`:
D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:1026 in public function `compile_unbound_method`:
D103: Missing docstring in public function
19
```
@svekars
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113371
Approved by: https://github.com/davidberard98
Finding optimal meta parameters for bsr_dense_mm and bsr_scatter_mm triton kernels is a tedious job. This PR introduces a tool (a Python script `torch/sparse/_triton_ops_meta.py`) that finds the optimal set of meta parameters for a given set of matrix multiplication inputs and their block sizes. Currently, such a set is found for square bsr tensor inputs with sizes 256...16384 and square blocksizes 16...128, and dense tensor inputs with sizes 256...131072.
As a result, bsr_dense_mm performance has increased as follows (`NVIDIA A100-SXM4-80GB`):
- for blocksize 16x16, the average/maximum speed up is about 40/60 %.
- for blocksize 32x32, the average/maximum speed up is about 28/45 %.
- for blocksize 64x64, the average/maximum speed up is about 26/43 %.
- for blocksize 128x128, the average/maximum speed up is about 12/28 %.
To enable the performance improvements through meta parameter optimization for other CUDA devices, one must execute the `_triton_ops_meta.py` which will calculate the optimal meta parameters and store the results in a dictionary object defined in `_triton_ops_meta.py`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112737
Approved by: https://github.com/cpuhrsch
Summary:
This PR adds in torch.compile support for semi-structured sparsity,
using the subclass tracing @bdhirsh added.
Based on wether we are using cuSPARSELt or CUTLASS, we return a
different representation of the inner tensors.
Test Plan:
```
python test/test_sparse_semi_structured.py -k compile
```
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111049
Approved by: https://github.com/cpuhrsch
As in the title.
The figures below illustrate the performance differences of bsr_dense_mm with optimized parameters and bsr_dense_mm with default parameters (GPU: NVIDIA A100-SXM4-80GB). The first figure represents the performance equilibrium point in BSR tensor sparsity at which value bsr_dense_mm have the same performance characteristics as torch.matmul. The second figure represents speedups from using optimized meta parameters in bsr_dense_mm at its performance equilibrium points with respect to bsr_dense_mm with default meta parameters.
In sum, this PR speeds up `bsr_dense_mm` about 50 % depending on the bsr tensor shape and blocksize and lowers the performance equilibrium points of BSR tensor sparsity and strided tensor for matmul operations.
<img src="https://github.com/pytorch/pytorch/assets/402156/6fe9d35f-dd21-4aa0-bb01-6ee257254453" width="48%"> <img src="https://github.com/pytorch/pytorch/assets/402156/506921c6-3770-4209-ad3d-498d2ae4989d" width="48%">
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111760
Approved by: https://github.com/cpuhrsch
ghstack dependencies: #110396, #111470, #111489
This PR introduces `scatter_mm` operation (compute `mm` of arbitrary pairs of tensors given in batches of tensors) that is used to implement `bsr_scatter_mm` that is equivalent to `bsr_dense_mm` (the `mm` operation on bsr and strided tensors). The implementation is provided both in Triton (when tensor dimensions are multiples of 16) and in PyTorch (otherwise).
The figures below illustrate the performance differences of `bsr_scatter_mm` and `bsr_dense_mm` (GPU: `NVIDIA GeForce RTX 2060 SUPER`). The first figure represents the performance equilibrium point in BSR tensor sparsity at which value `bsr_scatter_mm` or `bsr_dense_mm` have the same performance characteristics as `torch.matmul`. The second figure represents speedups from using `bsr_scatter_mm` at its performance equilibrium points with respect to `bsr_dense_mm`.
<img src="https://github.com/pytorch/pytorch/assets/402156/526d182e-937f-4812-a6c4-904f52d6d5ab" width="48%"> <img src="https://github.com/pytorch/pytorch/assets/402156/ccb606ab-1f3f-4133-887c-b56285f4f168" width="48%">
The same figures for GPU card `NVIDIA A100-SXM4-80GB`:
<img src="https://github.com/pytorch/pytorch/assets/402156/25466f1d-df34-4d1c-a975-afb478e4d9f0" width="48%"> <img src="https://github.com/pytorch/pytorch/assets/402156/6ada91f0-a20f-4f0d-8a48-1f4ccc60d08e" width="48%">
In sum:
- `bsr_scatter_mm` is about 2x faster than `bsr_dense_mm` for small block sizes of 16 and 32 and large tensors [GPU: `NVIDIA GeForce RTX 2060 SUPER`].
- `bsr_scatter_mm` is up to 2x faster than `bsr_dense_mm` for small block sizes of 16 and large tensors [GPU: `NVIDIA A100-SXM4-80GB`].
- `bsr_dense_mm` is up to 20 % faster than `bsr_scatter_mm` for block sizes of 64 or larger [GPU: `NVIDIA GeForce RTX 2060 SUPER`].
- However, `bsr_dense_mm` fails with `OutOfResources` exception for block sizes of 256 or larger whereas `bsr_scatter_mm` succeeds.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110396
Approved by: https://github.com/cpuhrsch
Summary:
This PR adds in torch.compile support for semi-structured sparsity,
using the subclass tracing @bdhirsh added.
Based on wether we are using cuSPARSELt or CUTLASS, we return a
different representation of the inner tensors.
Test Plan:
```
python test/test_sparse_semi_structured.py -k compile
```
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111049
Approved by: https://github.com/cpuhrsch
ghstack dependencies: #110583
Summary:
This PR adds in torch.compile support for semi-structured sparsity,
using the subclass tracing @bdhirsh added.
Based on wether we are using cuSPARSELt or CUTLASS, we return a
different representation of the inner tensors.
Test Plan:
```
python test/test_sparse_semi_structured.py -k compile
```
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111049
Approved by: https://github.com/cpuhrsch
ghstack dependencies: #110583
Summary:
Currently we have shape constraints in semi-structured sparsity for both
CUTLASS and cuSPARSELt
These shape constraints unfortunately apply to both the dense and sparse
matrices in sparsedense matmul.
This PR adds in support for calling `F.pad` in order to pad dense
matrices to the right size with zeros and then pull out the
corresponding rows from the resultant result matrix.
We also throw a warning in this case.
The tests have also been updated to take in a dense_input_shape
parameter.
Test Plan:
```
python test/test_sparse_semi_structured.py
```
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110583
Approved by: https://github.com/alexsamardzic, https://github.com/cpuhrsch
Summary:
Currently, PyTorch incorrectly calculates the size of the returned
matrix when we pass a non-contiguous batched (>2d) input to the
semi-structured sparse subclass.
This is most common in MLP layers, where we have 2 linear layers back to back.
This will lead to an error like the following:
```
RuntimeError: shape '[20, 64, 64, 3072]' is invalid for input of size
62914560
```
Where the size of the sparse matmul result is off because we infer the
output shape with the wrong tensor shape.
This happens because of a bug where we did not update the subclass
tensor shape when doing transpose.
For semi-structured sparsity, transposing is a no-op where we just set
the boolean flag, but we forgot to also update the tensor shape.
Note that this error goes away in inference mode, since we avoid
decomposing the aten.linear op and handle shape folding ourselves,
which changes the execution path.
An alternative way to fix this issue is to set
TORCH_FLATTEN_LINEAR_3D=True, which will also fix this error.
Test Plan:
```
python test/test_sparse_semi_structured.py -k test_mlp
```
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110420
Approved by: https://github.com/alexsamardzic, https://github.com/cpuhrsch
As in the title.
The bsr_dense_mm performance on inputs using column-major storage order is relevant for `linear(x, W)` operation that for BSR weights is defined as `bsr_dense_mm(W, x.transpose(-2, -1)).transpose(-2, 1)` so that the second argument to `bse_dense_mm` is a strided tensor using column-major storage order when `x` is C-contiguous.
For large inputs (size > 1000) and moderate sparsity in the BSR input, the speed up can be more than 3 times, as illustrated in the following figure (raw data: [bench_bsr_dense_mm_1_results.txt](https://github.com/pytorch/pytorch/files/12512245/bench_bsr_dense_mm_1_results.txt)):

For small inputs (size=512), there exists a slight degradation of performance.
For row-major ordered inputs, there is no change in performance (see raw data above).
For inputs with float16 dtype, there is no considerable change in performance (see blue marks in the figure).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108512
Approved by: https://github.com/cpuhrsch
Compared to #104848, this PR makes a step further: when the enable_sparse_support decorator is applied to `torch.autograd.gradcheck`, the resulting callable is equivalent to `torch.autograd.gradcheck` with an extra feature of supporting functions that can have input sparse tensors or/and can return sparse tensors.
At the same time, the underlying call to `torch.autograd.gradcheck` will operate on strided tensors only. This basically means that torch/autograd/gradcheck.py can be cleaned up by removing the code that deals with sparse tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107150
Approved by: https://github.com/albanD, https://github.com/amjames, https://github.com/cpuhrsch
ghstack dependencies: #107638, #107777
Summary:
This is a duplicate PR of 102133, which was reverted because it was
failing internal tests.
It seems like that internal builds did not like my guard to check if
cuSPARSELt was available or not.
Test Plan: python test/test_sparse_semi_structured.py
Differential Revision: D48440330
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107398
Approved by: https://github.com/cpuhrsch
This PR contains two new private ops, added for cuSPARSELt support.
These ops call into the cuSPASRELt kernels using the bindings they
provide. For more information, see the documentation
[here](https://docs.nvidia.com/cuda/cusparselt/index.html).
The two new private ops added are:
```
_cslt_compress()
_cslt_sparse_mm()
```
_cslt_compress is an op that reuturns the compressesed matrix given a
sparse matrix that is passed in.
_cslt_sparse_mm is an op that expects a compressed matrix (the result of
_cslt_compress) and a dense matrix and performs sparse-dense matmul
These ops will throw runtime errors if they cusparselt is not present.
This PR also modifies the test and tensor sublass to reflect the new
cuSPARSELt support.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102133
Approved by: https://github.com/cpuhrsch
This PR re-lands
- [Typing] Fix PEP 484 Violation (#105022)
- Update mypy to 1.4.1 (#91983)
That were reverted due to the conflict with internal source repo.
Mostly fixes for PEP-484 violation (i.e. when default arg is set to None, but type is not annotated as optional)
Plus few real fixes:
- Add missing `_get_upgraders_entry_map` to `torch/_C/__init__.pyi`
- Add missing return statement to `torch._export. deserialize_graph`
- Fix error message in `torch.ao.ns.fx.weight_utils.get_lstm_mod_weights`
- Add assert it `torch/optim/optimizer.py` that Optional list is not None
TODO (in followup PR):
- Fix erroneous `isinstance` check in `torch/ao/quantization/_pt2e/qat_utils.py`
Unrelated, to bypass CI failures due to the gcc9 dependency update in Ubuntu-18.04:
- Add hack to squash older libstdc++ from conda environment in favor one from OS to `.ci/docker/install_conda.sh`
- Update bazel cuda builds to focal, as with libstdc++-6.0.32 bazel builds loose the ability to catch exceptions (probably because they link with cupti statically, but I could not found where it is done)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105227
Approved by: https://github.com/atalman, https://github.com/albanD, https://github.com/Skylion007
This PR re-lands
- [Typing] Fix PEP 484 Violation (#105022)
- Update mypy to 1.4.1 (#91983)
That were reverted due to the conflict with internal source repo.
Mostly fixes for PEP-484 violation (i.e. when default arg is set to None, but type is not annotated as optional)
Plus few real fixes:
- Add missing `_get_upgraders_entry_map` to `torch/_C/__init__.pyi`
- Add missing return statement to `torch._export. deserialize_graph`
- Fix error message in `torch.ao.ns.fx.weight_utils.get_lstm_mod_weights`
- Add assert it `torch/optim/optimizer.py` that Optional list is not None
TODO (in followup PR):
- Fix erroneous `isinstance` check in `torch/ao/quantization/_pt2e/qat_utils.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105227
Approved by: https://github.com/atalman, https://github.com/albanD, https://github.com/Skylion007
This PR implements a (yet private) frontend for scaled_dot_product_attention that works with BSR `attn_mask`.
This function is directly comparable (with suitable masks) with `torch.nn.functional.scaled_dot_product_attention` once `attn_mask.dtype == torch.bool`, but it's behavior is different when `attn_mask.dtype != torch.bool`. This is because `torch.nn.functional.scaled_dot_product_attention` assumes that irrelevant values are supposed to be filled with `-inf`, while the selected ones should be `0`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104042
Approved by: https://github.com/amjames, https://github.com/cpuhrsch
This PR adds in support for semi-structured sparsity via a tensor
subclass. It currently uses the CUTLASS kernels merged in PR #100881.
In the future we plan to add in cuSPARSELt support (see the other PRs in
the stack), which will give us larger performance gains.
This PR adds in 2 things:
- a Tensor subclass, `SparseSemiStructuredTensor` to store the
sparse tensor in copmressed form and override `__torch_dispatch__`.
- a conversion function that takes in a dense tensor and a
semi-structured sparse bool mask and creates an instance of the
subclass.
**SparseSemiStructuredTensor**
The subclass stores the dense tensor in a contiguous flattened tensor
for future compatability with cuSPARSELt, which expects this format.
Note that the CUTLASS kernels do not have this limitation, as the
specified values and the metadata are passed separately in
`_structured_sparse_linear`. In the future we can use the cuSPARSELT bindings
[here](https://github.com/pytorch/pytorch/pull/103700) for faster matmul, better dtype converage, and relaxed shape
constraints.
Since we currently don't have a way to go back from the sparse
representation to the dense representation, and we store the weights in
compressed form, we don't have a great way to handle .t().
Instead, we keep track of how often we've called transpose on our
tensor, and if it's an unexpected number we throw an error. When the first
argument is sparse, we expect an even number of calls to transpose,
while when the second argument is sparse, we expect an odd number of
calls. This is because we support second argument sparse matrix
multiplications by using transpose properties.
**to_sparse_semi_structured**
This is a conversion function to convert a dense tensor and a
semi-structured sparse bool mask into a subclass. Currently, we must
pass in a bool mask, since we can't infer it becuase there may be
additional zero elements in the dense tensor, so `tensor !=0` is not 2:4
sparse.
Once we add either a method to derive the mask from the dense tensor or
cuSPARSELt, we no longer need to pass in the mask. cuSPARSELt has it's
own helper functions to create the metadata mask.
**User Details**
We have implemented support for the following ops for `torch.float16`
and `torch.int8`:
```
torch.addmm(bias, dense, sparse.t())
torch.mm(dense, sparse)
torch.mm(sparse, dense)
aten.linear.default
aten.t.default
aten.t.detach
```
The end user interface to accelerate a nn.Linaer module with the
subclass would look like this:
```
from torch.sparse import to_sparse_semi_structured
mask = torch.Tensor([0, 0, 1, 1]).tile(128, 32).cuda().bool()
linear = Model(128, 128).half().cuda()
linear.weight = nn.Parameter(to_sparse_semi_structured(linear.weight,
mask=linear.weight.bool())
```
This also updates tests and the `torch.sparse` module docstring to
reflect these changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102135
Approved by: https://github.com/albanD
This PR adds in support for semi-structured sparsity via a tensor
subclass. It currently uses the CUTLASS kernels merged in PR #100881.
In the future we plan to add in cuSPARSELt support (see the other PRs in
the stack), which will give us larger performance gains.
This PR adds in 2 things:
- a Tensor subclass, `SparseSemiStructuredTensor` to store the
sparse tensor in copmressed form and override `__torch_dispatch__`.
- a conversion function that takes in a dense tensor and a
semi-structured sparse bool mask and creates an instance of the
subclass.
**SparseSemiStructuredTensor**
The subclass stores the dense tensor in a contiguous flattened tensor
for future compatability with cuSPARSELt, which expects this format.
Note that the CUTLASS kernels do not have this limitation, as the
specified values and the metadata are passed separately in
`_structured_sparse_linear`. In the future we can use the cuSPARSELT bindings
[here](https://github.com/pytorch/pytorch/pull/103700) for faster matmul, better dtype converage, and relaxed shape
constraints.
Since we currently don't have a way to go back from the sparse
representation to the dense representation, and we store the weights in
compressed form, we don't have a great way to handle .t().
Instead, we keep track of how often we've called transpose on our
tensor, and if it's an unexpected number we throw an error. When the first
argument is sparse, we expect an even number of calls to transpose,
while when the second argument is sparse, we expect an odd number of
calls. This is because we support second argument sparse matrix
multiplications by using transpose properties.
**to_sparse_semi_structured**
This is a conversion function to convert a dense tensor and a
semi-structured sparse bool mask into a subclass. Currently, we must
pass in a bool mask, since we can't infer it becuase there may be
additional zero elements in the dense tensor, so `tensor !=0` is not 2:4
sparse.
Once we add either a method to derive the mask from the dense tensor or
cuSPARSELt, we no longer need to pass in the mask. cuSPARSELt has it's
own helper functions to create the metadata mask.
**User Details**
We have implemented support for the following ops for `torch.float16`
and `torch.int8`:
```
torch.addmm(bias, dense, sparse.t())
torch.mm(dense, sparse)
torch.mm(sparse, dense)
aten.linear.default
aten.t.default
aten.t.detach
```
The end user interface to accelerate a nn.Linaer module with the
subclass would look like this:
```
from torch.sparse import to_sparse_semi_structured
mask = torch.Tensor([0, 0, 1, 1]).tile(128, 32).cuda().bool()
linear = Model(128, 128).half().cuda()
linear.weight = nn.Parameter(to_sparse_semi_structured(linear.weight,
mask=linear.weight.bool())
```
This also updates tests and the `torch.sparse` module docstring to
reflect these changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102135
Approved by: https://github.com/albanD