As in the title. The `bsr_dense_addmm` kernel implemented in this PR is a generalization of `bsr_dense_mm` in the following respects (in addition to having `input`, `beta`, and `alpha` parameters); a usage sketch follows the description:

- it implements a `SPLIT_N` kernel parameter that enables efficient kernel launches in the case of wide inputs. For instance, the timing of `nn.linear` with 256x256 BSR weights having 16x16 blocks and a 256x131072 strided input was reduced by about 16x (this corresponds to the 94% maximal speedup listed below, since 1 - 1/16 ≈ 94%);
- it supports rectangular blocks in sparse BSR tensor weights.

The performance increase of `nn.linear` is as follows (float16, `NVIDIA A100-SXM4-80GB`):

- with 16x16 blocks, the average/maximal speedup is 55%/94%
- with 32x32 blocks, the average/maximal speedup is 33%/63%
- with 64x64 blocks, the average/maximal speedup is 23%/42%
- with 128x128 blocks, the average/maximal speedup is 15%/39%

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114595
Approved by: https://github.com/cpuhrsch
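A minimal sketch of how the new kernel might be invoked, assuming it is exposed as `torch.sparse._triton_ops.bsr_dense_addmm` with `torch.addmm`-style semantics (`out = beta * input + alpha * (bsr @ dense)`). This is an internal helper, so the module path and keyword names are assumptions rather than a stable public API; the shapes mirror the benchmark described above:

```python
import torch
# Internal helper added by this PR; exact path/signature assumed, not public API.
from torch.sparse._triton_ops import bsr_dense_addmm

M, K, N = 256, 256, 1024
dtype, device = torch.float16, "cuda"

weight = torch.randn(M, K, dtype=dtype, device=device)
# Convert the weight to BSR; rectangular blocksizes, e.g. (16, 32), are also accepted.
bsr = weight.to_sparse_bsr((16, 16))
dense = torch.randn(K, N, dtype=dtype, device=device)
inp = torch.randn(M, N, dtype=dtype, device=device)

# Computes beta * inp + alpha * (bsr @ dense), analogous to torch.addmm.
out = bsr_dense_addmm(inp, bsr, dense, beta=0.5, alpha=2.0)

# Sanity check against the dense reference (loose tolerances for float16).
expected = torch.addmm(inp, weight, dense, beta=0.5, alpha=2.0)
torch.testing.assert_close(out, expected, atol=1e-2, rtol=1e-2)
```

Note that `SPLIT_N` is a Triton kernel meta-parameter, presumably selected by the kernel launcher based on the input width rather than passed explicitly by the caller.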