pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 21:14:14 +08:00

Files

Pearu Peterson e1c872e009 Add optimal triton kernel parameters to bsr_dense_mm and scatter_mm for bfloat16 and float32 dtypes (#113553)

As in the title.

This PR is a follow-up to PR https://github.com/pytorch/pytorch/pull/112737 that addresses the bfloat16 and float32 dtype cases. The performance improvements, measured on an `NVIDIA A100-SXM4-80GB`, are as follows:

- bsr_scatter_mm and bfloat16
  - for blocksize 16x16, the average/maximum speedup is about 29%/75%.
  - for blocksize 32x32, the average/maximum speedup is about 23%/58%.
  - for blocksize 64x64, the average/maximum speedup is about 27%/66%.
  - for blocksize 128x128, the average/maximum speedup is about 33%/72%.
- bsr_dense_mm and bfloat16
  - for blocksize 16x16, the average/maximum speedup is about 47%/61%.
  - for blocksize 32x32, the average/maximum speedup is about 29%/43%.
  - for blocksize 64x64, the average/maximum speedup is about 21%/41%.
  - for blocksize 128x128, the average/maximum speedup is about 12%/29%.
- bsr_dense_mm and float32
  - for blocksize 16x16, the average/maximum speedup is about 35%/49%.
  - for blocksize 32x32, the average/maximum speedup is about 2%/5%.
  - for blocksize 64x64, the average/maximum speedup is about 2%/21%.
  - for blocksize 128x128, the average/maximum speedup is about 79%/84%.
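For readers unfamiliar with the operation being tuned, the sketch below shows what a BSR-times-dense product computes for a given blocksize. It is a hypothetical pure-Python reference (the name `bsr_dense_mm_reference` is made up for illustration), not the Triton kernel from `_triton_ops.py`, and it does not model the tiling or the tuned kernel parameters this PR adds. The `crow_indices` / `col_indices` / `values` layout is assumed to mirror the components of a PyTorch BSR tensor.

```python
# Hypothetical reference for C = A @ B, where A is stored in BSR
# (block sparse row) form and B is a dense matrix given as nested lists.
# Illustration only: this is NOT PyTorch's Triton-based bsr_dense_mm.

def bsr_dense_mm_reference(crow_indices, col_indices, values, blocksize, dense):
    bm, bk = blocksize                      # rows/cols of each stored block
    n_block_rows = len(crow_indices) - 1
    n_rows = n_block_rows * bm              # rows of A (and of the result)
    n_cols = len(dense[0])                  # columns of B (and of the result)
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for block_row in range(n_block_rows):
        # crow_indices[r]:crow_indices[r+1] spans the stored blocks in block row r
        for ptr in range(crow_indices[block_row], crow_indices[block_row + 1]):
            block_col = col_indices[ptr]    # block column index of this block
            block = values[ptr]             # bm x bk dense block
            for i in range(bm):
                row = block_row * bm + i
                for k in range(bk):
                    a = block[i][k]
                    dense_row = dense[block_col * bk + k]
                    for j in range(n_cols):
                        out[row][j] += a * dense_row[j]
    return out


if __name__ == "__main__":
    # 4x4 block-diagonal A with 2x2 blocks:
    # [[1, 2, 0, 0], [3, 4, 0, 0], [0, 0, 5, 6], [0, 0, 7, 8]]
    crow = [0, 1, 2]
    cols = [0, 1]
    vals = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
    b = [[1.0], [1.0], [1.0], [1.0]]
    print(bsr_dense_mm_reference(crow, cols, vals, (2, 2), b))
    # [[3.0], [7.0], [11.0], [15.0]]
```

Only the stored blocks are visited, which is why the blocksize (16x16 up to 128x128 in the measurements) shapes the work per block and hence which kernel parameters perform best.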

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113553
Approved by: https://github.com/cpuhrsch

2023-11-14 00:47:59 +00:00

__init__.py

Fixed docstring errors in _fuser.py, _state.py, __init__.py, _freeze.py, _async.py, _recursive.py, _tensorboard_vis.py, _trace.py, _await.py, _check.py, _serialization.py, _script.py, annotations.py, _monkeytype_config.py (#113371)

2023-11-12 03:19:02 +00:00

_semi_structured_conversions.py

[sparse] semi-structured sparse + torch.compile support (#111049)

2023-10-24 02:23:20 +00:00

_triton_ops_meta.py

Add optimal triton kernel parameters to bsr_dense_mm and scatter_mm for bfloat16 and float32 dtypes (#113553)

2023-11-14 00:47:59 +00:00

_triton_ops.py

Add optimal triton kernel parameters to bsr_dense_mm and scatter_mm for bfloat16 and float32 dtypes (#113553)

2023-11-14 00:47:59 +00:00

semi_structured.py

[sparse] semi-structured sparse + torch.compile support (#111049)

2023-10-24 02:23:20 +00:00