[inductor] Add configuration control for CUTLASS operation selection. (#155770)

Add a new configuration option, `cutlass_enabled_ops`, that controls which operations use CUTLASS lowerings. By default, CUTLASS is enabled for all operations, which preserves the existing behavior. Users can instead enable it for a comma-separated list of specific operations to reduce compilation time.

**Fixes #155718**

## Usage Examples

```bash
# Enable CUTLASS for all operations (default behavior)
export TORCHINDUCTOR_CUTLASS_ENABLED_OPS="ALL"

# Enable CUTLASS only for matrix multiplication operations
export TORCHINDUCTOR_CUTLASS_ENABLED_OPS="mm,addmm"

# Enable CUTLASS only for batch operations
export TORCHINDUCTOR_CUTLASS_ENABLED_OPS="bmm,baddbmm"

# Disable CUTLASS for all operations
export TORCHINDUCTOR_CUTLASS_ENABLED_OPS=""
```
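To illustrate how these values are interpreted, here is a minimal standalone sketch of the matching logic. It mirrors the `_use_cutlass_for_op` helper added in this PR, with one assumption made for illustration: the setting is passed in as a hypothetical `enabled_ops` argument rather than read from `config.cuda.cutlass_enabled_ops`, so it runs without PyTorch installed.

```python
def use_cutlass_for_op(op_name: str, enabled_ops: str = "ALL") -> bool:
    """Standalone mirror of the op-matching logic.

    `enabled_ops` is a hypothetical parameter standing in for the
    `config.cuda.cutlass_enabled_ops` value used by the real helper.
    """
    enabled = enabled_ops.upper()
    if enabled == "ALL":
        return True
    # Matching is case-insensitive and tolerates spaces around commas.
    return op_name.upper() in [x.strip() for x in enabled.split(",")]


if __name__ == "__main__":
    ops = ("mm", "addmm", "bmm", "baddbmm")
    for setting in ("ALL", "mm,addmm", "bmm, baddbmm", ""):
        selected = [op for op in ops if use_cutlass_for_op(op, setting)]
        print(f"{setting!r} -> {selected}")
```

An empty string matches no operation name, which is why it disables CUTLASS for every operation.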

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155770
Approved by: https://github.com/henrylhtsang
This commit is contained in:
penknife6153
2025-06-14 08:19:50 +00:00
committed by PyTorch MergeBot
parent 1982ec2d22
commit 3e38feb05f
4 changed files with 60 additions and 6 deletions

@@ -1578,6 +1578,14 @@ def use_cutlass_template(layout: Layout, m: int, n: int, k: int) -> bool:
    return res


def _use_cutlass_for_op(op_name: str) -> bool:
    """Check if CUTLASS should be used for the given operation."""
    enabled_ops = config.cuda.cutlass_enabled_ops.upper()
    if enabled_ops == "ALL":
        return True
    return op_name.upper() in [x.strip() for x in enabled_ops.split(",")]


decompose_k_threshold = 32

# To limit compile time