121 Commits

Author SHA1 Message Date
8de85896e0 Enable ruff rule E721 (#165162)
`E721` flags object type comparisons that use `==` and other comparison operators. This is useful because `is` is the recommended way to compare types.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165162
Approved by: https://github.com/Skylion007
2025-10-13 01:48:55 +00:00
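As an illustrative sketch (not part of the PR itself), this is the pattern `E721` flags and the form it prefers:

```python
# E721 flags type comparisons written with `==`; the preferred form uses
# `is`, since types are singletons and identity is the intended check.
a, b = 1, 2.0

flagged = type(a) == type(b)    # E721: type comparison via ==
preferred = type(a) is type(b)  # identity check instead

print(flagged, preferred)  # prints: False False (int vs float)
```

Both expressions agree here, but `is` states the intent directly and cannot be confused by overridden equality operators.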
816fb7f48d Revert "Enable ruff rule E721 (#165162)"
This reverts commit 9e7c19f72b6d0690915c307409c0c0a76b5a3bf0.

Reverted https://github.com/pytorch/pytorch/pull/165162 on behalf of https://github.com/pytorch-auto-revert due to Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable ([comment](https://github.com/pytorch/pytorch/pull/165162#issuecomment-3393328271))
2025-10-11 13:25:40 +00:00
9e7c19f72b Enable ruff rule E721 (#165162)
`E721` flags object type comparisons that use `==` and other comparison operators. This is useful because `is` is the recommended way to compare types.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165162
Approved by: https://github.com/Skylion007
2025-10-11 06:43:53 +00:00
96181d6f76 [BE][cutlass backend] BE changes post cutlass_cppgen name change (#164589)
Differential Revision: D83809105

Handle reviews from https://github.com/pytorch/pytorch/pull/164159

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164589
Approved by: https://github.com/Skylion007
2025-10-06 17:22:08 +00:00
6c209bfc5c [cutlass-4][take 2] upgrade to cutlass 4.2.1 (#164159)
Test Plan: Sandcastle

Differential Revision: D83492704

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164159
Approved by: https://github.com/Skylion007, https://github.com/mlazos
2025-10-03 03:47:59 +00:00
349e9e922d [cutlass backend] remove cutlass presets (#164380)
Differential Revision: [D83674898](https://our.internmc.facebook.com/intern/diff/D83674898/)

Changes made by Claude Code (the corresponding test still needs to be removed too)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164380
Approved by: https://github.com/Skylion007, https://github.com/mlazos
2025-10-02 01:26:00 +00:00
19b754dff8 Revert "Update cutlass version for fbcode (#163091)"
This reverts commit 509c4e86270cc4decca58905d0f446e1fc0cf618.

Reverted https://github.com/pytorch/pytorch/pull/163091 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/163091#issuecomment-3322428791))
2025-09-23 05:08:42 +00:00
509c4e8627 Update cutlass version for fbcode (#163091)
Differential Revision: [D82567751](https://our.internmc.facebook.com/intern/diff/D82567751/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163091
Approved by: https://github.com/drisspg
2025-09-22 14:31:11 +00:00
a81a2e54ed [submodule] CUTLASS upgrade to 4.2.0 and change cutlass to cutlass_cppgen (#163092)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163092
Approved by: https://github.com/drisspg, https://github.com/Skylion007
2025-09-18 18:03:51 +00:00
d8e6b2fddc [Cutlass] Add exp and sigmoid activations (#162536)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162536
Approved by: https://github.com/henrylhtsang, https://github.com/eellison
ghstack dependencies: #162535
2025-09-10 21:44:26 +00:00
31c25c7d01 [Cutlass] Add tanh activation and test case for activations (#162535)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162535
Approved by: https://github.com/henrylhtsang
2025-09-10 21:44:26 +00:00
92a43025e0 [cutlass backend] Add FP8 tests for multiple linears (#160782)
Adding a test that is closer to a real use case. Thanks @mlazos for fixing a few issues so this test works in most cases.

We still have to skip the AOTI and dynamic cases due to accuracy issues.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160782
Approved by: https://github.com/mlazos
2025-09-05 20:23:25 +00:00
e9eb2096a5 [cutlass backend] Allow bmm use cases when batch stride is 0 (#160356)
Differential Revision: [D80035771](https://our.internmc.facebook.com/intern/diff/D80035771/)

The motivation for the original change was to reduce the number of parameters we pass into the kernel, which was motivated by aesthetic reasons only.

But given the need to use a different batch stride, we should just pass the batch stride in. That would be a good long-term fix.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160356
Approved by: https://github.com/mlazos
2025-08-13 20:52:24 +00:00
b90feeac86 [BE][cutlass backend] Fix subproc addmm tests (#160295)
Differential Revision: [D79977421](https://our.internmc.facebook.com/intern/diff/D79977421/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160295
Approved by: https://github.com/jingsh
2025-08-12 01:41:06 +00:00
af10f1f86c Fix requires_cuda to requires_cuda_and_triton (#160222)
Fixes #159399

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160222
Approved by: https://github.com/janeyx99
2025-08-10 07:05:52 +00:00
4e2ddb5db6 [Inductor][CUTLASS] Copy cutlass_mock_imports directory (#159724)
Pip wheels of PyTorch nightly and 2.8 release candidates do not contain `cutlass_mock_imports`.

This is the path to the source code:
```
root@8120d02fd9c5:$ tree ./torch/_inductor/codegen/cuda/cutlass_lib_extensions/
./torch/_inductor/codegen/cuda/cutlass_lib_extensions/
├── cutlass_mock_imports
│   ├── cuda
│   │   ├── __init__.py
│   │   ├── cuda.py
│   │   └── cudart.py
│   ├── pydot
│   │   └── __init__.py
│   └── scipy
│       ├── __init__.py
│       └── special.py
├── evt_extensions.py
└── gemm_operation_extensions.py

5 directories, 8 files
```

And this is what the installed wheel has:
```
root@8120d02fd9c5:$ tree /usr/local/lib/python3.12/dist-packages/torch/_inductor/codegen/cuda/cutlass_lib_extensions/
/usr/local/lib/python3.12/dist-packages/torch/_inductor/codegen/cuda/cutlass_lib_extensions/
├── __init__.py
├── evt_extensions.py
└── gemm_operation_extensions.py

1 directory, 3 files
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159724
Approved by: https://github.com/henrylhtsang
2025-08-08 22:56:05 +00:00
50f23ff6f8 rename-HAS_CUDA-to-HAS_CUDA_AND_TRITON (#159883)
Fixes #159399
"Modified torch.testing._internal.inductor_utils and test/inductor"

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159883
Approved by: https://github.com/janeyx99
2025-08-08 15:44:52 +00:00
bdb07a2bc5 [Cutlass] Allow offsets to be passed as arguments to kernel (#159761)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159761
Approved by: https://github.com/henrylhtsang
ghstack dependencies: #159760
2025-08-05 21:59:07 +00:00
ddbdcdc710 [cutlass backend][test] Expand FP8 tests to FP16 (#159538)
Differential Revision: [D79317343](https://our.internmc.facebook.com/intern/diff/D79317343/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159538
Approved by: https://github.com/mlazos
2025-08-04 23:01:55 +00:00
32840d19f9 [cutlass backend] skip stream k if shape is dynamic (#159442)
Differential Revision: [D79229210](https://our.internmc.facebook.com/intern/diff/D79229210/)

The motivation is that the workspace size is hard to determine and varies across shapes. What I observed is that sometimes the shape gets smaller but the workspace increases, so it is hard to upper-bound it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159442
Approved by: https://github.com/ColinPeppler
2025-08-01 20:42:24 +00:00
f7f550649f [cutlass backend] Change default inst level mm config number (#158901)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158901
Approved by: https://github.com/ColinPeppler, https://github.com/jingsh, https://github.com/Skylion007
2025-07-23 20:53:22 +00:00
662dd7db5b [cutlass backend] cache maybe_append_choices (#156781)
This PR attempts to cache:
* codegen for the cutlass backend for the same kernel, even if the runtime params differ.

From some profiling, most of the time is spent in render, so we only target caching that part for now.

The output of render is `code`, which we can cache easily. I also have to cache `size_args`, since it depends on `kernel.get_dynamic_shape_args()`, which in turn depends on the state of `self` when render is called.

`make_key` does most of the work here: we hash on the input node layouts, the output node layout, and `op.configuration_name()` (which is what `hash(op)` would do anyway).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156781
Approved by: https://github.com/ColinPeppler
2025-07-21 19:02:39 +00:00
6200584193 [cutlass backend][BE] remove force disable cache in tests (#158053)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158053
Approved by: https://github.com/coconutruben
2025-07-15 10:35:34 +00:00
a9ac9f2635 [cutlass backend] Change serialization protocol to use more json and cache (#157840)
Differential Revision: [D77949177](https://our.internmc.facebook.com/intern/diff/D77949177/)

What this diff does:
* use `lru_cache` for serialization and deserialization
* dump to JSON more, which seems to help perf

For instantiation level 3332, the loading time decreases from 33s to 20s (roughly a 40% decrease).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/157840
Approved by: https://github.com/ColinPeppler
ghstack dependencies: #157839
2025-07-10 17:44:33 +00:00
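The combination described above can be sketched as follows. This is a hypothetical illustration of memoized JSON deserialization, not the actual PyTorch code:

```python
import functools
import json

# Hypothetical sketch: memoize deserialization so each distinct serialized
# config is parsed from JSON only once; repeat lookups hit the cache.
@functools.lru_cache(maxsize=None)
def deserialize_op(serialized: str) -> dict:
    # lru_cache needs hashable args, so we key on the JSON string itself.
    # Callers must treat the returned dict as read-only, since it is shared.
    return json.loads(serialized)

config = json.dumps({"op": "gemm", "tile": [128, 128]})
first = deserialize_op(config)
second = deserialize_op(config)
```

Because the second call is a cache hit, `first` and `second` are the same object, which is where the loading-time savings come from when the same op configs recur.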
bbe681ed51 [cutlass backend][BE][ez] Make matmul layouts be row x column (#156656)
Differential Revision: [D77184232](https://our.internmc.facebook.com/intern/diff/D77184232/)

Motivation:
* This is the case we care about the most.
* We are caching the kernels for this row x column layout, so testing on it can potentially make CI run faster.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156656
Approved by: https://github.com/ColinPeppler
2025-07-08 16:57:33 +00:00
d3efd73234 Revert "[cutlass backend][BE][ez] Make matmul layouts be row x column (#156656)"
This reverts commit 84c588e5eada9e7921608065edc444a15c22cb1c.

Reverted https://github.com/pytorch/pytorch/pull/156656 on behalf of https://github.com/henrylhtsang due to breaking fbcode A100 tests ([comment](https://github.com/pytorch/pytorch/pull/156656#issuecomment-3020769914))
2025-06-30 21:16:04 +00:00
84c588e5ea [cutlass backend][BE][ez] Make matmul layouts be row x column (#156656)
Differential Revision: [D77184232](https://our.internmc.facebook.com/intern/diff/D77184232/)

Motivation:
* This is the case we care about the most.
* We are caching the kernels for this row x column layout, so testing on it can potentially make CI run faster.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156656
Approved by: https://github.com/ColinPeppler
2025-06-27 17:15:45 +00:00
81bf278537 [cutlass] rename cutlass python lib to python-cutlass (#156655)
Differential Revision: [D77173366](https://our.internmc.facebook.com/intern/diff/D77173366/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156655
Approved by: https://github.com/Skylion007
2025-06-26 02:47:14 +00:00
e071837594 [cutlass backend] compile and link for .so files (#155876)
Differential Revision: [D76482736](https://our.internmc.facebook.com/intern/diff/D76482736/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155876
Approved by: https://github.com/coconutruben, https://github.com/ColinPeppler
2025-06-25 17:01:56 +00:00
f5e6e52f25 [BE][PYFMT] migrate PYFMT for test/inductor/ to ruff format (#148186)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148186
Approved by: https://github.com/jansel
2025-06-24 11:12:11 +00:00
a2a75be0f8 Rename inductor cache (#156128)
Requested by Simon on a different PR

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156128
Approved by: https://github.com/xmfan
2025-06-17 03:57:18 +00:00
3040ca6d0f [Cutlass] Include fp8 headers in aoti cpp wrapper (#155173)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155173
Approved by: https://github.com/desertfire
ghstack dependencies: #154829, #154835, #155195
2025-06-11 01:21:16 +00:00
40d02eb481 [Cutlass] Allow filtering by fast_accum for scaled_mm (#155195)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155195
Approved by: https://github.com/drisspg
ghstack dependencies: #154829, #154835
2025-06-09 22:46:18 +00:00
9a42f01586 [Cutlass] EVT dynamic shapes support (#154835)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154835
Approved by: https://github.com/henrylhtsang
ghstack dependencies: #154829
2025-06-05 20:17:01 +00:00
5911f870c0 [Cutlass] fp8 dynamic shapes test (#154829)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154829
Approved by: https://github.com/henrylhtsang, https://github.com/eellison
2025-06-05 20:17:01 +00:00
6f93ce3c86 Revert "[Cutlass] fp8 dynamic shapes test (#154829)"
This reverts commit 36596ad2a009a0906848fa264954d4b200efc50e.

Reverted https://github.com/pytorch/pytorch/pull/154829 on behalf of https://github.com/seemethere due to This is failing internal tests see, [fburl.com/diff/3gomp7i3](https://fburl.com/diff/3gomp7i3). Please re-land this as a co-dev diff ([comment](https://github.com/pytorch/pytorch/pull/154829#issuecomment-2940494361))
2025-06-04 15:36:27 +00:00
3fa3dbdb1f Revert "[Cutlass] EVT dynamic shapes support (#154835)"
This reverts commit 4224a7df01a9607830da771fd4884c8eba150630.

Reverted https://github.com/pytorch/pytorch/pull/154835 on behalf of https://github.com/seemethere due to This is part of a stack that is failing internal tests see, [fburl.com/diff/3gomp7i3](https://fburl.com/diff/3gomp7i3). Please re-land this as a co-dev diff ([comment](https://github.com/pytorch/pytorch/pull/154835#issuecomment-2940463211))
2025-06-04 15:33:09 +00:00
4224a7df01 [Cutlass] EVT dynamic shapes support (#154835)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154835
Approved by: https://github.com/henrylhtsang
ghstack dependencies: #154775, #154761, #154829
2025-06-03 22:20:34 +00:00
36596ad2a0 [Cutlass] fp8 dynamic shapes test (#154829)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154829
Approved by: https://github.com/henrylhtsang, https://github.com/eellison
ghstack dependencies: #154775, #154761
2025-06-03 22:20:33 +00:00
1c2b9cecd2 [Cutlass] Support bias arg for fp8 GEMM (#154761)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154761
Approved by: https://github.com/drisspg
ghstack dependencies: #154775
2025-06-03 22:20:27 +00:00
cb56df55dc [Inductor]Cleanup autotune_fallback_to_aten post-deprecation (#154331)
Fixes #153298

This PR is the 3rd and final step of #147479
All references to autotune_fallback_to_aten have been removed, and the feature is now deprecated.
All calls to should_fallback_to_aten() were also removed, as they were deemed unnecessary.

[henrylhtsang](https://github.com/henrylhtsang)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154331
Approved by: https://github.com/henrylhtsang
2025-05-29 20:29:58 +00:00
423fc671e9 [Cutlass] Support float8_e4m3fn GEMM (#153890)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153890
Approved by: https://github.com/drisspg, https://github.com/eellison
2025-05-22 08:37:33 +00:00
053ca7439a [cutlass backend] Add serializer for cutlass ops (#153894)
Differential Revision: [D74524786](https://our.internmc.facebook.com/intern/diff/D74524786/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153894
Approved by: https://github.com/ColinPeppler, https://github.com/mlazos
2025-05-21 22:01:40 +00:00
7ebea09986 [Cutlass] Enable fusion with FusedSchedulerNodes (#153588)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153588
Approved by: https://github.com/eellison
ghstack dependencies: #152815
2025-05-17 12:29:10 +00:00
f604732e2e [Cutlass] E2E Tests for EVT (#152815)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152815
Approved by: https://github.com/henrylhtsang, https://github.com/eellison
2025-05-17 12:29:10 +00:00
1f5cf19f56 [cutlass backend] Use src code to generate cutlass gemm name (#153006)
This shaves off 40s, at least for small cases, since we don't have to recompile the kernel again.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153006
Approved by: https://github.com/mlazos
2025-05-11 00:57:03 +00:00
595e21a9dd [cutlass-3] Add cutlass key for fbcode and OSS (#153081)
Differential Revision: [D74337959](https://our.internmc.facebook.com/intern/diff/D74337959/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153081
Approved by: https://github.com/drisspg
2025-05-09 17:38:31 +00:00
8b9c9a327f [cutlass backend] cache filtered ops based on layouts (#152580)
Differential Revision: [D73972687](https://our.internmc.facebook.com/intern/diff/D73972687/)

Add a cache that stores the list of filtered ops for a specific shape + layout + dtype (i.e., a hash on `input_nodes`).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152580
Approved by: https://github.com/eellison
2025-05-07 16:38:22 +00:00
61aa77e216 [cutlass backend][BE][clean-up] refactor to remove use of autotune_fallback_to_aten=True in cutlass backend tests (#152850)
Differential Revision: [D74192001](https://our.internmc.facebook.com/intern/diff/D74192001/)

Motivation: clean up after https://github.com/pytorch/pytorch/issues/147479. I plan to leave the rest of the clean-up as a first-time issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152850
Approved by: https://github.com/chenyang78
2025-05-06 23:48:57 +00:00
f2cc07d202 [cutlass backend] Add addmm dynamic support (#152498)
Differential Revision: [D73893133](https://our.internmc.facebook.com/intern/diff/D73893133/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152498
Approved by: https://github.com/ColinPeppler
2025-05-01 01:40:08 +00:00