[BE]: Update cutlass submodule to 3.9.2 (#152779)

This picks up a lot of last-minute CUTLASS bugfixes for Blackwell that we should pull in. CUTLASS is a header-only library and this is a minor release, so the update should strictly improve compiler support and fix some bugs. Some instruction counts in the torch.compile baselines needed updating for the new kernels.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152779
Approved by: https://github.com/henrylhtsang
Aaron Gokaslan
2025-05-06 16:08:20 +00:00
committed by PyTorch MergeBot
parent f56bcd2408
commit 07a29dbe81
2 changed files with 10 additions and 10 deletions

@@ -14,19 +14,19 @@ add_loop_inductor_dynamic_gpu,compile_time_instruction_count,42960000000,0.025
-add_loop_inductor_gpu,compile_time_instruction_count,25505620920,0.015
+add_loop_inductor_gpu,compile_time_instruction_count,25630000000,0.015
-basic_modules_ListOfLinears_eager,compile_time_instruction_count,1005000000,0.015
+basic_modules_ListOfLinears_eager,compile_time_instruction_count,1011000000,0.015
-basic_modules_ListOfLinears_inductor,compile_time_instruction_count,17990000000,0.015
+basic_modules_ListOfLinears_inductor,compile_time_instruction_count,18150000000,0.015
-basic_modules_ListOfLinears_inductor_gpu_force_shape_pad,compile_time_instruction_count,16220000000,0.015
+basic_modules_ListOfLinears_inductor_gpu_force_shape_pad,compile_time_instruction_count,16340000000,0.015
@@ -34,7 +34,7 @@ basic_modules_ListOfLinears_inductor_gpu,compile_time_instruction_count,97140000
-update_hint_regression,compile_time_instruction_count,1608000000,0.02
+update_hint_regression,compile_time_instruction_count,1622000000,0.02
@@ -46,11 +46,11 @@ sum_floordiv_regression,compile_time_instruction_count,998400000,0.015
-symint_sum,compile_time_instruction_count,3189000000,0.015
+symint_sum,compile_time_instruction_count,3227000000,0.015
-symint_sum_loop,compile_time_instruction_count,4180000000,0.015
+symint_sum_loop,compile_time_instruction_count,4224000000,0.015
@@ -62,11 +62,11 @@ aotdispatcher_inference_subclass_cpu,compile_time_instruction_count,5944000000,0
-aotdispatcher_partitioner_cpu,compile_time_instruction_count,8501000000,0.015
+aotdispatcher_partitioner_cpu,compile_time_instruction_count,8586000000,0.015
-aotdispatcher_partitioner_cpu2,compile_time_instruction_count,1856000000,0.015
+aotdispatcher_partitioner_cpu2,compile_time_instruction_count,1884000000,0.015

add_loop_eager                          compile_time_instruction_count  2960000000                0.015
symint_sum_loop                         compile_time_instruction_count  4180000000 -> 4224000000  0.015
aotdispatcher_inference_nosubclass_cpu  compile_time_instruction_count  2075364055                0.015
aotdispatcher_inference_subclass_cpu    compile_time_instruction_count  5944000000                0.015
aotdispatcher_partitioner_cpu           compile_time_instruction_count  8501000000 -> 8586000000  0.015
aotdispatcher_partitioner_cpu2          compile_time_instruction_count  1856000000 -> 1884000000  0.015
aotdispatcher_training_nosubclass_cpu   compile_time_instruction_count  3795000000                0.015
aotdispatcher_training_subclass_cpu     compile_time_instruction_count  10280000000               0.015
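
To get a feel for how far the baselines moved, the changed rows from the diff can be compared directly. The snippet below is a minimal sketch, not part of the commit: the `ROWS` layout (old and new value side by side), the helper name `summarize`, and above all the reading of the last CSV column as a relative tolerance on the instruction count are assumptions made for illustration.

```python
import csv
import io

# Changed rows copied from the diff: benchmark, metric, old value, new value,
# last CSV column (assumed here to be a relative tolerance).
ROWS = """\
add_loop_inductor_gpu,compile_time_instruction_count,25505620920,25630000000,0.015
basic_modules_ListOfLinears_eager,compile_time_instruction_count,1005000000,1011000000,0.015
basic_modules_ListOfLinears_inductor,compile_time_instruction_count,17990000000,18150000000,0.015
basic_modules_ListOfLinears_inductor_gpu_force_shape_pad,compile_time_instruction_count,16220000000,16340000000,0.015
update_hint_regression,compile_time_instruction_count,1608000000,1622000000,0.02
symint_sum,compile_time_instruction_count,3189000000,3227000000,0.015
symint_sum_loop,compile_time_instruction_count,4180000000,4224000000,0.015
aotdispatcher_partitioner_cpu,compile_time_instruction_count,8501000000,8586000000,0.015
aotdispatcher_partitioner_cpu2,compile_time_instruction_count,1856000000,1884000000,0.015
"""


def summarize(text):
    """Return (benchmark, relative change, within assumed tolerance) per row."""
    results = []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines
            continue
        name, _metric, old, new, tolerance = row
        old, new = int(old), int(new)
        change = (new - old) / old  # relative change against the old baseline
        results.append((name, change, abs(change) <= float(tolerance)))
    return results


if __name__ == "__main__":
    for name, change, within in summarize(ROWS):
        status = "within" if within else "outside"
        print(f"{name}: {change:+.2%} ({status} assumed tolerance)")
```

Under that assumption, every updated baseline is an increase of roughly 0.5% to 1.5% over the previous value.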