b2953f5643
[9/N] Apply ruff UP035 rule ( #165515 )
...
This is a follow-up to #165214, continuing to apply the ruff UP035 rule to the code base.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165515
Approved by: https://github.com/Lucaskabela
2025-10-17 00:09:51 +00:00
a2a75be0f8
Rename inductor cache ( #156128 )
...
Requested by Simon on a different PR
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156128
Approved by: https://github.com/xmfan
2025-06-17 03:57:18 +00:00
b878ca0c91
[cutlass backend] add fp8 to cutlass benchmark script ( #155507 )
...
Summary:
Add fp8 support to the cutlass benchmark script.
For now, FP8 only supports fast_accum.
Test Plan:
```
Experiment group: _scaled_mm (8192x8192, 8192x8192) torch.float8_e4m3fn
+-----------------------+--------------------+--------------------+----------------------+--------------------+
| name | forward_time (us) | teraflops (TFLOPS) | compilation_time (s) | perf_over_aten (%) |
+-----------------------+--------------------+--------------------+----------------------+--------------------+
| aten | 967.1226739883423 | 1136.8895149998868 | 1.219131228979677 | NA |
| triton | 1764.6185159683228 | 623.08743664783 | 20.373826419003308 | 82.46067054670186 |
| triton_persistent_tma | 1769.0335512161255 | 621.5323768280928 | 20.48663099599071 | 82.91718297956578 |
| cutlass_lvl_default | 790.5075550079346 | 1390.8932568835019 | 13.788519630907103 | -18.26191482535096 |
| cutlass_lvl_3332 | 803.7384748458862 | 1367.996757884245 | 226.81587297911756 | -16.89384434227684 |
+-----------------------+--------------------+--------------------+----------------------+--------------------+
```
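fast_accum refers to the use_fast_accum flag of torch._scaled_mm, which trades accumulation precision for speed on fp8 inputs. fp8 matmuls also need per-tensor scales; a common convention is to scale by the tensor's amax relative to the E4M3 max representable value (448). A pure-Python sketch of that scale computation (illustrative only, not the benchmark's code):

```python
E4M3_MAX = 448.0  # largest finite value of torch.float8_e4m3fn

def fp8_scale(amax: float) -> float:
    """Per-tensor scale so that x / scale fits the e4m3 range."""
    return max(amax, 1e-12) / E4M3_MAX  # guard against a zero scale

# a tensor whose largest magnitude is 896 gets scale 2.0:
print(fp8_scale(896.0))  # 2.0
```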
Differential Revision: D76310809
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155507
Approved by: https://github.com/ColinPeppler
2025-06-13 05:11:15 +00:00
2481c4b2ea
[cutlass backend] add teraflops and increase rep for benchmark script ( #154944 )
...
Differential Revision: [D75840023](https://our.internmc.facebook.com/intern/diff/D75840023/ )
I think I will continue to use do_bench for now.
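For reference, the teraflops column can be derived from the measured forward time: an MxK by KxN matmul performs 2*M*N*K floating-point operations. A minimal sketch of the conversion, which reproduces the aten row of the 8192x8192 _scaled_mm table earlier in this log:

```python
def teraflops(m: int, n: int, k: int, forward_time_us: float) -> float:
    """Convert a matmul forward time to TFLOPS (2*m*n*k flops per mm)."""
    flops = 2 * m * n * k
    return flops / (forward_time_us * 1e-6) / 1e12

# aten row of the 8192x8192 _scaled_mm table: 967.12 us -> ~1136.89 TFLOPS
print(teraflops(8192, 8192, 8192, 967.1226739883423))
```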
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154944
Approved by: https://github.com/mlazos
2025-06-05 17:20:29 +00:00
cb56df55dc
[Inductor]Cleanup autotune_fallback_to_aten post-deprecation ( #154331 )
...
Fixes #153298
This PR is the 3rd and final step of #147479
All references to autotune_fallback_to_aten have been removed, completing the feature's deprecation.
All calls to should_fallback_to_aten() were also removed, as they were deemed unnecessary.
[henrylhtsang](https://github.com/henrylhtsang )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154331
Approved by: https://github.com/henrylhtsang
2025-05-29 20:29:58 +00:00
00ebbbb701
[cutlass backend] add addmm and bmm for cutlass backend benchmark ( #152163 )
...
Copying what @kadeng did.
```
FINAL results...
Experiment group: bmm (BS: 8, 1024x1024, 1024x1024) torch.float16
+-----------------------+--------------------+----------------------+---------------------+
| name | forward_time (us) | compilation_time (s) | perf_over_aten (%) |
+-----------------------+--------------------+----------------------+---------------------+
| aten | 44.454172253608704 | 3.0991086587309837 | NA |
| triton | 44.06978189945221 | 0.07496077567338943 | -0.8646890374284049 |
| triton_persistent_tma | 43.598245829343796 | 0.06154991965740919 | -1.9254130284597197 |
| cutlass_lvl_default | 39.91834074258804 | 0.056073310784995556 | -10.20338762612423 |
+-----------------------+--------------------+----------------------+---------------------+
Experiment group: bmm (BS: 8, 1024x1024, 1024x1024) torch.bfloat16
+-----------------------+-------------------+----------------------+---------------------+
| name | forward_time (us) | compilation_time (s) | perf_over_aten (%) |
+-----------------------+-------------------+----------------------+---------------------+
| aten | 49.05610531568527 | 0.160279156640172 | NA |
| triton | 43.97720843553543 | 0.0660805031657219 | -10.353241145961718 |
| triton_persistent_tma | 43.94153505563736 | 0.061738294549286366 | -10.425960697724962 |
| cutlass_lvl_default | 40.2066633105278 | 0.034127906896173954 | -18.039430460713596 |
+-----------------------+-------------------+----------------------+---------------------+
Average edge over aten (max(-edge, 0), higher is better):
triton: 5.608965091695062 (from 2 valid values)
triton_persistent_tma: 6.175686863092341 (from 2 valid values)
cutlass_lvl_default: 14.121409043418913 (from 2 valid values)
```
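The "average edge over aten" line is the negated perf_over_aten, clamped at zero, averaged over the experiments that produced finite numbers. A hedged reconstruction (the actual script may differ), checked against triton's two bmm rows above:

```python
import math

def average_edge(perf_over_aten_pct: list[float]) -> tuple[float, int]:
    """Average of max(-perf, 0) over finite values; higher is better."""
    valid = [p for p in perf_over_aten_pct if math.isfinite(p)]
    edges = [max(-p, 0.0) for p in valid]
    return sum(edges) / len(edges), len(edges)

# triton's perf_over_aten from the fp16 and bf16 bmm tables above:
avg, n = average_edge([-0.8646890374284049, -10.353241145961718])
print(avg, n)  # ~5.6090 (from 2 valid values)
```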
Differential Revision: [D73625766](https://our.internmc.facebook.com/intern/diff/D73625766/ )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152163
Approved by: https://github.com/jingsh
2025-04-28 20:16:17 +00:00
5a51de5ab1
[cutlass backend] Add more logs for cutlass backend benchmark ( #150639 )
...
The goal is to have a way to compare whether a change makes things better or worse.
```
Average edge over aten (max(-edge, 0), higher is better):
triton: 8.596507086950552 (from 6 valid values)
triton_persistent_tma: 9.517193693923307 (from 6 valid values)
cutlass_lvl_default: 3.3234737908691785 (from 6 valid values)
cutlass_lvl_1111: 7.088173348313991 (from 6 valid values)
cutlass_lvl_2222: 7.291869722320318 (from 6 valid values)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150639
Approved by: https://github.com/ColinPeppler
2025-04-15 04:19:51 +00:00
f2d43d866c
[cutlass backend] switch layout for cutlass backend benchmark ( #149009 )
...
```
python benchmarks/inductor_backends/cutlass.py
```
logs:
```
Experiment group: mm (1024x1024, 1024x1024) torch.float16
+-----------------------+--------------------+----------------------+---------------------+
| name | forward_time (us) | compilation_time (s) | perf_over_aten (%) |
+-----------------------+--------------------+----------------------+---------------------+
| aten | 13.059554621577263 | 1.580178506206721 | NA |
| triton | 10.245470330119133 | 0.04118620231747627 | -21.54808776410064 |
| triton_persistent_tma | 10.388538241386414 | 0.04225084185600281 | -20.45258400908819 |
| cutlass_lvl_default | 12.882896699011326 | 231.14990583620965 | -1.3527101626732294 |
| cutlass_lvl_1111 | 11.362981051206589 | 126.41650272067636 | -12.99105229490415 |
| cutlass_lvl_2222 | 11.107578873634338 | 555.8380545829423 | -14.946725248331441 |
+-----------------------+--------------------+----------------------+---------------------+
Experiment group: mm (1024x1024, 1024x1024) torch.bfloat16
+-----------------------+--------------------+----------------------+---------------------+
| name | forward_time (us) | compilation_time (s) | perf_over_aten (%) |
+-----------------------+--------------------+----------------------+---------------------+
| aten | 14.037585817277431 | 0.21587548777461052 | NA |
| triton | 10.571777820587158 | 78.15654796129093 | -24.68948750735019 |
| triton_persistent_tma | 10.761583223938942 | 1.3195342738181353 | -23.337364672110443 |
| cutlass_lvl_default | 12.872588820755482 | 237.0100042372942 | -8.299126443010406 |
| cutlass_lvl_1111 | 11.08622644096613 | 137.55013868492097 | -21.02469338195443 |
| cutlass_lvl_2222 | 11.044904589653015 | 551.265836935956 | -21.319059178545007 |
+-----------------------+--------------------+----------------------+---------------------+
Experiment group: mm (2048x2048, 2048x2048) torch.float16
+-----------------------+--------------------+----------------------+---------------------+
| name | forward_time (us) | compilation_time (s) | perf_over_aten (%) |
+-----------------------+--------------------+----------------------+---------------------+
| aten | 30.483894050121307 | 0.27990864124149084 | NA |
| triton | 29.567627236247063 | 99.87172158574685 | -3.005740711366232 |
| triton_persistent_tma | 29.66325916349888 | 1.3695051120594144 | -2.692027748401006 |
| cutlass_lvl_default | 29.82821688055992 | 72.61214569816366 | -2.150897022812533 |
| cutlass_lvl_1111 | 29.476772993803024 | 67.7428645719774 | -3.303780857728953 |
| cutlass_lvl_2222 | 30.113255605101585 | 233.84051702311262 | -1.2158500630212203 |
+-----------------------+--------------------+----------------------+---------------------+
Experiment group: mm (2048x2048, 2048x2048) torch.bfloat16
+-----------------------+--------------------+----------------------+---------------------+
| name | forward_time (us) | compilation_time (s) | perf_over_aten (%) |
+-----------------------+--------------------+----------------------+---------------------+
| aten | 30.58255836367607 | 0.058386584743857384 | NA |
| triton | 29.799651354551315 | 100.18178300186992 | -2.559978795150901 |
| triton_persistent_tma | 29.362043365836143 | 1.534341821912676 | -3.990885861562106 |
| cutlass_lvl_default | 29.4346883893013 | 73.68858492700383 | -3.7533484305817093 |
| cutlass_lvl_1111 | 29.164200648665428 | 75.44329373072833 | -4.637799421958348 |
| cutlass_lvl_2222 | 29.13798950612545 | 227.33327346481383 | -4.7235056020244 |
+-----------------------+--------------------+----------------------+---------------------+
Experiment group: mm (8192x8192, 8192x8192) torch.float16
+-----------------------+--------------------+----------------------+--------------------+
| name | forward_time (us) | compilation_time (s) | perf_over_aten (%) |
+-----------------------+--------------------+----------------------+--------------------+
| aten | 1656.6237211227417 | 0.0549461180344224 | NA |
| triton | 1892.8285837173462 | 2.3174119112081826 | 14.258208401997386 |
| triton_persistent_tma | 1665.332317352295 | 2.7922237082384527 | 0.525683419747917 |
| cutlass_lvl_default | 1705.5492401123047 | 108.31571159465238 | 2.9533272019312116 |
| cutlass_lvl_1111 | 1714.9059772491455 | 17.64627545280382 | 3.518134829489478 |
| cutlass_lvl_2222 | 1680.4152727127075 | 306.9972395859659 | 1.4361469829637354 |
+-----------------------+--------------------+----------------------+--------------------+
Experiment group: mm (8192x8192, 8192x8192) torch.bfloat16
+-----------------------+--------------------+----------------------+--------------------+
| name | forward_time (us) | compilation_time (s) | perf_over_aten (%) |
+-----------------------+--------------------+----------------------+--------------------+
| aten | 1621.416687965393 | 0.06300561130046844 | NA |
| triton | 1782.3902368545532 | 2.318530729971826 | 9.927956834535548 |
| triton_persistent_tma | 1586.0934257507324 | 2.7931175641715527 | -2.178543151605614 |
| cutlass_lvl_default | 1657.4617624282837 | 43.31810224894434 | 2.2230605328307784 |
| cutlass_lvl_1111 | 1641.5367126464844 | 17.648567833006382 | 1.2408916739557292 |
| cutlass_lvl_2222 | 1645.8417177200317 | 249.33647010894492 | 1.5064005407078918 |
+-----------------------+--------------------+----------------------+--------------------+
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149009
Approved by: https://github.com/chenyang78 , https://github.com/jingsh
2025-03-13 01:57:47 +00:00
66300d3d55
[cutlass backend] try make cutlass backend benchmark more robust ( #149015 )
...
Differential Revision: [D71006269](https://our.internmc.facebook.com/intern/diff/D71006269/ )
I want to make sure that even if some experiments fail, the benchmark can still print the rest of the results.
```
Experiment group: mm (3x3, 3x3) torch.bfloat16
+-----------------------+-------------------+----------------------+---------------------+
| name | forward_time (us) | compilation_time (s) | perf_over_aten (%) |
+-----------------------+-------------------+----------------------+---------------------+
| aten | 6.175220478326082 | 0.5982149520423263 | NA |
| triton | 5.326753947883844 | 3.2067150759976357 | -13.739858089605114 |
| triton_persistent_tma | 5.340870004147291 | 3.279932268196717 | -13.51126615004617 |
| cutlass_lvl_default | inf | inf | inf |
| cutlass_lvl_1111 | inf | inf | inf |
| cutlass_lvl_2222 | inf | inf | inf |
| cutlass_lvl_3333 | inf | inf | inf |
+-----------------------+-------------------+----------------------+---------------------+
```
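One way to get this behavior is to catch per-experiment failures and record inf instead of aborting, so the summary table can still be printed. A hypothetical sketch (names are illustrative, not the actual benchmark code):

```python
import math

def run_experiments(experiments):
    """Run each (name, fn) pair; on failure record inf instead of aborting."""
    results = {}
    for name, fn in experiments:
        try:
            results[name] = fn()  # returns forward time in us
        except Exception:
            results[name] = math.inf  # keep going; shows up as inf in the table
    return results

def fails():
    raise RuntimeError("no viable cutlass config for 3x3")

results = run_experiments([("aten", lambda: 6.175), ("cutlass_lvl_default", fails)])
print(results)  # {'aten': 6.175, 'cutlass_lvl_default': inf}
```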
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149015
Approved by: https://github.com/chenyang78 , https://github.com/jingsh
2025-03-12 18:59:49 +00:00
17518007b2
[cutlass backend] Benchmark compared to aten and triton ( #148347 )
...
Benchmark for cutlass backend.
```
python benchmarks/inductor_backends/cutlass.py
```
Test Plan:
```
Experiment group: mm (1024x1024, 1024x1024) torch.float16
+-----------------------+--------------------+----------------------+---------------------+
| name | forward_time (us) | compilation_time (s) | perf_over_aten (%) |
+-----------------------+--------------------+----------------------+---------------------+
| aten | 12.759539298713207 | 2.7271360370796174 | NA |
| triton | 10.573655366897583 | 1.8661278090439737 | -17.131370346859384 |
| triton_persistent_tma | 10.884030722081661 | 0.5315794269554317 | -14.698873781600327 |
| cutlass_lvl_default | 13.09632882475853 | 0.5520401500398293 | 2.6395116481931873 |
| cutlass_lvl_1111 | 11.05172373354435 | 0.569593315012753 | -13.384617776451302 |
| cutlass_lvl_2222 | 11.371277272701263 | 133.58984916994814 | -10.880189272601317 |
+-----------------------+--------------------+----------------------+---------------------+
Experiment group: mm (1024x1024, 1024x1024) torch.bfloat16
+-----------------------+--------------------+----------------------+---------------------+
| name | forward_time (us) | compilation_time (s) | perf_over_aten (%) |
+-----------------------+--------------------+----------------------+---------------------+
| aten | 14.472318813204765 | 1.5445372510002926 | NA |
| triton | 10.568295605480671 | 16.583424195996486 | -26.975796056689987 |
| triton_persistent_tma | 10.45411266386509 | 5.830657540936954 | -27.764770809729562 |
| cutlass_lvl_default | 12.742593884468079 | 28.994930602959357 | -11.951954286402668 |
| cutlass_lvl_1111 | 11.522261425852776 | 79.85037935699802 | -20.38413764531163 |
| cutlass_lvl_2222 | 10.993581265211105 | 132.86601971101481 | -24.037181552548486 |
+-----------------------+--------------------+----------------------+---------------------+
Experiment group: mm (2048x2048, 2048x2048) torch.float16
+-----------------------+--------------------+----------------------+---------------------+
| name | forward_time (us) | compilation_time (s) | perf_over_aten (%) |
+-----------------------+--------------------+----------------------+---------------------+
| aten | 30.700622126460075 | 2.225986961973831 | NA |
| triton | 29.17378954589367 | 38.571991189033724 | -4.97329524553989 |
| triton_persistent_tma | 29.642896726727486 | 7.2848734309664 | -3.4452897904663744 |
| cutlass_lvl_default | 29.514770954847336 | 29.819900761009194 | -3.8626291243482167 |
| cutlass_lvl_1111 | 29.411429539322853 | 23.82907024596352 | -4.19923929172139 |
| cutlass_lvl_2222 | 29.57325428724289 | 134.31008586101234 | -3.672133530628152 |
+-----------------------+--------------------+----------------------+---------------------+
Experiment group: mm (2048x2048, 2048x2048) torch.bfloat16
+-----------------------+--------------------+----------------------+--------------------+
| name | forward_time (us) | compilation_time (s) | perf_over_aten (%) |
+-----------------------+--------------------+----------------------+--------------------+
| aten | 30.858177691698074 | 1.181898436974734 | NA |
| triton | 28.630023822188377 | 39.24473957403097 | -7.220626868414034 |
| triton_persistent_tma | 28.641965240240097 | 5.275042273919098 | -7.181929126210897 |
| cutlass_lvl_default | 29.16003204882145 | 29.934022572939284 | -5.503065216107967 |
| cutlass_lvl_1111 | 28.79570797085762 | 23.948012012057006 | -6.683705504085324 |
| cutlass_lvl_2222 | 29.02756631374359 | 136.25560767308343 | -5.932337924306467 |
+-----------------------+--------------------+----------------------+--------------------+
Experiment group: mm (8192x8192, 8192x8192) torch.float16
+-----------------------+--------------------+----------------------+--------------------+
| name | forward_time (us) | compilation_time (s) | perf_over_aten (%) |
+-----------------------+--------------------+----------------------+--------------------+
| aten | 1456.143856048584 | 1.020197194069624 | NA |
| triton | 1708.2737684249878 | 5.766509635956027 | 17.31490410985819 |
| triton_persistent_tma | 1476.485013961792 | 7.455113030038774 | 1.3969195302177155 |
| cutlass_lvl_default | 1583.3594799041748 | 50.408804678940214 | 8.736473620182366 |
| cutlass_lvl_1111 | 1636.4418268203735 | 82.82403108896688 | 12.381879030898025 |
| cutlass_lvl_2222 | 1507.5665712356567 | 260.03901409788523 | 3.531430975962381 |
+-----------------------+--------------------+----------------------+--------------------+
Experiment group: mm (8192x8192, 8192x8192) torch.bfloat16
+-----------------------+--------------------+----------------------+--------------------+
| name | forward_time (us) | compilation_time (s) | perf_over_aten (%) |
+-----------------------+--------------------+----------------------+--------------------+
| aten | 1382.230520248413 | 1.2586536260787398 | NA |
| triton | 1646.9683647155762 | 5.442052865982987 | 19.15294450447995 |
| triton_persistent_tma | 1423.9195585250854 | 6.515797697938979 | 3.016069871556595 |
| cutlass_lvl_default | 1500.9030103683472 | 51.36402789200656 | 8.58557877152115 |
| cutlass_lvl_1111 | 1446.9740390777588 | 30.65435610699933 | 4.683988515729638 |
| cutlass_lvl_2222 | 1419.661521911621 | 205.1948991640238 | 2.7080144096717635 |
+-----------------------+--------------------+----------------------+--------------------+
```
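The perf_over_aten column in these tables is the relative change in forward time versus the aten baseline, so negative means faster than aten. A minimal sketch, checked against the triton row of the fp16 1024x1024 table above:

```python
def perf_over_aten(forward_time_us: float, aten_time_us: float) -> float:
    """Percent change vs the aten baseline; negative = faster than aten."""
    return (forward_time_us - aten_time_us) / aten_time_us * 100.0

# triton row of the fp16 1024x1024 table: 10.57 us vs 12.76 us for aten
print(perf_over_aten(10.573655366897583, 12.759539298713207))  # ~-17.13
```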
Differential Revision: D70147589
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148347
Approved by: https://github.com/drisspg , https://github.com/chenyang78
2025-03-04 01:45:36 +00:00