pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 21:14:14 +08:00

David Berard 62bac07981 [inductor][triton] support profile_scratch launcher arg (#159772)

This adds support for Triton after https://github.com/triton-lang/triton/pull/7258 landed. That PR adds a new argument to all Triton kernels: a profile_scratch argument, similar to global_scratch. This PR updates the static CUDA launcher and the AOTI kernel callers to pass this argument when calling a Triton kernel.
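
As a rough illustration of the change described above, the sketch below shows how a launcher might append trailing scratch arguments to a kernel's argument list. Everything in it is hypothetical: `build_launch_args`, its flags, and the ordering (profile_scratch placed after global_scratch) are assumptions made for illustration, not the actual static CUDA launcher or AOTI code.

```python
from typing import Any, Optional, Sequence


def build_launch_args(
    user_args: Sequence[Any],
    has_global_scratch: bool,
    has_profile_scratch: bool,
    global_scratch: Optional[int] = None,
    profile_scratch: Optional[int] = None,
) -> list[Any]:
    """Hypothetical helper: return user_args plus any trailing scratch args.

    The flags stand in for "does this Triton version's kernel signature
    include the argument?". A value of None stands in for a null pointer.
    """
    args = list(user_args)
    if has_global_scratch:
        # Older kernels may only expect global_scratch.
        args.append(global_scratch)
    if has_profile_scratch:
        # Assumed ordering: profile_scratch follows global_scratch.
        args.append(profile_scratch)
    return args
```

A caller that targets several Triton versions would set the two flags from the compiled kernel's signature, so that kernels built without profile_scratch keep receiving the shorter argument list.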

Tests: https://github.com/pytorch/pytorch/pull/159158. I also verified these tests locally with Triton 3.2, 3.3, and 3.4.

Fixes:
* static_cuda_launcher (test/repro: `python tools/dynamo/verify_dynamo.py`)
* AOTI calling logic (test/repro: `TORCHINDUCTOR_CPP_WRAPPER=1 python test/inductor/test_torchinductor_opinfo.py -k test_comprehensive_linalg_vander_cuda_float32`)

Differential Revision: [D79825121](https://our.internmc.facebook.com/intern/diff/D79825121)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159772
Approved by: https://github.com/NikhilAPatel, https://github.com/eellison

2025-08-08 14:27:38 +00:00

analysis

[BE][2/16] fix typos in torch/ (torch/_*/) (#156312)

2025-07-12 05:47:06 +00:00

autoheuristic

[BE][Ez]: Optimize unnecessary lambda with operator (#154722)

2025-05-30 23:47:10 +00:00

codegen

[inductor][triton] support profile_scratch launcher arg (#159772)

2025-08-08 14:27:38 +00:00

compile_worker

Revert "[pytorch] Moving torch.compile worker process logs to a dedicated rank based log directory (#159874)"

2025-08-06 23:21:29 +00:00

fx_passes

[inductor] move all cpu scalars using pinned memory for graph partition (#155360) (#158983)

2025-08-07 17:07:26 +00:00

kernel

integrate kernacle into inductor (#160121)

2025-08-08 02:14:44 +00:00

package

Grab bag of (mostly) typing improvements (#158075)

2025-07-21 19:17:01 +00:00

runtime

[inductor][triton] support profile_scratch launcher arg (#159772)

2025-08-08 14:27:38 +00:00

__autotune_main__.py

Improve subproc autotuning implementation (#149700)

2025-03-28 01:06:39 +00:00

__init__.py

Grab bag of (mostly) typing improvements (#158075)

2025-07-21 19:17:01 +00:00

analyze_preserves_zero_mask.py

Revert two recent prologue prs (#151013)

2025-04-10 23:48:41 +00:00

aoti_eager.py

PEP585 update - torch/_inductor/[_-i]* (#145137)

2025-01-19 01:22:47 +00:00

async_compile.py

Wire in pt2_triton_builds (#159897)

2025-08-06 07:39:51 +00:00

autotune_process.py

Remove unnecessary "# noqa: set_linter" comments (#159467)

2025-08-06 21:31:52 +00:00

await_utils.py

integrate kernacle into inductor (#160121)

2025-08-08 02:14:44 +00:00

bounds.py

[inductor] Refactor op handlers part 5 (#146257)

2025-02-08 18:00:30 +00:00

choices.py

[inductor] consolidate common GEMM triton param retrieval (#159383)

2025-08-05 11:42:25 +00:00

codecache.py

[BE]: ruff PLC0207 - use maxsplit kwarg (#160107)

2025-08-08 03:14:59 +00:00

comm_analysis.py

[cpp wrapper] add AOTI shim for collective ops (#154492)

2025-06-25 01:20:05 +00:00

comm_lowering.py

[AOTI] Fix memory leak from all_reduce (#159818)

2025-08-06 18:11:14 +00:00

comms.py

[inductor] Fix collectives_reordering overwrite real_dep with fake_dep with the same name (#158960)

2025-07-24 11:08:58 +00:00

compile_fx_async.py

[pc] introduce ProgressiveCompilationState and clear callback (#157619)

2025-07-05 07:55:11 +00:00

compile_fx_ext.py

Migrate from lru_cache to cache (#155613)

2025-06-11 19:44:18 +00:00

compile_fx_subproc.py

Rename inductor cache (#156128)

2025-06-17 03:57:18 +00:00

compile_fx.py

Extract some HOP utils to be importable (#159705)

2025-08-05 23:59:47 +00:00

compiler_bisector.py

Migrate from lru_cache to cache (#155613)

2025-06-11 19:44:18 +00:00

config.py

integrate kernacle into inductor (#160121)

2025-08-08 02:14:44 +00:00

constant_folding.py

Add dont constant fold flag (#154945)

2025-06-10 14:52:26 +00:00

cpp_builder.py

[inductor] unification for inductor debug. (#159998)

2025-08-07 16:38:00 +00:00

cpu_vec_isa.py

Revert "Set PYTHONHOME for inductor subprocesses using torch (#159382)"

2025-08-06 05:30:20 +00:00

cudagraph_trees.py

Fix types in graphs.py (#158192)

2025-07-15 19:49:38 +00:00

cudagraph_utils.py

[BE][3/16] fix typos in torch/ (torch/_inductor/) (#156313)

2025-06-23 02:57:12 +00:00

custom_graph_pass.py

[BE][3/16] fix typos in torch/ (torch/_inductor/) (#156313)

2025-06-23 02:57:12 +00:00

debug.py

[inductor] Add TLParse artifact for logging runtime of collective and compute ops (#159730)

2025-08-05 22:06:32 +00:00

decomposition.py

Revert "[inductor] add lowering for repeat_interleave.Tensor with output size specified (#147160) (#158462)" (#159798)

2025-08-04 23:39:20 +00:00

dependencies.py

DDE-Free select with unbacked index. (#157605)

2025-07-24 20:08:05 +00:00

dtype_propagation.py

Migrate from lru_cache to cache (#155613)

2025-06-11 19:44:18 +00:00

exc.py

PEP585 update - torch/_inductor/[_-i]* (#145137)

2025-01-19 01:22:47 +00:00

extern_node_serializer.py

[BE][AOTI] Remove duplicate schema for ExternKernelNode (#155867)

2025-06-14 02:03:27 +00:00

freezing_utils.py

PEP585: More UP006 fixes (#146392)

2025-02-20 06:18:13 +00:00

freezing.py

mypy 1.16.0 (#155821)

2025-06-14 18:18:43 +00:00

fuzzer.py

Automatically load and save dynamo entries via caching_precompile (#155913)

2025-07-07 23:57:17 +00:00

fx_utils.py

Inductor logging + analysis of torch.profile (#149697)

2025-07-07 22:13:34 +00:00

graph.py

[inductor] respect layout tags for ops with registered lowerings (#159134)

2025-07-31 21:29:40 +00:00

hooks.py

PEP585 update - torch/_inductor/[_-i]* (#145137)

2025-01-19 01:22:47 +00:00

index_propagation.py

[BE][3/16] fix typos in torch/ (torch/_inductor/) (#156313)

2025-06-23 02:57:12 +00:00

inductor_prims.py

[inductor] lowering for fractional_max_pool3d (#148630)

2025-05-22 16:06:29 +00:00

ir.py

[inductor] allocate non-blocking copy destinations in pinned memory (#155121) (#158758)

2025-08-07 17:07:26 +00:00

jagged_lowerings.py

Fix XPU CI UT test_circular_dependencies (#158189)

2025-07-13 09:30:57 +00:00

kernel_inputs.py

[inductor] consolidate common GEMM triton param retrieval (#159383)

2025-08-05 11:42:25 +00:00

loop_body.py

Migrate from lru_cache to cache (#155613)

2025-06-11 19:44:18 +00:00

lowering.py

Revert "[inductor] add lowering for repeat_interleave.Tensor with output size specified (#147160) (#158462)" (#159798)

2025-08-04 23:39:20 +00:00

memory.py

Fix inductor memory estimation when a single buf has multiple mutations. Add runtime verification of mem tracking (#159569)

2025-08-05 19:58:11 +00:00

metrics.py

Replace runtime type parameterization (#155221)

2025-06-05 21:43:54 +00:00

mkldnn_ir.py

[inductor] Add typing to _inductor/ir.py (#149958)

2025-06-30 15:56:35 +00:00

mkldnn_lowerings.py

[Inductor][Float8] Add float8_e4m3fn into assertion dtype list. (#157684)

2025-07-15 06:02:01 +00:00

mock_cache.py

PEP585 update - torch/_inductor (#145198)

2025-01-21 21:04:33 +00:00

ops_handler.py

[inductor] Add typing to _inductor/ir.py (#149958)

2025-06-30 15:56:35 +00:00

optimize_indexing.py

PEP585 update - torch/_inductor (#145198)

2025-01-21 21:04:33 +00:00

output_code.py

Add user annotation for FX graph cache key (#159318)

2025-07-30 05:52:50 +00:00

pattern_matcher.py

[Dynamo][Better Engineering] Add typing annotations to guard and source (#158397) (#159491)

2025-07-30 22:57:50 +00:00

quantized_lowerings.py

[BE][1/X] Phase out usage of use_max_autotune() (#155847)

2025-06-14 03:16:20 +00:00

remote_cache.py

pt2_remote_cache: Log sample for failures, and log the explicit reason we're faling. (#156874)

2025-07-18 20:28:27 +00:00

remote_gemm_autotune_cache.py

integrate kernacle into inductor (#160121)

2025-08-08 02:14:44 +00:00

scheduler.py

[BE]: ruff PLC0207 - use maxsplit kwarg (#160107)

2025-08-08 03:14:59 +00:00

script.ld

Place .lrodata later in the binary (#117575)

2024-01-18 17:58:18 +00:00

select_algorithm.py

integrate kernacle into inductor (#160121)

2025-08-08 02:14:44 +00:00

sizevars.py

multi-kernel matmuls based on varying hint sizes (#156628)

2025-07-12 15:08:21 +00:00

standalone_compile.py

[Dynamo][Better Engineering] Add typing annotations to guard and source (#158397) (#159491)

2025-07-30 22:57:50 +00:00

subgraph_lowering.py

Fix XPU CI UT test_circular_dependencies (#158189)

2025-07-13 09:30:57 +00:00

template_heuristics.py

[inductor] consolidate common GEMM triton param retrieval (#159383)

2025-08-05 11:42:25 +00:00

template_registry.py

[inductor] consolidate common GEMM triton param retrieval (#159383)

2025-08-05 11:42:25 +00:00

test_case.py

Rename inductor cache (#156128)

2025-06-17 03:57:18 +00:00

test_operators.py

[BE] remove torch deploy - conditionals (#158288)

2025-07-29 17:40:49 +00:00

tiling_utils.py

[BE][3/16] fix typos in torch/ (torch/_inductor/) (#156313)

2025-06-23 02:57:12 +00:00

triton_bundler.py

[BE][3/16] fix typos in torch/ (torch/_inductor/) (#156313)

2025-06-23 02:57:12 +00:00

utils.py

Remove unnecessary "# noqa: set_linter" comments (#159467)

2025-08-06 21:31:52 +00:00

virtualized.py

[inductor] Add a helper for convert index_dtype to torch dtype (#149531)

2025-03-20 21:33:29 +00:00

wrapper_benchmark.py

[inductor] Make times and repeat parameters command line args (#158590)

2025-07-18 20:07:55 +00:00