pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-21 05:34:18 +08:00

Author	SHA1	Message	Date
Shangdi Yu	520fca82c8	Refactor Provenance Tracking (#163378 ) Summary: - Move the `provenance_level` flag check to inside the `set_kernel_post_grad_provenance_tracing` call to simply the code - Move the `set_kernel_post_grad_provenance_tracing` call and `write_provenance_debug_handle` call to `codegen_comment`. - If some `call_kernel` call sites don't have a proceeding `codegen_comment` call, add one. Now all `call_kernel` call sites are accompanied with a `codegen_comment` call. - Add a `codegen_comment` method to BaseScheduling and remove the noop `codegen_comment` method in Scheduling - Remove `debug_handle` from `call_kernel`. Test Plan: CI ``` buck run @//mode/opt-split-dwarf fbcode//caffe2/test/inductor:provenance_tracing ``` Differential Revision: D82839271 Pull Request resolved: https://github.com/pytorch/pytorch/pull/163378 Approved by: https://github.com/angelayi	2025-09-25 22:55:59 +00:00
Avik Chaudhuri	711c8c821e	shape guards (#161178 ) Summary: This PR introduces shape guards to export. Previously only value ranges, equalities, and specializations would be tracked for symbolic expressions, and we had a forward hook to check them. Instead now we create a function to check shape guards and call it in the exported program. Test Plan: updated several tests Rollback Plan: Differential Revision: D80713603 Pull Request resolved: https://github.com/pytorch/pytorch/pull/161178 Approved by: https://github.com/tugsbayasgalan	2025-09-08 22:44:09 +00:00
Shangdi Yu	dec72ea4b0	[reland] Add inductor provenance mapping for cpp extern kernel (#161656 ) (#162069 ) Summary: Add inductor provenance mapping for cpp extern kernel Test Plan: ``` buck run fbcode//caffe2/test/inductor:provenance_tracing -- -r test_cpu_extern_kernel ``` Differential Revision: D81598857 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162069 Approved by: https://github.com/angelayi	2025-09-04 04:18:43 +00:00
PyTorch MergeBot	15c77a8cfd	Revert "Add inductor provenance mapping for cpp extern kernel (#161656 )" This reverts commit 5e5870e858f60ff4bf87d03f3592097e934a9580. Reverted https://github.com/pytorch/pytorch/pull/161656 on behalf of https://github.com/jeffdaily due to causing failures on ROCm MI300, will add label to PR ([comment](https://github.com/pytorch/pytorch/pull/161656#issuecomment-3246965676))	2025-09-02 22:19:19 +00:00
Shangdi Yu	5e5870e858	Add inductor provenance mapping for cpp extern kernel (#161656 ) Summary: Add inductor provenance mapping for cpp extern kernel Test Plan: ``` buck run fbcode//caffe2/test/inductor:provenance_tracing -- -r test_cpu_extern_kernel ``` Differential Revision: D81161751 Pull Request resolved: https://github.com/pytorch/pytorch/pull/161656 Approved by: https://github.com/angelayi	2025-09-02 17:54:04 +00:00
Sandeep Narendranath Karjala	ec585ceab4	[inductor] structured-log graph execution order + test (#160448 ) Summary: - Emit a structured trace per compiled graph execution to reconstruct execution order in TLParse. - Adds debug.log_graph_execution(name) called from `CompiledFxGraph.__call__`, producing an artifact named inductor_graph_execution with payload {"graph": "graph_<id>"}. Testing: - Add inline test to verify structure and output Pull Request resolved: https://github.com/pytorch/pytorch/pull/160448 Approved by: https://github.com/xmfan	2025-08-27 18:12:46 +00:00
Shangdi Yu	9a12bab0d3	Add debug handle to inductor provenance tracking (#161110 ) Summary: Use debug handle on kernel names to distinguish different calls to the same kernel. Previous kernel name: kernel_name New kernel name: kernel_name:debug_handle We add the debug handle to the tlparse artifacts: `inductor_provenance_tracking_node_mappings` and `inductor_provenance_tracking_kernel_stack_traces`. We also add debug handles in the comments of the generated code so we can map to them in the provenance tracking highlighter tool: https://github.com/pytorch/tlparse/pull/134 Example output code is below. If a kernel doesn't have a debug handle, the `[Provenance debug handles]` comment line will not be written. ``` # Topologically Sorted Source Nodes: [y, z], Original ATen: [aten.addmm, aten.gelu] # [Provenance debug handles] triton_poi_fused_addmm_gelu_2:3 stream0 = get_raw_stream(0) triton_poi_fused_addmm_gelu_2.run(buf4, primals_5, 300, stream=stream0) ``` The debug handles will also be used by downstream profilers such as zoomer. Test Plan: ``` buck run mode/opt fbcode//caffe2/test/inductor:provenance_tracing ``` Rollback Plan: Differential Revision: D78994959 Pull Request resolved: https://github.com/pytorch/pytorch/pull/161110 Approved by: https://github.com/angelayi	2025-08-27 04:56:11 +00:00
PyTorch MergeBot	4a1aca11c2	Revert "[inductor] structured-log graph execution order + test (#160448 )" This reverts commit 995397d47a0e27394ee1010f158e181eb304100a. Reverted https://github.com/pytorch/pytorch/pull/160448 on behalf of https://github.com/atalman due to internal failure please see associated diff ([comment](https://github.com/pytorch/pytorch/pull/160448#issuecomment-3223939035))	2025-08-26 12:20:37 +00:00
Sandeep Narendranath Karjala	995397d47a	[inductor] structured-log graph execution order + test (#160448 ) Summary: - Emit a structured trace per compiled graph execution to reconstruct execution order in TLParse. - Adds debug.log_graph_execution(name) called from `CompiledFxGraph.__call__`, producing an artifact named inductor_graph_execution with payload {"graph": "graph_<id>"}. Testing: - Add inline test to verify structure and output Pull Request resolved: https://github.com/pytorch/pytorch/pull/160448 Approved by: https://github.com/xmfan	2025-08-25 20:12:18 +00:00
Sandeep Narendranath Karjala	2b62ef7420	Add kernel information JSON generation for AOTI packages (#160540 ) Summary: Build on D80031559. Generate kernel_information.json in AOTI compiled artifacts by combining stack traces and node mappings from provenance tracking. This implementation delivers exactly what Zoomer team requested: 1. Core Function: `create_kernel_information_json()` in debug.py combines 3 data sources: - `_inductor_kernel_stack_trace` → `stack_traces` field - `_inductor_triton_kernel_to_post_grad_node_info` → `post_grad_nodes` field - `_inductor_post_to_pre_grad_nodes["postToPre"]` → `pre_grad_nodes` field 2. AOTI Integration: codecache.py writes `kernel_information.json` to pt2 packages when both AOTI packaging and provenance tracking are enabled. 3. Test Coverage: TestKernelInformationAOTI class validates: - JSON file creation in AOTI packages using zipfile - Exact format compliance - Proper disabling without provenance tracking Output Format (exact specification): ```json { "triton_kernel_name_1": { "stack_traces": [str, str, ...], "post_grad_nodes": [str, str, ...], "pre_grad_nodes": [str, str, ...] } } ``` Test Plan: ``` buck test fbcode//caffe2/test/inductor:provenance_tracing -- TestKernelInformationAOTI ``` Manual validation: ```python import torch model = torch.nn.Linear(10, 1) with torch._inductor.config.patch("aot_inductor.package", True): with torch._inductor.config.patch("trace.basic_provenance_tracking", True): # AOTI compilation should generate kernel_information.json compiled = torch.export.export(model, (torch.randn(1, 10),)) ``` --- Rollback Plan: Differential Revision: D80139160 Pull Request resolved: https://github.com/pytorch/pytorch/pull/160540 Approved by: https://github.com/yushangdi	2025-08-20 02:33:45 +00:00
Shangdi Yu	41b3e80a55	Fix duplicated kernel name in kernel stack trace tracking (#160905 ) Summary: as title. When we have two kernels with the same name, the stack traces should be appended, not overwritten. Test Plan: ``` buck run mode/opt fbcode//caffe2/test/inductor:provenance_tracing ``` Rollback Plan: Differential Revision: D80472731 Pull Request resolved: https://github.com/pytorch/pytorch/pull/160905 Approved by: https://github.com/angelayi	2025-08-19 01:14:34 +00:00
Shangdi Yu	4e90441133	Add signpost to provenance tracking error (#160755 ) Summary: As title, add signpost to better track error when computing provenance tracking related debugging information Test Plan: CI Rollback Plan: Differential Revision: D80292285 Pull Request resolved: https://github.com/pytorch/pytorch/pull/160755 Approved by: https://github.com/angelayi	2025-08-18 19:17:47 +00:00
Sandeep Narendranath Karjala	c699668009	[inductor] TLParse tensor metadata logging + test (#160132 ) Summary: - Add TLParse artifact logging per op with output tensor shape, stride, and dtype for cross-rank aggregation. Testing: - Add test to verify structure and contents of tlparse artifiact Pull Request resolved: https://github.com/pytorch/pytorch/pull/160132 Approved by: https://github.com/xmfan	2025-08-17 04:27:49 +00:00
PyTorch MergeBot	26297c27e2	Revert "[inductor] TLParse tensor metadata logging + test (#160132 )" This reverts commit 2603e40be5fa4a66301e6654e34a82a67f2e4913. Reverted https://github.com/pytorch/pytorch/pull/160132 on behalf of https://github.com/clee2000 due to broke lint [GH job link](https://github.com/pytorch/pytorch/actions/runs/17010600949/job/48226137423) [HUD commit link](`2603e40be5`). landrace with another PR that changed some had_cuda related things ([comment](https://github.com/pytorch/pytorch/pull/160132#issuecomment-3193969792))	2025-08-16 23:47:03 +00:00
Sandeep Narendranath Karjala	2603e40be5	[inductor] TLParse tensor metadata logging + test (#160132 ) Summary: - Add TLParse artifact logging per op with output tensor shape, stride, and dtype for cross-rank aggregation. Testing: - Add test to verify structure and contents of tlparse artifiact Pull Request resolved: https://github.com/pytorch/pytorch/pull/160132 Approved by: https://github.com/xmfan ghstack dependencies: #160260	2025-08-16 16:37:18 +00:00
Shangdi Yu	b74c7cd335	Add kernel stack traces tlparse dump (#160608 ) (#160779 ) Summary: as title This is requested by the zoomer team so they can add stack trace information to profiler result. Test Plan: ``` buck run mode/dev-nosan fbcode//caffe2/test/inductor:provenance_tracing -- -r stack_traces ``` Rollback Plan: Differential Revision: D80050233 Pull Request resolved: https://github.com/pytorch/pytorch/pull/160779 Approved by: https://github.com/angelayi	2025-08-16 03:12:38 +00:00
Shangdi Yu	aa99e0958f	Separate provenance tracking to different levels (#160383 ) Summary: as title. We've got request from various parties who are interested in turning on the provenance tracking by default. In this PR, we prepare to turn on part of the provenance tracking that doesn't have too much overhead by default. - Change `provenance_tracking` config to `provenance_tracking_level` - turn on the following provenance tracking by default when `basic_provenance_tracking`=True - `set_kernel_post_grad_provenance_tracing` for kernels, this add mapping between triton kernels and post_grad nodes - `dump_inductor_provenance_info` if we're dumping tlparse log - `get_graph_provenance_json` and dump `reate_mapping_pre_post_grad_nodes`. This creates mapping between pre_grad and post_grad nodes. Since we're not turning on the provenance tracking in GraphTransformObserver by default, the mapping here maybe incomplete/limited. - add stack trace from post grad nodes to inductor IR nodes - add exception swallowing for all functions above Test Plan: CI Rollback Plan: Differential Revision: D80031559 Pull Request resolved: https://github.com/pytorch/pytorch/pull/160383 Approved by: https://github.com/angelayi	2025-08-15 04:59:35 +00:00
Shangdi Yu	d1950d4bb5	Change IR node's stack trace to be computed lazily (#160487 ) Summary: When an IR node is an inherited class, post_init is called once for each super().__init__() call. To avoid duplicated calls, we make stack trace computation happen lazily. Test Plan: CI Rollback Plan: Differential Revision: D80137870 Pull Request resolved: https://github.com/pytorch/pytorch/pull/160487 Approved by: https://github.com/angelayi	2025-08-13 21:41:25 +00:00
Sandeep Narendranath Karjala	fc80f6859e	Fix collective schedule logging and runtime tests (#160260 ) Summary: - Fix collective schedule logging so that only logs when collectives present - Fix runtime estimate test to check if each op has a number value Pull Request resolved: https://github.com/pytorch/pytorch/pull/160260 Approved by: https://github.com/Skylion007	2025-08-11 20:58:52 +00:00
Sandeep Narendranath Karjala	8034b2a732	[inductor] Add TLParse artifact for logging runtime of collective and compute ops (#159730 ) Summary: - debug.py: Added log_runtime_estimates() function to dump runtime estimation data as structured tlparse artifacts in JSON format - test_structured_trace.py: Added comprehensive test coverage with testing compute and collective ops Pull Request resolved: https://github.com/pytorch/pytorch/pull/159730 Approved by: https://github.com/yushangdi ghstack dependencies: #159190	2025-08-05 22:06:32 +00:00
Sandeep Narendranath Karjala	85e74d5ace	[inductor] Add logging for distributed collective ops for multi‑rank diagnostics (#159190 ) This change introduces structured logging of the collective communication schedule, enabling downstream tools (e.g. TLParse) to ingest and analyze per‑rank collective‐order information for multi‑rank jobs. - Iterates over scheduler.nodes, filters for _CollectiveKernel nodes - Extracts each op’s python_kernel_name - Emits a structured JSON payload under the inductor_collective_schedule artifact name - Dumps the full schedule list to collective_schedule.json via the PyTorch trace‑structured artifact - Added comprehensive unit tests for collective schedule tracing: Created test_collective_schedule_empty() and test_collective_schedule_real() tests to verify structured trace logging works correctly for both empty collective schedules and real collective operations (like all_reduce and wait_tensor from _c10d_functional ops). Pull Request resolved: https://github.com/pytorch/pytorch/pull/159190 Approved by: https://github.com/yushangdi, https://github.com/xmfan	2025-08-01 21:51:42 +00:00
PyTorch MergeBot	490cb3f1a4	Revert "[inductor] Add logging for distributed collective ops for multi‑rank diagnostics (#159190 )" This reverts commit bb62e1f769ef51e2ec149d7256c135d09425aaa0. Reverted https://github.com/pytorch/pytorch/pull/159190 on behalf of https://github.com/clee2000 due to broke [GH job link](https://github.com/pytorch/pytorch/actions/runs/16658705097/job/47150840171) [HUD commit link](`bb62e1f769`) on mac ([comment](https://github.com/pytorch/pytorch/pull/159190#issuecomment-3141513921))	2025-07-31 22:22:13 +00:00
Sandeep Narendranath Karjala	bb62e1f769	[inductor] Add logging for distributed collective ops for multi‑rank diagnostics (#159190 ) This change introduces structured logging of the collective communication schedule, enabling downstream tools (e.g. TLParse) to ingest and analyze per‑rank collective‐order information for multi‑rank jobs. - Iterates over scheduler.nodes, filters for _CollectiveKernel nodes - Extracts each op’s python_kernel_name - Emits a structured JSON payload under the inductor_collective_schedule artifact name - Dumps the full schedule list to collective_schedule.json via the PyTorch trace‑structured artifact - Added comprehensive unit tests for collective schedule tracing: Created test_collective_schedule_empty() and test_collective_schedule_real() tests to verify structured trace logging works correctly for both empty collective schedules and real collective operations (like all_reduce and wait_tensor from _c10d_functional ops). Pull Request resolved: https://github.com/pytorch/pytorch/pull/159190 Approved by: https://github.com/yushangdi, https://github.com/xmfan	2025-07-31 19:58:07 +00:00
Shangdi Yu	fd2c64e286	Fix duplicated sources in inductor provenance tracking (#159484 ) Summary: The `replace_hook` is called once for each user of the replaced node. This fix avoids adding duplicated node sources. This also means that if there are two nested pass like: ``` with GraphTransformObserver(gm, "outer"): with GraphTransformObserver(gm, "inner"): ..... ``` We'll only see the outer pass's pass name recorded for the replaced node in the "from_node" node meta. I think this is fine. In practice, the outer pass usually contains a more meaningful name, e.g. `decompose_auto_functionalized`, and the inner pass name is just a default pass name like `pattern_matcher`. Test Plan: ``` buck2 run @mode/dev-nosan fbcode//caffe2/test:fx -- -r test_graph_transform_observer_replace ``` Rollback Plan: Differential Revision: D79203058 Pull Request resolved: https://github.com/pytorch/pytorch/pull/159484 Approved by: https://github.com/angelayi	2025-07-30 23:03:11 +00:00
Shangdi Yu	1e86fa2e5b	Add stack trace to Inductor IR nodes if `inductor.config.trace.provenance_tracing=True` (#158576 ) Summary: - Split `create_mapping` to `create_mapping_pre_post_grad_nodes` and ` create_node_mapping_kernel_to_post_grad` - Store a mapping from pre_grad graph node names to stack traces in `_inductor_pre_grad_node_stack_trace` - Add `stack_traces` member to ir.Node and add it to the string representation of ir.Node - When we create an IR node, if `inductor.config.trace.provenance_tracing=True`, we populate `stack_traces` from `origins`. The nodes in `origins` are post_grad graph nodes. If a node has `node.stack_trace`, we store the stack_trace directly. This is particularly important for backward graph nodes because they don't have a mapping to pre-grad graph nodes. If a node doesn't have `.stack_trace ` (such as `linear`-> `addmm` nodes), we use the stack trace of the pre_grad graph nodes that it maps to. - A post grad graph node might not have stack trace if it correspond to multiple pre grad graph nodes, e.g. [GroupLinearFusion](`a00442421a/torch/_inductor/fx_passes/group_batch_fusion.py (L299)`) Example: ``` scheduling ExternKernelOut( python_kernel_name='extern_kernels.mm', name=buf0, layout=FixedLayout('cuda:0', torch.float32, size=[8, 16], stride=[16, 1]), inputs=[InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float32, size=[8, 10], stride=[10, 1])), ReinterpretView( StorageBox( ConstantBuffer(name='fc1_weight', layout=FixedLayout('cuda:0', torch.float32, size=[16, 10], stride=[10, 1])) ), FixedLayout('cuda:0', torch.float32, size=[10, 16], stride=[1, 10]), origins=OrderedSet([mm_default_1]), stack_traces = {, File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/7b4b7a52e15abb17/scripts/shangdiy/__aot__/aot#link-tree/scripts/shangdiy/aot.py", line 29, in forward, x = self.fc1(x), File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/7b4b7a52e15abb17/scripts/shangdiy/__aot__/aot#link-tree/torch/nn/modules/linear.py", line 125, in forward, return F.linear(input, self.weight, self.bias), } )], constant_args=(), kwargs={}, output_view=None, python_kernel_name=extern_kernels.mm, cpp_kernel_name=at::mm_out, ordered_kwargs_for_cpp_kernel=(), op_overload=None, arg_properties=[{}, {}], allarg_properties={}, kwarg_properties=None, unbacked_bindings={}, mutation_outputs=[], origin_node=mm_default_1, origins=OrderedSet([mm_default_1]), stack_traces = {, File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/7b4b7a52e15abb17/scripts/shangdiy/__aot__/aot#link-tree/scripts/shangdiy/aot.py", line 29, in forward, x = self.fc1(x), File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/7b4b7a52e15abb17/scripts/shangdiy/__aot__/aot#link-tree/torch/nn/modules/linear.py", line 125, in forward, return F.linear(input, self.weight, self.bias), } ) ``` Test Plan: ``` buck2 run mode/dev-nosan fbcode//caffe2/test/inductor:provenance_tracing ``` Rollback Plan: Differential Revision: D78365534 Pull Request resolved: https://github.com/pytorch/pytorch/pull/158576 Approved by: https://github.com/angelayi	2025-07-18 04:05:17 +00:00
Shangdi Yu	82a1ee1135	Refactor Provenance Tracking (#158399 ) Summary: As inductor provenance tracking is getting more use cases, we want to separate the inductor provenance tracking guarding flag from the general `trace.enabled`, so we can enable provenance tracking without all the overhead of `trace.enabled` - change the guard flag from `trace.enabled` to `trace.provenance_tracking`. It is turned on by either `TORCH_COMPILE_DEBUG=1` or `INDUCTOR_PROVENANCE=1`. - Move the provenance tracking logic and variables out of DebugContext, because DebugContext is only enabled with `trace.enabled`. Since the variables are now global variables, added `reset_provenance_globals()` context manager to reset them for each `compile_fx()` call. - Move `set_kernel_post_grad_provenance_tracing` from `util.py` to `debug.py` so now all provenance related logic is in `debug.py`. In the future, if we want to enable it further, we can change the provenance tracking flag to be enabled when `TORCH_TRACE` is set. I think we should do that in a separate PR, so it's easier to revert if this flag change creates any problem. See more motivation in internal Diff Test Plan: ``` buck2 run mode/dev-nosan fbcode//caffe2/test:fx -- -r test_graph_transform_observer buck run mode/dev-nosan fbcode//caffe2/test:fx -- -r graph_provenance buck2 run mode/dev-nosan fbcode//caffe2/test/inductor:provenance_tracing ``` Differential Revision: D78287976 Pull Request resolved: https://github.com/pytorch/pytorch/pull/158399 Approved by: https://github.com/angelayi	2025-07-17 00:23:00 +00:00
henrylhtsang	6c0b42fd2f	[inductor][cutlass backend] Log prescreening elpase (#155508 ) Differential Revision: [D76311352](https://our.internmc.facebook.com/intern/diff/D76311352/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/155508 Approved by: https://github.com/jingsh	2025-06-12 16:48:52 +00:00
Oguz Ulgen	d1947a8707	Migrate from lru_cache to cache (#155613 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/155613 Approved by: https://github.com/ezyang ghstack dependencies: #155612	2025-06-11 19:44:18 +00:00
Rachel Guo	cad0727fe1	Rename the provenance tracing artifact name for kernel <-> post_grad nodes mapping (#154046 ) Summary: Context: Recently we've added a couple more kernel types support other than inductor generated triton kernels, such as cpu cpp kernels, extern kernels. The name appeared in tlparse chrome link can be confusing to users. Rename from `inductor_triton_kernel_to_post_grad_nodes.json` to `inductor_generated_kernel_to_post_grad_nodes.json` Test Plan: CI Differential Revision: D75159042 Pull Request resolved: https://github.com/pytorch/pytorch/pull/154046 Approved by: https://github.com/yushangdi	2025-05-22 19:20:56 +00:00
PyTorch MergeBot	3443627e07	Revert "[BE]: Enable RUFF TRY400 rule - log.exception (#153473 )" This reverts commit 4f4ecc583e0f48ad2d062a53bf91c61ab40b4948. Reverted https://github.com/pytorch/pytorch/pull/153473 on behalf of https://github.com/jeanschmidt due to seems to have broken internal signals, @albanD may I count on you to help the author merge his PR? D74837988 ([comment](https://github.com/pytorch/pytorch/pull/153473#issuecomment-2886017075))	2025-05-16 08:29:26 +00:00
Aaron Gokaslan	4f4ecc583e	[BE]: Enable RUFF TRY400 rule - log.exception (#153473 ) Change logging.error to logging.exception to log additional information when relevant. A few places have slipped in logging.errors in try except since I last did a clean up here and the rule is stabilized so I am enabling it codebase wide. I have NOQA'd much of our custom exception stack trace handling for RPC calls and distributed and tried to a fix a few errors based on whether we immediately reraised it or if we didn't print any exception handling where it could be useful. Pull Request resolved: https://github.com/pytorch/pytorch/pull/153473 Approved by: https://github.com/albanD, https://github.com/cyyever	2025-05-15 13:36:59 +00:00
Vlad K	6a84fe65ec	Fix code portability when looking for Dot (#153259 ) When trying to plot a trace graph, Inductor checks if "dot" is installed. Currently, the code runs a "which dot" command. By default, Windows doesn't have the "which" command. This patch replaces it with the more portable alternative. Pull Request resolved: https://github.com/pytorch/pytorch/pull/153259 Approved by: https://github.com/Skylion007	2025-05-10 16:12:44 +00:00
Benjamin Glass	01cbf5a30a	[AOTInductor] Add wrapper and kernel code to debug code logging (#153181 ) This is a simple PR to make the AOTInductor wrapper and kernel code get output by `TORCH_COMPILE_DEBUG=1`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/153181 Approved by: https://github.com/desertfire	2025-05-10 15:31:18 +00:00
eellison	2295efa1b3	Fix only logging ir_post_fusion with torch_compile_debug enabled (#148499 ) Because we were invoking the logs through `V.debug`, it was not running if TORCH_COMPILE_DEBUG was not set. this is because there is some magic the in debug [getattr](`d789c22712/torch/_inductor/debug.py (L468-L480)`). Pull Request resolved: https://github.com/pytorch/pytorch/pull/148499 Approved by: https://github.com/shunting314	2025-03-05 05:35:09 +00:00
Xuehai Pan	1cb4e2df65	[BE][PYFMT] migrate PYFMT for `torch._inductor` to `ruff format` (#144550 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144550 Approved by: https://github.com/jansel	2025-02-28 13:33:19 +00:00
Riley Dulin	20295c017e	Fix import of getArtifactLogger for ir_pre_fusion and ir_post_fusion (#147560 ) Fixes #147002 There was an issue with the previous PR https://github.com/pytorch/pytorch/pull/147248 that didn't show up in CI, where a logging import was not complete in torch/_inductor/debug.py before importing it. This only happened if someone directly imported the file without doing any other imports before. Also set to off_by_default by request to reduce log spew. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147560 Approved by: https://github.com/Skylion007	2025-02-25 03:36:08 +00:00
Riley Dulin	93316cfe94	Move ir_pre_fusion.txt and ir_post_fusion.txt to TORCH_LOGS (#147248 ) Fixes #147002 Moves ir_{pre, post}_fusion.txt to be controlled by TORCH_LOGS instead of TORCH_COMPILE_DEBUG. Updated tests of these logs as well. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147248 Approved by: https://github.com/eellison	2025-02-20 00:26:17 +00:00
Shangdi Yu	a4e4368157	add node mapping processing (#146103 ) Summary: Add `node_mapping = create_node_mapping(pre_grad_graph_id, inductor_post_to_pre_grad_nodes, debug_info)`, to produce a `inductor_provenance_tracking_node_mappings.json` file. This file will be used by the provenance tracking highlighter tool to create provenance visualization. `inductor_triton_kernel_to_post_grad_nodes.json` and `inductor_provenance_tracking_node_mappings.json` files are not dumped if they are both empty. So it's removed from some of the `test_structured_trace` tests. Test Plan: CI ``` buck run mode/dev-nosan fbcode//caffe2/test:fx -- -r graph_provenance buck run mode/dev-nosan fbcode//caffe2/test/inductor:provenance_tracing python test/dynamo/test_structured_trace.py ``` Differential Revision: D68190173 Pull Request resolved: https://github.com/pytorch/pytorch/pull/146103 Approved by: https://github.com/chenyang78	2025-02-01 08:29:29 +00:00
shangdiy	6bd19e65b1	add inductor_triton_kernel_mapping_post_grad.json to tlparseadd changes (#145954 ) Landing D67612181 here. The original exported PR somehow fails OSS CI, but this one doesn't (though the PR content is the same). Add debug trace artifact to inductor_triton_kernel_mapping_post_grad.json (debug artifact for provenance tracking) to tlparse. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145954 Approved by: https://github.com/YUNQIUGUO	2025-01-30 06:18:48 +00:00
Randolf Scholz	835e770bad	Use `typing.IO[bytes]` instead of `io.BytesIO` in annotations (#144994 ) Fixes #144976 Using appoach ① `IO[bytes]`, but could also try with a protocol. ## Notes: - moved `torch.serialization.FILE_LIKE` to `torch.types.FileLike` - Use `FileLike` annotation where it makes sense - made sure those functions also support `os.PathLike` - Replaced `isinstance(x, io.BytesIO)` with `isinstance(x, (io.IOBase, IO))` where appropriate. - Replaced `BinaryIO` with `IO[bytes]` (the two ABCs are almost identical, the only difference is that `BinaryIO` allows `bytearray` input to `write`, whereas `IO[bytes]` only `bytes`) - needed to make `torch.serialization._opener` generic to avoid LSP violations. - skipped `torch/onnx/verification` for now (functions use `BytesIO.getvalue` which is not part of the `IO[bytes]` ABC, but it kind of seems that this is redundant, as e.g. `onnx.load` supports `str \| PathLike[str] \| IO[bytes]` directly... Pull Request resolved: https://github.com/pytorch/pytorch/pull/144994 Approved by: https://github.com/ezyang, https://github.com/Skylion007	2025-01-27 18:08:07 +00:00
Shangdi Yu	4cc5e880f9	Add accuracy issue support in AOTI Minifier (#145539 ) Summary: Add three more repro levels for AOTI minifier (level 2 already exists). They are the same as the existing dynamo minifier repro levels. Now AOTI minifier can minify and repro programs that have numerical accuracy issues as well. 1: Dumps the original graph out to repro.py if compilation fails 2: Dumps a minifier_launcher.py if aoti fails. 3: Always dumps a minifier_launcher.py. Good for segfaults. 4: Dumps a minifier_launcher.py if the accuracy fails. Refactor AOTI minifier unit tests to be cleaner and better re-use the existing minifier testing code. We do not need to manually patch {"aot_inductor.dump_aoti_minifier": True} to each test now, this config is generated in the test code. Differential Revision: D68294638 Pull Request resolved: https://github.com/pytorch/pytorch/pull/145539 Approved by: https://github.com/desertfire	2025-01-24 23:07:19 +00:00
Aaron Orenstein	893ca1dfe1	PEP585 update - torch/_inductor/[_-i]* (#145137 ) See #145101 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145137 Approved by: https://github.com/bobrenjc93	2025-01-19 01:22:47 +00:00
Rachel Guo	9275091d6e	[provenance_tracking] Dump inductor_triton_kernel_to_post_grad_nodes.json info in debug_trace (#143055 ) Summary: This diff mainly adds code changes to dump `inductor_triton_kernel_to_post_grad_nodes.json` artifact which contains mapping info from post_grad -> inductor kernel code: `{"inductor_triton_kernel_name": [post_grad_node_0, post_grad_node_1, ..., ], "..."}.` Example paste: P1695235000 verified on the test model. See "Test Plan": We use this artifact to demonstrate provenance tracking in the frontend 3-tab highlighter tool: https://github.com/YUNQIUGUO/compiler_explorer (copy/pasted the input files for demo purpose for now and will integrate with Shangdi's tool to 4-tab) https://pxl.cl/66BzK Note: Currently only supports mapping for inductor's`TritonKernel` type. TODO for enhancing more support for `ExternKernel` and other inductor generated kernel type, etc. Test Plan: test_model_coverage.sh: ``` #!/bin/sh MODEL_ENTITY_ID=644688112 SNAPSHOT_ID=32 MODULE=merge # buck2 build --show-output mode/opt -c=python.package_style=inplace -c fbcode.enable_gpu_sections=true -c fbcode.platform=platform010 -c fbcode.split-dwarf=true -c fbcode.nvcc_arch=a100,h100 caffe2/torch/fb/model_transform/experimental/benchmark:mts_gpu_benchmark TORCH_COMPILE_DEBUG=1 CUDA_VISIBLE_DEVICES=0 TORCHINDUCTOR_FORCE_DISABLE_CACHES=1 TORCH_LOGS="+inductor, schedule, fusion, output_code" TORCH_TRACE="tmp/guorachel_tt" TORCHINDUCTOR_MAX_AUTOTUNE=1 TORCHINDUCTOR_UNIQUE_KERNEL_NAMES=1 ../buck-out/v2/gen/fbcode/d29ee94b913014f1/caffe2/torch/fb/model_transform/experimental/benchmark/__mts_gpu_benchmark__/mts_gpu_benchmark.par --model-path manifold://ads_storage_fblearner/tree/user/facebook/fblearner/predictor/${MODEL_ENTITY_ID}/${SNAPSHOT_ID}/gpu_lowering/input.predictor.disagg.gpu.merge --lower-backend AOT_INDUCTOR_EP --gpu-trace --aot-inductor-config="{'max_autotune': True}" 2>&1 \| tee output.txt ``` {F1973765026} ``` buck2 test 'fbcode//mode/opt' fbcode//caffe2/test/inductor:provenance_tracing -- --exact 'caffe2/test/inductor:provenance_tracing - test_triton_kernel_post_grad_mapping_aot_inductor (caffe2.test.inductor.test_provenance_tracing.TestProvenanceTracingArtifact)' ``` ``` TORCH_LOGS="+inductor, output_code" buck2 run -c fbcode.enable_gpu_sections=true -c fbcode.nvcc_arch=h100 @//mode/opt fbcode//caffe2/test/inductor:provenance_tracing -- -r test_triton_kernel_post_grad_mapping_aot_inductor ``` Differential Revision: D66967510 Pull Request resolved: https://github.com/pytorch/pytorch/pull/143055 Approved by: https://github.com/chenyang78	2024-12-18 06:51:50 +00:00
Tom Ritchford	da67a6a7bb	[inductor] Replace set by OrderedSet (#138466 ) Uses the set_linter from https://github.com/pytorch/pytorch/pull/138454 and considerable manual editing Pull Request resolved: https://github.com/pytorch/pytorch/pull/138466 Approved by: https://github.com/eellison	2024-12-13 16:08:45 +00:00
Tom Ritchford	dc23f1944a	Remove unused Python variables in torch/[_-a]* (#133492 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133492 Approved by: https://github.com/albanD	2024-12-12 17:39:14 +00:00
PyTorch MergeBot	5c97ac9721	Revert "Remove unused Python variables in torch/[_-a]* (#133492 )" This reverts commit fda975a7b3071a20dab8fc2c4e453479e1bb7cf2. Reverted https://github.com/pytorch/pytorch/pull/133492 on behalf of https://github.com/clee2000 due to Sorry, I need to revert this in order to revert something else. The only thing you need to do is rebase and remerge ([comment](https://github.com/pytorch/pytorch/pull/133492#issuecomment-2536635516))	2024-12-11 17:29:12 +00:00
Tom Ritchford	fda975a7b3	Remove unused Python variables in torch/[_-a]* (#133492 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133492 Approved by: https://github.com/albanD	2024-12-10 21:48:44 +00:00
Shangdi Yu	02c509669a	Aoti minifier flatten (#141156 ) Flatten the inputs to minifier so AOTI Minifier can handle unflattened inputs and kwargs. - flatten the inputs in minifier - changed the "load_and_run" part of the minifier verification to run on the flattened inputs. - refactored code to keep `torch._inductor.__init__.py` clean - update doc `python test/inductor/test_minifier.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/141156 Approved by: https://github.com/desertfire	2024-12-06 07:12:45 +00:00
Jason Ansel	6eca0aee76	[inductor] Refactor ir.Layout into ir.OutputSpec (#140910 ) This separate the concepts of a Layout (size/stride/etc) and an OutputSpec (which includes multiple outputs). Which should make typing easier. Pull Request resolved: https://github.com/pytorch/pytorch/pull/140910 Approved by: https://github.com/ezyang ghstack dependencies: #140895	2024-11-21 20:01:57 +00:00
Aaron Orenstein	06f619d999	typing ir.py - part 2 (#131846 ) See #131852 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131846 Approved by: https://github.com/eellison ghstack dependencies: #139238	2024-11-06 00:01:15 +00:00

1 2 3

104 Commits