Commit Graph

104 Commits

Author SHA1 Message Date
520fca82c8 Refactor Provenance Tracking (#163378)
Summary:
- Move the `provenance_level` flag check to inside the `set_kernel_post_grad_provenance_tracing` call to simply the code

- Move the `set_kernel_post_grad_provenance_tracing` call and `write_provenance_debug_handle` call to `codegen_comment`.

- If some `call_kernel` call sites don't have a proceeding `codegen_comment` call, add one. Now all `call_kernel` call sites are accompanied with a  `codegen_comment` call.

- Add a `codegen_comment` method to BaseScheduling and remove the noop `codegen_comment` method in Scheduling

- Remove `debug_handle` from `call_kernel`.

Test Plan:
CI

```
buck run @//mode/opt-split-dwarf fbcode//caffe2/test/inductor:provenance_tracing
```

Differential Revision: D82839271

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163378
Approved by: https://github.com/angelayi
2025-09-25 22:55:59 +00:00
711c8c821e shape guards (#161178)
Summary: This PR introduces shape guards to export. Previously only value ranges,  equalities, and specializations would be tracked for symbolic expressions, and we had a forward hook to check them. Instead now we create a function to check shape guards and call it in the exported program.

Test Plan:
updated several tests

Rollback Plan:

Differential Revision: D80713603

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161178
Approved by: https://github.com/tugsbayasgalan
2025-09-08 22:44:09 +00:00
dec72ea4b0 [reland] Add inductor provenance mapping for cpp extern kernel (#161656) (#162069)
Summary:

Add inductor provenance mapping for cpp extern kernel

Test Plan:
```
buck run fbcode//caffe2/test/inductor:provenance_tracing --  -r test_cpu_extern_kernel
```

Differential Revision: D81598857

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162069
Approved by: https://github.com/angelayi
2025-09-04 04:18:43 +00:00
15c77a8cfd Revert "Add inductor provenance mapping for cpp extern kernel (#161656)"
This reverts commit 5e5870e858f60ff4bf87d03f3592097e934a9580.

Reverted https://github.com/pytorch/pytorch/pull/161656 on behalf of https://github.com/jeffdaily due to causing failures on ROCm MI300, will add label to PR ([comment](https://github.com/pytorch/pytorch/pull/161656#issuecomment-3246965676))
2025-09-02 22:19:19 +00:00
5e5870e858 Add inductor provenance mapping for cpp extern kernel (#161656)
Summary: Add inductor provenance mapping for cpp extern kernel

Test Plan:

```
buck run fbcode//caffe2/test/inductor:provenance_tracing --  -r test_cpu_extern_kernel
```

Differential Revision: D81161751

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161656
Approved by: https://github.com/angelayi
2025-09-02 17:54:04 +00:00
ec585ceab4 [inductor] structured-log graph execution order + test (#160448)
Summary:

- Emit a structured trace per compiled graph execution to reconstruct execution order in TLParse.
- Adds debug.log_graph_execution(name) called from `CompiledFxGraph.__call__`, producing an artifact named inductor_graph_execution with payload {"graph": "graph_<id>"}.

Testing:
- Add inline test to verify structure and output

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160448
Approved by: https://github.com/xmfan
2025-08-27 18:12:46 +00:00
9a12bab0d3 Add debug handle to inductor provenance tracking (#161110)
Summary:
Use debug handle on kernel names to distinguish different calls to the same kernel.

Previous kernel name: kernel_name

New kernel name: kernel_name:debug_handle

We add the debug handle to the tlparse artifacts: `inductor_provenance_tracking_node_mappings` and `inductor_provenance_tracking_kernel_stack_traces`.

We also add debug handles in the comments of the generated code so we can map to them in the provenance tracking highlighter tool: https://github.com/pytorch/tlparse/pull/134

Example output code is below. If a kernel doesn't have a debug handle, the `[Provenance debug handles]` comment line will not be written.

```
        # Topologically Sorted Source Nodes: [y, z], Original ATen: [aten.addmm, aten.gelu]
        # [Provenance debug handles] triton_poi_fused_addmm_gelu_2:3
        stream0 = get_raw_stream(0)
        triton_poi_fused_addmm_gelu_2.run(buf4, primals_5, 300, stream=stream0)
```

The debug handles will also be used by downstream profilers such as zoomer.

Test Plan:
```
buck run mode/opt fbcode//caffe2/test/inductor:provenance_tracing
```

Rollback Plan:

Differential Revision: D78994959

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161110
Approved by: https://github.com/angelayi
2025-08-27 04:56:11 +00:00
4a1aca11c2 Revert "[inductor] structured-log graph execution order + test (#160448)"
This reverts commit 995397d47a0e27394ee1010f158e181eb304100a.

Reverted https://github.com/pytorch/pytorch/pull/160448 on behalf of https://github.com/atalman due to internal failure please see associated diff ([comment](https://github.com/pytorch/pytorch/pull/160448#issuecomment-3223939035))
2025-08-26 12:20:37 +00:00
995397d47a [inductor] structured-log graph execution order + test (#160448)
Summary:

- Emit a structured trace per compiled graph execution to reconstruct execution order in TLParse.
- Adds debug.log_graph_execution(name) called from `CompiledFxGraph.__call__`, producing an artifact named inductor_graph_execution with payload {"graph": "graph_<id>"}.

Testing:
- Add inline test to verify structure and output

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160448
Approved by: https://github.com/xmfan
2025-08-25 20:12:18 +00:00
2b62ef7420 Add kernel information JSON generation for AOTI packages (#160540)
Summary:
Build on D80031559. Generate kernel_information.json in AOTI compiled artifacts by combining stack traces and node mappings from provenance tracking.

This implementation delivers exactly what Zoomer team requested:

**1. Core Function**: `create_kernel_information_json()` in debug.py combines 3 data sources:
- `_inductor_kernel_stack_trace` → `stack_traces` field
- `_inductor_triton_kernel_to_post_grad_node_info` → `post_grad_nodes` field
- `_inductor_post_to_pre_grad_nodes["postToPre"]` → `pre_grad_nodes` field

**2. AOTI Integration**: codecache.py writes `kernel_information.json` to pt2 packages when both AOTI packaging and provenance tracking are enabled.

**3. Test Coverage**: TestKernelInformationAOTI class validates:
- JSON file creation in AOTI packages using zipfile
- Exact format compliance
- Proper disabling without provenance tracking

**Output Format** (exact specification):
```json
{
  "triton_kernel_name_1": {
    "stack_traces": [str, str, ...],
    "post_grad_nodes": [str, str, ...],
    "pre_grad_nodes": [str, str, ...]
  }
}
```

Test Plan:
```
buck test fbcode//caffe2/test/inductor:provenance_tracing -- TestKernelInformationAOTI
```

Manual validation:
```python
import torch
model = torch.nn.Linear(10, 1)
with torch._inductor.config.patch("aot_inductor.package", True):
    with torch._inductor.config.patch("trace.basic_provenance_tracking", True):
        # AOTI compilation should generate kernel_information.json
        compiled = torch.export.export(model, (torch.randn(1, 10),))
```
---

Rollback Plan:

Differential Revision: D80139160

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160540
Approved by: https://github.com/yushangdi
2025-08-20 02:33:45 +00:00
41b3e80a55 Fix duplicated kernel name in kernel stack trace tracking (#160905)
Summary: as title. When we have two kernels with the same name, the stack traces should be appended, not overwritten.

Test Plan:
```
 buck run mode/opt fbcode//caffe2/test/inductor:provenance_tracing
```

Rollback Plan:

Differential Revision: D80472731

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160905
Approved by: https://github.com/angelayi
2025-08-19 01:14:34 +00:00
4e90441133 Add signpost to provenance tracking error (#160755)
Summary: As title, add signpost to better track error when computing provenance tracking related debugging information

Test Plan:
CI

Rollback Plan:

Differential Revision: D80292285

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160755
Approved by: https://github.com/angelayi
2025-08-18 19:17:47 +00:00
c699668009 [inductor] TLParse tensor metadata logging + test (#160132)
Summary:
- Add TLParse artifact logging per op with output tensor shape, stride, and dtype for cross-rank aggregation.

Testing:
- Add test to verify structure and contents of tlparse artifiact

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160132
Approved by: https://github.com/xmfan
2025-08-17 04:27:49 +00:00
26297c27e2 Revert "[inductor] TLParse tensor metadata logging + test (#160132)"
This reverts commit 2603e40be5fa4a66301e6654e34a82a67f2e4913.

Reverted https://github.com/pytorch/pytorch/pull/160132 on behalf of https://github.com/clee2000 due to broke lint [GH job link](https://github.com/pytorch/pytorch/actions/runs/17010600949/job/48226137423) [HUD commit link](2603e40be5).  landrace with another PR that changed some had_cuda related things ([comment](https://github.com/pytorch/pytorch/pull/160132#issuecomment-3193969792))
2025-08-16 23:47:03 +00:00
2603e40be5 [inductor] TLParse tensor metadata logging + test (#160132)
Summary:
- Add TLParse artifact logging per op with output tensor shape, stride, and dtype for cross-rank aggregation.

Testing:
- Add test to verify structure and contents of tlparse artifiact

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160132
Approved by: https://github.com/xmfan
ghstack dependencies: #160260
2025-08-16 16:37:18 +00:00
b74c7cd335 Add kernel stack traces tlparse dump (#160608) (#160779)
Summary:

as title

This is requested by the zoomer team so they can add stack trace information to profiler result.

Test Plan:
```
buck run mode/dev-nosan fbcode//caffe2/test/inductor:provenance_tracing -- -r  stack_traces
```

Rollback Plan:

Differential Revision: D80050233

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160779
Approved by: https://github.com/angelayi
2025-08-16 03:12:38 +00:00
aa99e0958f Separate provenance tracking to different levels (#160383)
Summary: as title. We've got request from various parties who are interested in turning on the provenance tracking by default. In this PR, we prepare to turn on part of the provenance tracking that doesn't have too much overhead by default.

- Change `provenance_tracking` config to `provenance_tracking_level`
- turn on the following provenance tracking by default when `basic_provenance_tracking`=True
    - `set_kernel_post_grad_provenance_tracing` for kernels, this add mapping between triton kernels and post_grad nodes
    - `dump_inductor_provenance_info` if we're dumping tlparse log
    - `get_graph_provenance_json` and dump `reate_mapping_pre_post_grad_nodes`. This creates mapping between pre_grad and post_grad nodes. Since we're not turning on the provenance tracking in GraphTransformObserver by default, the mapping here maybe incomplete/limited.
    - add stack trace from post grad nodes to inductor IR nodes
    - add exception swallowing for all functions above

Test Plan:
CI

Rollback Plan:

Differential Revision: D80031559

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160383
Approved by: https://github.com/angelayi
2025-08-15 04:59:35 +00:00
d1950d4bb5 Change IR node's stack trace to be computed lazily (#160487)
Summary: When an IR node is an inherited class, post_init is called once for each super().__init__() call. To avoid duplicated calls, we make stack trace computation happen lazily.

Test Plan:
CI

Rollback Plan:

Differential Revision: D80137870

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160487
Approved by: https://github.com/angelayi
2025-08-13 21:41:25 +00:00
fc80f6859e Fix collective schedule logging and runtime tests (#160260)
Summary:

- Fix collective schedule logging so that only logs when collectives present
- Fix runtime estimate test to check if each op has a number value

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160260
Approved by: https://github.com/Skylion007
2025-08-11 20:58:52 +00:00
8034b2a732 [inductor] Add TLParse artifact for logging runtime of collective and compute ops (#159730)
Summary:

- debug.py: Added log_runtime_estimates() function to dump runtime estimation data as structured tlparse artifacts in JSON format
- test_structured_trace.py: Added comprehensive test coverage with testing compute and collective ops

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159730
Approved by: https://github.com/yushangdi
ghstack dependencies: #159190
2025-08-05 22:06:32 +00:00
85e74d5ace [inductor] Add logging for distributed collective ops for multi‑rank diagnostics (#159190)
This change introduces structured logging of the collective communication schedule, enabling downstream tools (e.g. TLParse) to ingest and analyze per‑rank collective‐order information for multi‑rank jobs.

- Iterates over scheduler.nodes, filters for _CollectiveKernel nodes
- Extracts each op’s python_kernel_name
- Emits a structured JSON payload under the inductor_collective_schedule artifact name
- Dumps the full schedule list to collective_schedule.json via the PyTorch trace‑structured artifact
- Added comprehensive unit tests for collective schedule tracing: Created test_collective_schedule_empty() and test_collective_schedule_real() tests to verify structured trace logging works correctly for both empty collective schedules and real collective operations (like all_reduce and wait_tensor from _c10d_functional ops).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159190
Approved by: https://github.com/yushangdi, https://github.com/xmfan
2025-08-01 21:51:42 +00:00
490cb3f1a4 Revert "[inductor] Add logging for distributed collective ops for multi‑rank diagnostics (#159190)"
This reverts commit bb62e1f769ef51e2ec149d7256c135d09425aaa0.

Reverted https://github.com/pytorch/pytorch/pull/159190 on behalf of https://github.com/clee2000 due to broke [GH job link](https://github.com/pytorch/pytorch/actions/runs/16658705097/job/47150840171) [HUD commit link](bb62e1f769) on mac ([comment](https://github.com/pytorch/pytorch/pull/159190#issuecomment-3141513921))
2025-07-31 22:22:13 +00:00
bb62e1f769 [inductor] Add logging for distributed collective ops for multi‑rank diagnostics (#159190)
This change introduces structured logging of the collective communication schedule, enabling downstream tools (e.g. TLParse) to ingest and analyze per‑rank collective‐order information for multi‑rank jobs.

- Iterates over scheduler.nodes, filters for _CollectiveKernel nodes
- Extracts each op’s python_kernel_name
- Emits a structured JSON payload under the inductor_collective_schedule artifact name
- Dumps the full schedule list to collective_schedule.json via the PyTorch trace‑structured artifact
- Added comprehensive unit tests for collective schedule tracing: Created test_collective_schedule_empty() and test_collective_schedule_real() tests to verify structured trace logging works correctly for both empty collective schedules and real collective operations (like all_reduce and wait_tensor from _c10d_functional ops).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159190
Approved by: https://github.com/yushangdi, https://github.com/xmfan
2025-07-31 19:58:07 +00:00
fd2c64e286 Fix duplicated sources in inductor provenance tracking (#159484)
Summary:

The `replace_hook` is called once for each user of the replaced node. This fix avoids adding duplicated node sources.

This also means that if there are two nested pass like:

```
with GraphTransformObserver(gm, "outer"):
      with GraphTransformObserver(gm, "inner"):
              .....
```

We'll only see the outer pass's pass name recorded for the replaced node in the "from_node" node meta. I think this is fine. In practice, the outer pass usually contains a more meaningful name, e.g. `decompose_auto_functionalized`, and the inner pass name is just a default pass name like `pattern_matcher`.

Test Plan:
```
buck2 run @mode/dev-nosan fbcode//caffe2/test:fx -- -r test_graph_transform_observer_replace
```

Rollback Plan:

Differential Revision: D79203058

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159484
Approved by: https://github.com/angelayi
2025-07-30 23:03:11 +00:00
1e86fa2e5b Add stack trace to Inductor IR nodes if inductor.config.trace.provenance_tracing=True (#158576)
Summary:
- Split `create_mapping` to `create_mapping_pre_post_grad_nodes` and  ` create_node_mapping_kernel_to_post_grad`
- Store a mapping from pre_grad graph node names to stack traces in `_inductor_pre_grad_node_stack_trace`
- Add `stack_traces` member to ir.Node and add it to the string representation of ir.Node
- When we create an IR node, if `inductor.config.trace.provenance_tracing=True`, we populate `stack_traces` from `origins`. The nodes in `origins` are post_grad graph nodes. If a node has `node.stack_trace`, we store the stack_trace directly. This is particularly important for backward graph nodes because they don't have a mapping to pre-grad graph nodes. If a node doesn't have `.stack_trace ` (such as `linear`-> `addmm` nodes), we use the stack trace of the pre_grad graph nodes that it maps to.
  - A post grad graph node might not have stack trace if it correspond to multiple pre grad graph nodes, e.g. [GroupLinearFusion](a00442421a/torch/_inductor/fx_passes/group_batch_fusion.py (L299))

Example:

```
scheduling ExternKernelOut(
  python_kernel_name='extern_kernels.mm',
  name=buf0,
  layout=FixedLayout('cuda:0', torch.float32, size=[8, 16], stride=[16, 1]),
  inputs=[InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float32, size=[8, 10], stride=[10, 1])), ReinterpretView(
    StorageBox(
      ConstantBuffer(name='fc1_weight', layout=FixedLayout('cuda:0', torch.float32, size=[16, 10], stride=[10, 1]))
    ),
    FixedLayout('cuda:0', torch.float32, size=[10, 16], stride=[1, 10]),
    origins=OrderedSet([mm_default_1]),
    stack_traces = {,
    File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/7b4b7a52e15abb17/scripts/shangdiy/__aot__/aot#link-tree/scripts/shangdiy/aot.py", line 29, in forward,
        x = self.fc1(x),
      File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/7b4b7a52e15abb17/scripts/shangdiy/__aot__/aot#link-tree/torch/nn/modules/linear.py", line 125, in forward,
        return F.linear(input, self.weight, self.bias),
    }
  )],
  constant_args=(),
  kwargs={},
  output_view=None,
  python_kernel_name=extern_kernels.mm,
  cpp_kernel_name=at::mm_out,
  ordered_kwargs_for_cpp_kernel=(),
  op_overload=None,
  arg_properties=[{}, {}],
  allarg_properties={},
  kwarg_properties=None,
  unbacked_bindings={},
  mutation_outputs=[],
  origin_node=mm_default_1,
  origins=OrderedSet([mm_default_1]),
  stack_traces = {,
  File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/7b4b7a52e15abb17/scripts/shangdiy/__aot__/aot#link-tree/scripts/shangdiy/aot.py", line 29, in forward,
      x = self.fc1(x),
    File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/7b4b7a52e15abb17/scripts/shangdiy/__aot__/aot#link-tree/torch/nn/modules/linear.py", line 125, in forward,
      return F.linear(input, self.weight, self.bias),
  }
)
```

Test Plan:
```
buck2 run mode/dev-nosan fbcode//caffe2/test/inductor:provenance_tracing
```

Rollback Plan:

Differential Revision: D78365534

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158576
Approved by: https://github.com/angelayi
2025-07-18 04:05:17 +00:00
82a1ee1135 Refactor Provenance Tracking (#158399)
Summary:
As inductor provenance tracking is getting more use cases, we want to separate the inductor provenance tracking guarding flag from the general `trace.enabled`, so we can enable provenance tracking without all the overhead of `trace.enabled`

- change the guard flag from `trace.enabled` to `trace.provenance_tracking`.  It is turned on by either `TORCH_COMPILE_DEBUG=1` or `INDUCTOR_PROVENANCE=1`.
- Move the provenance tracking logic and variables out of DebugContext, because DebugContext is only enabled with `trace.enabled`. Since the variables are now global variables, added `reset_provenance_globals()` context manager to reset them for each `compile_fx()` call.
- Move `set_kernel_post_grad_provenance_tracing` from `util.py` to `debug.py` so now all provenance related logic is in `debug.py`.

In the future, if we want to enable it further, we can change the provenance tracking flag to be enabled when `TORCH_TRACE` is set. I think we should do that in a separate PR, so it's easier to revert if this flag change creates any problem.

See more motivation in internal Diff

Test Plan:
```
buck2 run mode/dev-nosan fbcode//caffe2/test:fx -- -r test_graph_transform_observer
buck run mode/dev-nosan  fbcode//caffe2/test:fx -- -r graph_provenance
buck2 run mode/dev-nosan fbcode//caffe2/test/inductor:provenance_tracing
```

Differential Revision: D78287976

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158399
Approved by: https://github.com/angelayi
2025-07-17 00:23:00 +00:00
6c0b42fd2f [inductor][cutlass backend] Log prescreening elpase (#155508)
Differential Revision: [D76311352](https://our.internmc.facebook.com/intern/diff/D76311352/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155508
Approved by: https://github.com/jingsh
2025-06-12 16:48:52 +00:00
d1947a8707 Migrate from lru_cache to cache (#155613)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155613
Approved by: https://github.com/ezyang
ghstack dependencies: #155612
2025-06-11 19:44:18 +00:00
cad0727fe1 Rename the provenance tracing artifact name for kernel <-> post_grad nodes mapping (#154046)
Summary:
Context:

Recently we've added a couple more kernel types support other than inductor generated triton kernels,

such as cpu cpp kernels, extern kernels.

The name appeared in tlparse chrome link can be confusing to users.

Rename from

`inductor_triton_kernel_to_post_grad_nodes.json`

to `inductor_generated_kernel_to_post_grad_nodes.json`

Test Plan: CI

Differential Revision: D75159042

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154046
Approved by: https://github.com/yushangdi
2025-05-22 19:20:56 +00:00
3443627e07 Revert "[BE]: Enable RUFF TRY400 rule - log.exception (#153473)"
This reverts commit 4f4ecc583e0f48ad2d062a53bf91c61ab40b4948.

Reverted https://github.com/pytorch/pytorch/pull/153473 on behalf of https://github.com/jeanschmidt due to seems to have broken internal signals, @albanD may I count on you to help the author merge his PR? D74837988 ([comment](https://github.com/pytorch/pytorch/pull/153473#issuecomment-2886017075))
2025-05-16 08:29:26 +00:00
4f4ecc583e [BE]: Enable RUFF TRY400 rule - log.exception (#153473)
Change logging.error to logging.exception to log additional information when relevant.  A few places have slipped in logging.errors in try except since I last did a clean up here and the rule is stabilized so I am enabling it codebase wide. I have NOQA'd much of our custom exception stack trace handling for RPC calls and distributed and tried to a fix a few errors based on whether we immediately reraised it or if we didn't print any exception handling where it could be useful.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153473
Approved by: https://github.com/albanD, https://github.com/cyyever
2025-05-15 13:36:59 +00:00
6a84fe65ec Fix code portability when looking for Dot (#153259)
When trying to plot a trace graph, Inductor checks if "dot" is installed. Currently, the code runs a "which dot" command.

By default, Windows doesn't have the "which" command. This patch replaces it with the more portable alternative.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153259
Approved by: https://github.com/Skylion007
2025-05-10 16:12:44 +00:00
01cbf5a30a [AOTInductor] Add wrapper and kernel code to debug code logging (#153181)
This is a simple PR to make the AOTInductor wrapper and kernel code get output by `TORCH_COMPILE_DEBUG=1`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153181
Approved by: https://github.com/desertfire
2025-05-10 15:31:18 +00:00
2295efa1b3 Fix only logging ir_post_fusion with torch_compile_debug enabled (#148499)
Because we were invoking the logs through `V.debug`, it was not running if TORCH_COMPILE_DEBUG was not set. this is because there is some magic the in debug [getattr](d789c22712/torch/_inductor/debug.py (L468-L480)).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148499
Approved by: https://github.com/shunting314
2025-03-05 05:35:09 +00:00
1cb4e2df65 [BE][PYFMT] migrate PYFMT for torch._inductor to ruff format (#144550)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144550
Approved by: https://github.com/jansel
2025-02-28 13:33:19 +00:00
20295c017e Fix import of getArtifactLogger for ir_pre_fusion and ir_post_fusion (#147560)
Fixes #147002

There was an issue with the previous PR https://github.com/pytorch/pytorch/pull/147248 that didn't show up in CI,
where a logging import was not complete in torch/_inductor/debug.py before importing it.
This only happened if someone directly imported the file without doing any other imports before.

Also set to off_by_default by request to reduce log spew.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147560
Approved by: https://github.com/Skylion007
2025-02-25 03:36:08 +00:00
93316cfe94 Move ir_pre_fusion.txt and ir_post_fusion.txt to TORCH_LOGS (#147248)
Fixes #147002

Moves ir_{pre, post}_fusion.txt to be controlled by TORCH_LOGS instead of TORCH_COMPILE_DEBUG.
Updated tests of these logs as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147248
Approved by: https://github.com/eellison
2025-02-20 00:26:17 +00:00
a4e4368157 add node mapping processing (#146103)
Summary:
Add `node_mapping = create_node_mapping(pre_grad_graph_id, inductor_post_to_pre_grad_nodes, debug_info)`, to produce a `inductor_provenance_tracking_node_mappings.json` file. This file will be used by the provenance tracking highlighter tool to create provenance visualization.

`inductor_triton_kernel_to_post_grad_nodes.json` and `inductor_provenance_tracking_node_mappings.json` files are not dumped if they are both empty. So it's removed from some of the `test_structured_trace` tests.

Test Plan:
CI
```
buck run mode/dev-nosan  fbcode//caffe2/test:fx -- -r graph_provenance

buck run mode/dev-nosan fbcode//caffe2/test/inductor:provenance_tracing

python test/dynamo/test_structured_trace.py
```

Differential Revision: D68190173

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146103
Approved by: https://github.com/chenyang78
2025-02-01 08:29:29 +00:00
6bd19e65b1 add inductor_triton_kernel_mapping_post_grad.json to tlparseadd changes (#145954)
Landing D67612181 here. The original exported PR somehow fails OSS CI, but this one doesn't (though the PR content is the same).

Add debug trace artifact to inductor_triton_kernel_mapping_post_grad.json (debug artifact for provenance tracking) to tlparse.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145954
Approved by: https://github.com/YUNQIUGUO
2025-01-30 06:18:48 +00:00
835e770bad Use typing.IO[bytes] instead of io.BytesIO in annotations (#144994)
Fixes #144976

Using appoach ① `IO[bytes]`, but could also try with a protocol.

## Notes:

- moved `torch.serialization.FILE_LIKE` to `torch.types.FileLike`
- Use `FileLike` annotation where it makes sense
- made sure those functions also support `os.PathLike`
- Replaced `isinstance(x, io.BytesIO)` with `isinstance(x, (io.IOBase, IO))` where appropriate.
- Replaced `BinaryIO` with `IO[bytes]` (the two ABCs are almost identical, the only difference is that `BinaryIO` allows `bytearray` input to `write`, whereas `IO[bytes]` only `bytes`)
- needed to make `torch.serialization._opener` generic to avoid LSP violations.
- skipped `torch/onnx/verification` for now (functions use `BytesIO.getvalue` which is not part of the `IO[bytes]` ABC, but it kind of seems that this is redundant, as e.g. `onnx.load` supports `str | PathLike[str] | IO[bytes]` directly...

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144994
Approved by: https://github.com/ezyang, https://github.com/Skylion007
2025-01-27 18:08:07 +00:00
4cc5e880f9 Add accuracy issue support in AOTI Minifier (#145539)
Summary:

Add three more repro levels for AOTI minifier (level 2 already exists). They are the same as the existing dynamo minifier repro levels.

Now AOTI minifier can minify and repro programs that have numerical accuracy issues as well.

1: Dumps the original graph out to repro.py if compilation fails
2: Dumps a minifier_launcher.py if aoti fails.
3: Always dumps a minifier_launcher.py. Good for segfaults.
4: Dumps a minifier_launcher.py if the accuracy fails.

Refactor AOTI minifier unit tests to be cleaner and better re-use the existing minifier testing code. We do not need to manually patch {"aot_inductor.dump_aoti_minifier": True} to each test now, this config is generated in the test code.

Differential Revision: D68294638

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145539
Approved by: https://github.com/desertfire
2025-01-24 23:07:19 +00:00
893ca1dfe1 PEP585 update - torch/_inductor/[_-i]* (#145137)
See #145101 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145137
Approved by: https://github.com/bobrenjc93
2025-01-19 01:22:47 +00:00
9275091d6e [provenance_tracking] Dump inductor_triton_kernel_to_post_grad_nodes.json info in debug_trace (#143055)
Summary:
This diff mainly adds code changes to dump `inductor_triton_kernel_to_post_grad_nodes.json` artifact which contains mapping info from post_grad -> inductor kernel code:
`{"inductor_triton_kernel_name": [post_grad_node_0, post_grad_node_1, ..., ], "..."}.`

Example paste: P1695235000 verified on the test model.  See "Test Plan":

We use this artifact to demonstrate provenance tracking in the frontend 3-tab highlighter tool:
https://github.com/YUNQIUGUO/compiler_explorer (copy/pasted the input files for demo purpose for now and will integrate with Shangdi's tool to 4-tab)

https://pxl.cl/66BzK

Note: Currently only supports mapping for inductor's`TritonKernel` type. TODO for enhancing more support for `ExternKernel` and other inductor generated kernel type, etc.

Test Plan:
test_model_coverage.sh:
```
#!/bin/sh
MODEL_ENTITY_ID=644688112
SNAPSHOT_ID=32
MODULE=merge

# buck2 build --show-output mode/opt -c=python.package_style=inplace -c fbcode.enable_gpu_sections=true -c fbcode.platform=platform010 -c fbcode.split-dwarf=true -c fbcode.nvcc_arch=a100,h100 caffe2/torch/fb/model_transform/experimental/benchmark:mts_gpu_benchmark

TORCH_COMPILE_DEBUG=1 CUDA_VISIBLE_DEVICES=0 TORCHINDUCTOR_FORCE_DISABLE_CACHES=1 TORCH_LOGS="+inductor, schedule, fusion, output_code" TORCH_TRACE="tmp/guorachel_tt" TORCHINDUCTOR_MAX_AUTOTUNE=1 TORCHINDUCTOR_UNIQUE_KERNEL_NAMES=1 ../buck-out/v2/gen/fbcode/d29ee94b913014f1/caffe2/torch/fb/model_transform/experimental/benchmark/__mts_gpu_benchmark__/mts_gpu_benchmark.par --model-path manifold://ads_storage_fblearner/tree/user/facebook/fblearner/predictor/${MODEL_ENTITY_ID}/${SNAPSHOT_ID}/gpu_lowering/input.predictor.disagg.gpu.merge --lower-backend AOT_INDUCTOR_EP --gpu-trace --aot-inductor-config="{'max_autotune': True}" 2>&1 | tee output.txt
```
 {F1973765026}

```
buck2 test 'fbcode//mode/opt' fbcode//caffe2/test/inductor:provenance_tracing -- --exact 'caffe2/test/inductor:provenance_tracing - test_triton_kernel_post_grad_mapping_aot_inductor (caffe2.test.inductor.test_provenance_tracing.TestProvenanceTracingArtifact)'
```

```
TORCH_LOGS="+inductor, output_code" buck2 run -c fbcode.enable_gpu_sections=true -c fbcode.nvcc_arch=h100 @//mode/opt fbcode//caffe2/test/inductor:provenance_tracing -- -r test_triton_kernel_post_grad_mapping_aot_inductor
```

Differential Revision: D66967510

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143055
Approved by: https://github.com/chenyang78
2024-12-18 06:51:50 +00:00
da67a6a7bb [inductor] Replace set by OrderedSet (#138466)
Uses the set_linter from https://github.com/pytorch/pytorch/pull/138454
and considerable manual editing

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138466
Approved by: https://github.com/eellison
2024-12-13 16:08:45 +00:00
dc23f1944a Remove unused Python variables in torch/[_-a]* (#133492)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133492
Approved by: https://github.com/albanD
2024-12-12 17:39:14 +00:00
5c97ac9721 Revert "Remove unused Python variables in torch/[_-a]* (#133492)"
This reverts commit fda975a7b3071a20dab8fc2c4e453479e1bb7cf2.

Reverted https://github.com/pytorch/pytorch/pull/133492 on behalf of https://github.com/clee2000 due to Sorry, I need to revert this in order to revert something else.  The only thing you need to do is rebase and remerge ([comment](https://github.com/pytorch/pytorch/pull/133492#issuecomment-2536635516))
2024-12-11 17:29:12 +00:00
fda975a7b3 Remove unused Python variables in torch/[_-a]* (#133492)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133492
Approved by: https://github.com/albanD
2024-12-10 21:48:44 +00:00
02c509669a Aoti minifier flatten (#141156)
Flatten the inputs to minifier so AOTI Minifier can handle unflattened inputs and kwargs.

- flatten the inputs in minifier
- changed the "load_and_run" part of the minifier verification to run on the flattened inputs.
- refactored code to keep `torch._inductor.__init__.py` clean
- update doc

`python test/inductor/test_minifier.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141156
Approved by: https://github.com/desertfire
2024-12-06 07:12:45 +00:00
6eca0aee76 [inductor] Refactor ir.Layout into ir.OutputSpec (#140910)
This separate the concepts of a Layout (size/stride/etc) and an OutputSpec (which includes multiple outputs).  Which should make typing easier.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140910
Approved by: https://github.com/ezyang
ghstack dependencies: #140895
2024-11-21 20:01:57 +00:00
06f619d999 typing ir.py - part 2 (#131846)
See #131852

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131846
Approved by: https://github.com/eellison
ghstack dependencies: #139238
2024-11-06 00:01:15 +00:00