Summary: Fix the edge case by allowing `call_function` nodes with no deps as graph entry (starter_nodes) in the splitter.
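A hedged sketch of the condition this fix allows (names are illustrative; the actual splitter code differs):
```
import torch.fx

# Hypothetical sketch of the starter-node condition described above; the
# splitter's real implementation differs.
def is_starter_node(node: torch.fx.Node) -> bool:
    # Entry points include placeholders/get_attr as before, plus call_function
    # nodes with no input dependencies (e.g. a dataclass constructed from
    # constant args only).
    return node.op in ("placeholder", "get_attr") or (
        node.op == "call_function" and len(node.all_input_nodes) == 0
    )
```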
Test Plan:
The test passes in the current diff (after the fix) and fails in the parent diff (before the fix):
```
buck test mode/opt //glow/fb/fx/lowering:split_tests -- test_dataclass_as_graph_entry
```
Rollback Plan:
Differential Revision: D81232435
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161716
Approved by: https://github.com/ezyang
Summary:
A tool to track events during graph split, specifically how nodes end up in the acc or cpu subgraphs.
Usage: use env vars to specify a mode and the necessary arguments.
FX_NET_ACC_SPLITTER_TRACKER_MODE: Tracker mode.
```
Different modes of the event tracker:
"0": Tracker not enabled (by default)
"1": Tracker enabled but no dumps. Information available by setting breakpoints and visually inspect in pdb.
"2": Tracker enabled and dumps all events to DUMP_PREFIX_all.txt
"3": In addition to events dump, track nodes specified by ENV_FX_NET_ACC_SPLITTER_TRACKER_TRACKED_NODES recusrively and dump to DUMP_PREFIX_nodex.txt
"4:: In addition to events dump, track all nodes with more than 1 event recusrively and dump to DUMP_PREFIX_nodex.txt
```
FX_NET_ACC_SPLITTER_TRACKER_DUMP_PATH: overrides the dump path. Leave empty to default to `~`.
FX_NET_ACC_SPLITTER_TRACKER_TRACKED_NODES: Nodes to track for mode "3".
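A minimal sketch of enabling mode "3" via environment variables before running the split; the node names and the comma-separated format for FX_NET_ACC_SPLITTER_TRACKER_TRACKED_NODES are assumptions for illustration:
```
import os

# Assumed usage sketch: set these before the split runs so the tracker picks them up.
os.environ["FX_NET_ACC_SPLITTER_TRACKER_MODE"] = "3"  # events dump + tracked nodes
os.environ["FX_NET_ACC_SPLITTER_TRACKER_DUMP_PATH"] = "/tmp/splitter_dumps"  # empty -> ~
# Hypothetical node names; the separator format is an assumption.
os.environ["FX_NET_ACC_SPLITTER_TRACKER_TRACKED_NODES"] = "add_1,linear_2"
```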
Test Plan: New unit test
Reviewed By: georgiaphillips
Differential Revision: D79203595
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159795
Approved by: https://github.com/ezyang
Summary: as title. We've received requests from various parties interested in turning on provenance tracking by default. In this PR, we prepare to turn on by default the part of provenance tracking that doesn't add much overhead.
- Change `provenance_tracking` config to `provenance_tracking_level`
- turn on the following provenance tracking by default when `basic_provenance_tracking` is True:
- `set_kernel_post_grad_provenance_tracing` for kernels; this adds a mapping between triton kernels and post_grad nodes
- `dump_inductor_provenance_info` if we're dumping the tlparse log
- `get_graph_provenance_json` and dump `create_mapping_pre_post_grad_nodes`. This creates a mapping between pre_grad and post_grad nodes. Since we're not turning on provenance tracking in GraphTransformObserver by default, the mapping here may be incomplete/limited.
- add stack trace from post grad nodes to inductor IR nodes
- add exception swallowing for all functions above
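As a rough illustration of the exception-swallowing point above, a minimal sketch of how such best-effort helpers can be wrapped (the decorator name is hypothetical; the actual implementation may differ):
```
import functools
import logging

log = logging.getLogger(__name__)

def swallow_provenance_errors(fn):
    """Hypothetical wrapper: provenance tracking is best-effort, so any
    exception is logged and swallowed instead of failing compilation."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception:
            log.warning("provenance tracking step %s failed", fn.__name__, exc_info=True)
            return None
    return wrapper
```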
Test Plan:
CI
Rollback Plan:
Differential Revision: D80031559
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160383
Approved by: https://github.com/angelayi
Summary:
The `replace_hook` is called once for each user of the replaced node. This fix avoids adding duplicated node sources.
This also means that if there are two nested passes like:
```
with GraphTransformObserver(gm, "outer"):
with GraphTransformObserver(gm, "inner"):
.....
```
We'll only see the outer pass's pass name recorded for the replaced node in the "from_node" node meta. I think this is fine. In practice, the outer pass usually contains a more meaningful name, e.g. `decompose_auto_functionalized`, and the inner pass name is just a default pass name like `pattern_matcher`.
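A minimal sketch of the deduplication idea, assuming node sources are collected in a list (names are illustrative, not the actual PyTorch code):
```
# Hypothetical sketch: the replace hook can fire once per user of the replaced
# node, so only record a given source entry once.
def record_node_source(node_sources: list, new_source: dict) -> None:
    if new_source not in node_sources:
        node_sources.append(new_source)
```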
Test Plan:
```
buck2 run @mode/dev-nosan fbcode//caffe2/test:fx -- -r test_graph_transform_observer_replace
```
Rollback Plan:
Differential Revision: D79203058
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159484
Approved by: https://github.com/angelayi
Summary:
As inductor provenance tracking is getting more use cases, we want to separate its guard flag from the general `trace.enabled`, so we can enable provenance tracking without all the overhead of `trace.enabled`.
- change the guard flag from `trace.enabled` to `trace.provenance_tracking`. It is turned on by either `TORCH_COMPILE_DEBUG=1` or `INDUCTOR_PROVENANCE=1`.
- Move the provenance tracking logic and variables out of DebugContext, because DebugContext is only enabled with `trace.enabled`. Since the variables are now global variables, added a `reset_provenance_globals()` context manager to reset them for each `compile_fx()` call (a sketch of this pattern follows the list).
- Move `set_kernel_post_grad_provenance_tracing` from `util.py` to `debug.py` so now all provenance related logic is in `debug.py`.
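A minimal sketch of the reset-globals pattern, assuming module-level dicts accumulate the provenance data; the actual variable names and semantics in `debug.py` may differ:
```
import contextlib

# Assumed module-level accumulators (illustrative names).
_kernel_to_post_grad_nodes: dict = {}
_pre_to_post_grad_nodes: dict = {}

@contextlib.contextmanager
def reset_provenance_globals():
    """Clear accumulated provenance state around one compile_fx() call."""
    saved = (dict(_kernel_to_post_grad_nodes), dict(_pre_to_post_grad_nodes))
    _kernel_to_post_grad_nodes.clear()
    _pre_to_post_grad_nodes.clear()
    try:
        yield
    finally:
        # Restore whatever state existed before this compile.
        _kernel_to_post_grad_nodes.clear()
        _kernel_to_post_grad_nodes.update(saved[0])
        _pre_to_post_grad_nodes.clear()
        _pre_to_post_grad_nodes.update(saved[1])
```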
In the future, if we want to enable it further, we can change the provenance tracking flag to be enabled when `TORCH_TRACE` is set. I think we should do that in a separate PR, so it's easier to revert if this flag change creates any problem.
See more motivation in internal Diff
Test Plan:
```
buck2 run mode/dev-nosan fbcode//caffe2/test:fx -- -r test_graph_transform_observer
buck run mode/dev-nosan fbcode//caffe2/test:fx -- -r graph_provenance
buck2 run mode/dev-nosan fbcode//caffe2/test/inductor:provenance_tracing
```
Differential Revision: D78287976
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158399
Approved by: https://github.com/angelayi
DCE was incorrectly eliminating unused random operations like torch.rand() that have global RNG side effects, causing inconsistent results between eager and compiled execution modes.
**Root cause**: Python random functions (torch.rand, torch.randn, etc.) don't have the _nondeterministic_seeded attribute, so node.is_impure() returns False, allowing DCE to eliminate them despite advancing global RNG state.
**Solution**: Enhanced is_impure() in torch/fx/node.py to recognize Python random functions and mark them as impure when they use global RNG, regardless of the impure_random parameter setting. This ensures consistency between eager and compiled execution even when config.fallback_random=False.
**Key features**:
- Handles a comprehensive list of random functions: rand, randn, randint, randperm, rand_like, randn_like, randint_like, normal, poisson, bernoulli, multinomial
- Generator optimization: Only marks as impure when using global RNG (no generator or generator=None). Operations with explicit generators don't affect global state and can be optimized.
- Works with both impure_random=True and impure_random=False cases
- Cleaner architecture: addresses root cause rather than working around it
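A rough sketch of the check described above, assuming a call_function node with a kwargs-based generator argument; the actual logic in torch/fx/node.py differs in details:
```
import torch
import torch.fx

# Illustrative subset drawn from the list above.
_GLOBAL_RNG_FNS = {
    torch.rand, torch.randn, torch.randint, torch.randperm,
    torch.rand_like, torch.randn_like, torch.randint_like,
    torch.normal, torch.poisson, torch.bernoulli, torch.multinomial,
}

def advances_global_rng(node: torch.fx.Node) -> bool:
    """Treat a node as impure if it is a random op with no explicit generator,
    since it advances the global RNG state as a side effect."""
    if node.op != "call_function" or node.target not in _GLOBAL_RNG_FNS:
        return False
    # Explicit generators don't touch global state and can still be DCE'd.
    return node.kwargs.get("generator", None) is None
```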
**Tests**: Enhanced test_impure_random to verify both FX tracing and AOT compilation codepaths, ensuring random operations are preserved and eager/compiled execution consistency is maintained.
🤖 Generated with [Claude Code](https://claude.ai/code)
Fixes https://github.com/pytorch/pytorch/issues/151524
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157981
Approved by: https://github.com/mlazos
Co-authored-by: Claude <noreply@anthropic.com>
This PR is part of a series attempting to re-submit #134592 as smaller PRs.
In fx tests:
- Add and use a common raise_on_run_directly method for when a user directly runs a test file that should not be run this way, printing the file the user should have run instead (a sketch follows this list).
- Raise a RuntimeError on tests which have been disabled (not run)
- Remove any remaining uses of "unittest.main()"
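A minimal sketch of what such a helper could look like (the real implementation in the test utilities may differ):
```
def raise_on_run_directly(file_to_run: str) -> None:
    """Called from a test file's __main__ block when that file must not be run
    directly; tells the user which file to run instead."""
    raise RuntimeError(
        "This test file is not meant to be run directly. "
        f"Run it via: python {file_to_run}"
    )
```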
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154715
Approved by: https://github.com/Skylion007
Summary:
Previously, when there was no discrepancy in results for block mode, net_min_base would throw an OOB error.
This occurs because `_block_traverse_impl` returns an OOB index after exhausting subgraphs all the way down to a single node.
There is also an issue where we may get an unsound subgraph (i.e. marking an earlier node as the "end" even though the correct end is later). This is due to an incorrect check (`start_idx == mid`): two candidate values can still be left when the function prematurely returns, as illustrated below.
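A toy illustration of the premature termination (plain integer arithmetic, not the actual net_min_base code):
```
# With two candidates left, mid collapses onto start_idx, so a
# `start_idx == mid` termination check fires before end_idx is ever examined.
start_idx, end_idx = 2, 3
mid = (start_idx + end_idx) // 2
assert mid == start_idx  # the search would stop here, possibly picking the wrong "end"
```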
Test Plan:
Buck UI: https://www.internalfb.com/buck2/52524c26-ace5-4593-8a4b-843a54eb206a
Test UI: https://www.internalfb.com/intern/testinfra/testrun/3096224973363310
Network: Up: 0B Down: 15MiB (reSessionID-cd404e97-395f-49fc-8381-373e90a1378f)
Executing actions. Remaining 0/1
Command: test.
Time elapsed: 53.7s
Tests finished: Pass 7. Fail 0. Fatal 0. Skip 0. Build failure 0
Differential Revision: D75143242
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154076
Approved by: https://github.com/jfix71
Add a mode to fx_codegen_and_compile() to compile in a separate process. This is to prepare for async compile where we'll compile and run eager in parallel (and also be able to move the compile phase to a remote computer).
Added a test which runs the test_torchinductor tests with subprocess compiling turned on.
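A generic illustration of the idea (not inductor's actual API): submit the compile to a worker process so the main process stays free until the compiled artifact is needed. Arguments must be picklable to cross the process boundary, which is part of what the real implementation handles.
```
from concurrent.futures import ProcessPoolExecutor

def compile_in_subprocess(compile_fn, *args):
    # Illustrative sketch only: run compile_fn(*args) in a separate process.
    with ProcessPoolExecutor(max_workers=1) as pool:
        future = pool.submit(compile_fn, *args)
        # ... the caller could keep running eagerly here while compilation proceeds ...
        return future.result()
```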
Fixed the test which caused the previous version (#146134) to be reverted:
```
$ PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_WITH_SLOW=1 PYTORCH_TEST_SKIP_FAST=1 python test/inductor/test_compile_subprocess.py CpuTests.test_conv_bn_fuse_cpu
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148635
Approved by: https://github.com/jamesjwu
Add a mode to `fx_codegen_and_compile()` to compile in a separate process. This is to prepare for async compile where we'll compile and run eager in parallel (and also be able to move the compile phase to a remote computer).
Added a test which runs the test_torchinductor tests with subprocess compiling turned on.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146134
Approved by: https://github.com/jamesjwu
This is the first step in supporting delayed compile. This library takes in example inputs and outputs a dict of dynamism across the inputs. We will use this to detect dynamism across multiple inputs in delayed compile. We will also use this to make shape collections more ergonomic by providing an affordance to generate a shape collection using example inputs.
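As a rough sketch of the idea (not the library's actual API; the function and key names are illustrative), dynamism can be detected by comparing tensor shapes across two sets of example inputs:
```
import torch

def detect_dynamism(inputs_a: dict, inputs_b: dict) -> dict:
    """Report, per input name, which tensor dims change size across the two
    sets of example inputs (illustrative sketch only)."""
    result = {}
    for name, a in inputs_a.items():
        b = inputs_b[name]
        if isinstance(a, torch.Tensor) and isinstance(b, torch.Tensor):
            result[name] = {
                i for i, (da, db) in enumerate(zip(a.shape, b.shape)) if da != db
            }
        else:
            result[name] = a != b  # non-tensors: just flag whether the value changed
    return result

# Example: the batch dimension (dim 0) varies across the example inputs.
assert detect_dynamism({"x": torch.randn(2, 8)}, {"x": torch.randn(4, 8)}) == {"x": {0}}
```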
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147981
Approved by: https://github.com/pianpwk, https://github.com/wdvr
Summary:
Add `node_mapping = create_node_mapping(pre_grad_graph_id, inductor_post_to_pre_grad_nodes, debug_info)` to produce an `inductor_provenance_tracking_node_mappings.json` file. This file will be used by the provenance tracking highlighter tool to create the provenance visualization.
The `inductor_triton_kernel_to_post_grad_nodes.json` and `inductor_provenance_tracking_node_mappings.json` files are not dumped if they are both empty, so they are removed from some of the `test_structured_trace` tests.
Test Plan:
CI
```
buck run mode/dev-nosan fbcode//caffe2/test:fx -- -r graph_provenance
buck run mode/dev-nosan fbcode//caffe2/test/inductor:provenance_tracing
python test/dynamo/test_structured_trace.py
```
Differential Revision: D68190173
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146103
Approved by: https://github.com/chenyang78