Commit Graph

421 Commits

Author SHA1 Message Date
3255e7872b Enable all flake8-logging-format rules (#164655)
These rules are enabled by removing existing suppressions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164655
Approved by: https://github.com/janeyx99, https://github.com/mlazos
2025-10-19 00:59:28 +00:00
86ebce1766 [precompile] Pass tensor_to_context to backend. (#165702)
Summary:

Fixing a VLLM issue https://github.com/vllm-project/vllm/issues/27040 where
aot precompile fails on some models using symbolic shapes in inductor.

Test Plan:
pp HF_HUB_DISABLE_XET=1 VLLM_ENABLE_V1_MULTIPROCESSING=0 VLLM_USE_AOT_COMPILE=1 vllm bench latency --model microsoft/DialoGPT-small --input-len 128 --output-len 256 --num-iters 50 --dtype float16

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165702
Approved by: https://github.com/tugsbayasgalan
2025-10-17 21:52:04 +00:00
839f6facdb [precompile] Fix frame construction for wrapped model. (#165454)
Summary: If a function is wrapped with functools.wraps, we should not look at the wrapped function's signature but rather the wrapper's, since we need to construct the frame for the top-level function here.
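
For context, a minimal standalone illustration of the functools.wraps behavior at play here (not the PR's test):
```python
import functools
import inspect

def check_inputs(fn):
    @functools.wraps(fn)
    def _fn(*args, **kwargs):
        return fn(*args, **kwargs)
    return _fn

@check_inputs
def foo(x, y):
    return x + y

# inspect.signature follows __wrapped__ by default and reports the wrapped
# function's signature, but the frame that actually runs at the top level is
# the wrapper _fn(*args, **kwargs), so frame construction must use the wrapper.
print(inspect.signature(foo))                        # (x, y)
print(inspect.signature(foo, follow_wrapped=False))  # (*args, **kwargs)
```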

Test Plan: test_decorated_function_with_functools_wrap_aot

Differential Revision: D84626752

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165454
Approved by: https://github.com/yiming0416
2025-10-15 02:01:46 +00:00
04e36611bb [user-cuda-streams] Pass streams/events to the graph via lookup table (#162899)
Stores streams in a global object lookup table that maps a Dynamo-selected index to objects. This index is generated during tracing, and at runtime a helper function is called from the bytecode to populate this map.

This differs from the previous implementation, which simply mapped IDs to the associated objects. That required specialization on the IDs of the specific objects, while the new approach does not.
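
A rough sketch of the index-based lookup idea, with hypothetical names (not the actual Dynamo internals):
```python
_stream_object_table: dict[int, object] = {}

def register_stream_object(index: int, obj: object) -> None:
    # called by a helper emitted into the bytecode, populating the map at runtime
    _stream_object_table[index] = obj

def lookup_stream_object(index: int) -> object:
    # the graph refers to streams/events by the index chosen during tracing,
    # so guards never need to specialize on id(obj)
    return _stream_object_table[index]
```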

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162899
Approved by: https://github.com/anijain2305
ghstack dependencies: #163027
2025-10-14 05:43:19 +00:00
1f73b96668 [PGO] log missing sources in allowlist (#164881)
Summary:
- logs missing dynamic sources
- emits MLHub insight only on size mismatch recompiles

Test Plan: test_pgo

Differential Revision: D84098898

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164881
Approved by: https://github.com/bobrenjc93
2025-10-09 04:39:09 +00:00
4308b8a28f [dynamo] Support torch.fx.traceback.annotate (#164678)
Builds on top of https://github.com/pytorch/pytorch/pull/163673 and https://github.com/pytorch/pytorch/pull/164174. This will be used in the followup PRs to apply regional inductor compilation.

The existing implementation let Dynamo trace into `torch.fx.traceback.annotate`, but that's not what we want. We want Dynamo to essentially run the torch.fx.traceback.annotate function in eager mode, so that every FX node created in the Dynamo FX graph carries the custom node metadata.

What does not work?
* We still have to set the context manager `torch.fx.traceback.preserve_node_meta()` in the user code because CI was unhappy. This can be fixed but with some perseverance.
* This does not work with graph breaks yet. But we can solve that problem, if needed, in a separate PR.
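
A hedged usage sketch of the intended behavior (the metadata handling and exact semantics are assumptions based on this description):
```python
import torch
import torch.fx.traceback as fx_traceback

@torch.compile(backend="eager", fullgraph=True)
def fn(x):
    with fx_traceback.annotate({"stage": "pointwise"}):
        y = x + 1
    return y * 2

# preserve_node_meta still has to be set in user code, per the note above
with fx_traceback.preserve_node_meta():
    fn(torch.randn(4))
# FX nodes created while the annotate block is traced carry the custom metadata,
# which follow-up PRs use to select regions for regional inductor compilation.
```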

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164678
Approved by: https://github.com/SherlockNoMad, https://github.com/jansel, https://github.com/xmfan
2025-10-08 22:41:00 +00:00
c855f8632e Pyrefly suppressions 7/n (#164913)
Adds suppressions so that pyrefly will typecheck clean: https://github.com/pytorch/pytorch/issues/163283

Almost there!

Test plan:
dmypy restart && python3 scripts/lintrunner.py -a
pyrefly check

step 1: delete lines in the pyrefly.toml file from the project-excludes field
step 2: run pyrefly check
step 3: add suppressions, clean up unused suppressions
before: https://gist.github.com/maggiemoss/4b3bf2037014e116bc00706a16aef199

after:
 INFO 0 errors (6,884 ignored)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164913
Approved by: https://github.com/oulgen
2025-10-08 07:27:17 +00:00
3040a5d294 Revert "[dynamo] Support torch.fx.traceback.annotate (#164678)"
This reverts commit 801e282f39e9ef4424dfd3ecfd2b550a44595229.

Reverted https://github.com/pytorch/pytorch/pull/164678 on behalf of https://github.com/izaitsevfb due to breaks executorch internally, see [D84068062](https://www.internalfb.com/diff/D84068062?entry_point=16) ([comment](https://github.com/pytorch/pytorch/pull/164678#issuecomment-3379281844))
2025-10-08 01:49:34 +00:00
801e282f39 [dynamo] Support torch.fx.traceback.annotate (#164678)
Builds on top of https://github.com/pytorch/pytorch/pull/163673 and https://github.com/pytorch/pytorch/pull/164174. This will be used in the followup PRs to apply regional inductor compilation.

The existing implementation let Dynamo trace into `torch.fx.traceback.annotate`, but that's not what we want. We want Dynamo to essentially run the torch.fx.traceback.annotate function in eager mode, so that every FX node created in the Dynamo FX graph carries the custom node metadata.

This does not work with graph breaks yet. But we can solve that problem, if needed, in a separate PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164678
Approved by: https://github.com/SherlockNoMad, https://github.com/jansel, https://github.com/xmfan
2025-10-07 14:54:26 +00:00
cfc5cc17dc Revert "[dynamo] Support torch.fx.traceback.annotate (#164678)"
This reverts commit 2883b5ab773daf5861d43ff0b65be49a441ab3f9.

Reverted https://github.com/pytorch/pytorch/pull/164678 on behalf of https://github.com/izaitsevfb due to fails inductor:max_autotune tests internally, see D83948169 ([comment](https://github.com/pytorch/pytorch/pull/164678#issuecomment-3374407009))
2025-10-06 22:03:42 +00:00
2883b5ab77 [dynamo] Support torch.fx.traceback.annotate (#164678)
Builds on top of https://github.com/pytorch/pytorch/pull/163673 and https://github.com/pytorch/pytorch/pull/164174. This will be used in the followup PRs to apply regional inductor compilation.

The existing implementation let Dynamo trace into `torch.fx.traceback.annotate`, but that's not what we want. We want Dynamo to essentially run the torch.fx.traceback.annotate function in eager mode, so that every FX node created in the Dynamo FX graph carries the custom node metadata.

This does not work with graph breaks yet. But we can solve that problem, if needed, in a separate PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164678
Approved by: https://github.com/SherlockNoMad, https://github.com/jansel, https://github.com/xmfan
2025-10-06 02:59:24 +00:00
1e754d5a80 docs and optional kwargs for full graph capture (#163550)
Test Plan: existing tests

Differential Revision: D82995546

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163550
Approved by: https://github.com/tugsbayasgalan
2025-09-24 01:20:27 +00:00
6ef74879f6 [dynamo] Fix TorchFunctionMode handling with get_rng_state (#163412)
Fixes #162624
Fixes #162586

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163412
Approved by: https://github.com/eellison
ghstack dependencies: #163386, #163398, #163387, #163414, #163415, #163419, #163434, #163393
2025-09-23 15:37:45 +00:00
9e3725e8e5 make fullgraph_capture work on mod, args, kwargs (#162849)
Summary:
Today `fullgraph_capture` takes a frame, but clients usually start from a callable (`nn.Module`, function, or method) and example inputs (args and kwargs) and then explicitly set up the frame to pass in. This is boilerplate, and potentially tricky to get right, that can be hidden inside the API.

The original `fullgraph_capture` now becomes `_fullgraph_capture_frame`.
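
A hedged usage sketch of the callable-based entry point; the exact signature is an assumption based on this summary:
```python
import torch
from torch._dynamo import convert_frame

class M(torch.nn.Module):
    def forward(self, x):
        return x.relu() + 1

mod = M()
args, kwargs = (torch.randn(4),), {}

# The wrapper builds the frame from (mod, args, kwargs) internally and delegates
# to the old frame-based entry point, now renamed _fullgraph_capture_frame.
capture_output = convert_frame.fullgraph_capture(mod, args, kwargs)
```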

Test Plan:
existing tests

Differential Revision: D82339400

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162849
Approved by: https://github.com/zhxchen17
2025-09-20 22:48:06 +00:00
d70c0babf5 minimize graph capture output (#162211)
Currently OutputGraphGuardsState is separated out as a serializable interface for OutputGraph, but some of the typing around it is incorrect in dynamo's guards.py and output_graph.py: more fields are used by the code than OutputGraphGuardsState claims, and this only works because either the full OutputGraph is passed in, or the parts that use those fields are dead when OutputGraphGuardsState is passed in.
In this PR we try to further separate out the fields of OutputGraph that should be retained by a full graph capture mechanism, not limited to dynamo (as it is currently) but also applicable to something like make_fx (in the future). Since these fields do not need to be serialized, the result is an intermediate "common" data structure that sits between OutputGraphGuardsState and OutputGraph in the inheritance hierarchy.

Differential Revision: D81718791

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162211
Approved by: https://github.com/zhxchen17
2025-09-20 15:52:28 +00:00
6189a5f731 [dynamo][ez] Initialize tracer_output to None by default. (#163169)
Summary:
In edge cases, tracer_output can be left unset if a second exception is raised while handling the first, which causes the following issue:
```
UnboundLocalError: local variable 'tracer_output' referenced before assignment
```

Default initialize this variable so that it's always present.
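
A minimal, self-contained illustration of the failure mode and the fix (not the actual convert_frame code):
```python
def do_trace():
    raise RuntimeError("tracing failed")

def compile_frame_buggy():
    try:
        tracer_output = do_trace()      # raises, so the local is never bound
    except RuntimeError:
        pass
    return tracer_output                # UnboundLocalError: referenced before assignment

def compile_frame_fixed():
    tracer_output = None                # default-initialize so the name always exists
    try:
        tracer_output = do_trace()
    except RuntimeError:
        pass
    return tracer_output                # None when tracing raised
```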

Test Plan:
CI

Differential Revision: D82652815

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163169
Approved by: https://github.com/tugsbayasgalan
2025-09-18 01:30:23 +00:00
e93706c2c8 [Intel GPU][pre_compile] Add XPU toolkit version and hardware info in compiled model check. (#162951)
Following #162438, this PR generalizes the original CUDA-only check and adds an XPU check.

Fixes #162939, Fixes #162938, Fixes #163032, Fixes #163045

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162951
Approved by: https://github.com/EikanWang, https://github.com/jansel
2025-09-18 00:04:22 +00:00
ccb450b190 [pre_compile] Add check for cuda and hardware version (#162438)
If we detect that the compiled model is using CUDA in a meaningful way, we should store information about CUDA and the hardware.

 Example: `SystemInfo(python_version='3.12.9', torch_version='2.9.0a0+gite02b0e6', cuda_version='12.6', triton_version=(3, 4), gpu_name='NVIDIA PG509-210')`
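
A rough sketch of the recorded fields based on the example above; the check() method and exact types are assumptions, not the PR's implementation:
```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class SystemInfo:
    python_version: str
    torch_version: str
    cuda_version: Optional[str]                # only recorded when the model meaningfully uses CUDA
    triton_version: Optional[Tuple[int, int]]  # e.g. (3, 4)
    gpu_name: Optional[str]

    def check(self, current: "SystemInfo") -> None:
        # reject a precompiled artifact when the CUDA/GPU environment has changed
        if self.cuda_version is not None and self != current:
            raise RuntimeError("compiled model was built for a different CUDA/hardware setup")
```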

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162438
Approved by: https://github.com/zhxchen17
2025-09-12 01:42:07 +00:00
6d65737aee testing infra and some fixes (#162183)
This PR is quite large in that it covers most of the rough edges in the new strict export flow:

1. Handle nn_module_stack correctly now that we are tracing the wrapper module
2. module_call_spec needs to be queried from the source directly because we are not running the bytecode anymore.
3. Correct input and output handling.

@diff-train-skip-merge

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162183
Approved by: https://github.com/zhxchen17
2025-09-10 20:48:12 +00:00
60d009267e Revert "testing infra and some fixes (#162183)"
This reverts commit d8b6622bb6a3879d3832ab6cdc26ff4188ea4a2d.

Reverted https://github.com/pytorch/pytorch/pull/162183 on behalf of https://github.com/huydhn due to Failing a test on macos ([comment](https://github.com/pytorch/pytorch/pull/162183#issuecomment-3268922096))
2025-09-09 05:26:32 +00:00
d8b6622bb6 testing infra and some fixes (#162183)
This PR is quite large in that it covers most of the rough edges in the new strict export flow:

1. Handle nn_module_stack correctly now that we are tracing the wrapper module
2. module_call_spec needs to be queried from the source directly because we are not running the bytecode anymore.
3. Correct input and output handling.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162183
Approved by: https://github.com/zhxchen17
ghstack dependencies: #162167
2025-09-09 02:42:11 +00:00
189a054cfb Remove guard_size_oblivious from default contiguity python check, and add aten.sym_is_contiguous. [attempt2] (#160869)
[relanding again after fixing internal build]
Summary:
This might cause some new DDEs on call sites that do not use is_contiguous_or_false() or sym_is_contiguous(),
but we want to find those call sites and handle them properly by calling is_contiguous_or_false() and not is_contiguous() explicitly when appropriate.
I had to fix one issue after removing the implicit size-oblivious reasoning; here is the context.

In https://github.com/pytorch/pytorch/pull/157472 we defined sym_is_contiguous to be the function that computes contiguity for dynamic shapes in C++. It returns a symbolic expression that represents contiguity and is guaranteed not to throw a DDE.

When people call is_contiguous we do sym_is_contiguous().guard_bool().
When people call is_contiguous_or_false we do sym_is_contiguous().guard_or_false().

One path was not handled well:
```
c10::SymBool TensorImpl::sym_is_contiguous_custom(
    at::MemoryFormat memory_format) const {
  if (C10_UNLIKELY(matches_python_custom(SizesStridesPolicy::CustomStrides))) {
    return pyobj_slot_.load_pyobj_interpreter()->is_contiguous(
        this, memory_format);
  }

  return sym_is_contiguous_default(memory_format);
}
```
Namely, if we call sym_is_contiguous_custom and matches_python_custom(SizesStridesPolicy::CustomStrides) returns true, then we used to call is_contiguous(this, memory_format);

This used to go through load_pyobj_interpreter and end up calling the Python is_contiguous, which used implicit size-oblivious reasoning.
Once we removed that implicit size-oblivious reasoning, the right thing is to call
return pyobj_slot_.load_pyobj_interpreter()->sym_is_contiguous(this, memory_format);
otherwise we would get a DDE even when the caller is doing sym_is_contiguous.

So I had to define it for the pyinterpreter, and then override it for nested tensors.
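
A hedged Python-side sketch of the two calling conventions; the expression argument stands in for whatever produces the symbolic contiguity result (e.g. the new aten.sym_is_contiguous), and only the guard_bool vs guard_or_false split is the point:
```python
from torch.fx.experimental.symbolic_shapes import guard_or_false

def is_contiguous(sym_contiguity_expr) -> bool:
    # guard_bool semantics: commits to an answer and may raise a
    # data-dependent error (DDE) when the expression is unbacked
    return bool(sym_contiguity_expr)

def is_contiguous_or_false(sym_contiguity_expr) -> bool:
    # guard_or_false semantics: never raises; "unknown" is treated as False
    return guard_or_false(sym_contiguity_expr)
```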

Approved by: https://github.com/ezyang

Test Plan:
contbuild & OSS CI, see e444cd24d4

Differential Revision: D80435179

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160869
Approved by: https://github.com/ezyang
2025-09-08 22:59:13 +00:00
047603d35b New export implementation with flat inp/out (#162167)
This is my first attempt at building the new export API. The main thing it addresses is correctly getting input and output relations. Subsequent diffs will add functionality for dynamic shapes, nn_module_stack, etc.

Differential Revision: [D81793205](https://our.internmc.facebook.com/intern/diff/D81793205)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162167
Approved by: https://github.com/zhxchen17, https://github.com/avikchaudhuri
2025-09-06 20:03:52 +00:00
f36f285953 [dynamo] change error_on_graph_break/fullgraph semantics (#161747)
This PR implements the semantics change to `torch._dynamo.error_on_graph_break`:
- ~`torch.compile` now has a new `error_on_graph_break` kwarg that serves as a lower-priority toggle for erroring/continuing on graph breaks~
- `error_on_graph_break` is a new internal `torch.compile `setting that is lower-priority than `fullgraph`. It allows the user to toggle erroring/continuing on graph breaks.
- `error_on_graph_break` does nothing when `fullgraph=True`
- `error_on_graph_break` does NOT guarantee a single graph

Followup [DONE]: need to change the programming model docs to reflect the 3 graph break modes for compilation:
- `fullgraph=True`: enforce one graph, no graph breaks, cannot be toggled
- `fullgraph=False, error_on_graph_break=True`: errors on graph breaks, latter can be toggled during compile time
- `fullgraph=False, error_on_graph_break=False`: resumes tracing on graph breaks, latter can be toggled during compile time

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161747
Approved by: https://github.com/mlazos
ghstack dependencies: #161739
2025-09-04 17:10:17 +00:00
8678d831c4 [dynamo] rename set_fullgraph to error_on_graph_break (#161739)
Renaming `set_fullgraph` to `error_on_graph_break` for now. There are no semantic differences yet. In a followup PR, we will introduce a new `torch.compile` option `error_on_graph_break` that has lower priority than `fullgraph` so that `fullgraph` really returns 1 graph.

I could keep `set_fullgraph` as a deprecated alias for `error_on_graph_break` for now, but I'm hoping that won't be necessary since it's still private API (there are no internal callsites yet, and there are no significant OSS callsites yet).

 cc @albanD @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames @Lucaskabela @mlazos @guilhermeleobas @xmfan as primary users for `set_fullgraph`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161739
Approved by: https://github.com/xmfan, https://github.com/Lucaskabela, https://github.com/anijain2305, https://github.com/mlazos
2025-09-04 01:15:06 +00:00
8ec551bb35 [aot-compile] strip internal tracebacks for non-verbose graph breaks + include user file/lineno (#162005)
pytest test/dynamo/test_aot_compile.py -k test_aot_compile_graph_break_error_fmt

before
```
Traceback (most recent call last):
  File "/data/users/$USER/vllm-tests/graph-break.py", line 15, in <module>
    aot_compiled_fn = compiled.aot_compile((example_inputs, {}))
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/users/$USER/pytorch/torch/_dynamo/eval_frame.py", line 717, in aot_compile
    return aot_compile_fullgraph(
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/data/users/$USER/pytorch/torch/_dynamo/aot_compile.py", line 132, in aot_compile_fullgraph
    capture_output = convert_frame.fullgraph_capture(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/users/$USER/pytorch/torch/_dynamo/convert_frame.py", line 947, in fullgraph_capture
    dynamo_output = compile_frame(
                    ^^^^^^^^^^^^^^
  File "/data/users/$USER/pytorch/torch/_dynamo/convert_frame.py", line 1020, in compile_frame
    bytecode, tracer_output = transform_code_object(code, transform)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/users/$USER/pytorch/torch/_dynamo/bytecode_transformation.py", line 1592, in transform_code_object
    tracer_output = transformations(instructions, code_options)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/users/$USER/pytorch/torch/_dynamo/convert_frame.py", line 992, in transform
    tracer_output = trace_frame(
                    ^^^^^^^^^^^^
  File "/data/users/$USER/pytorch/torch/_dynamo/convert_frame.py", line 312, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/data/users/$USER/pytorch/torch/_dynamo/convert_frame.py", line 821, in trace_frame
    run_tracer()
  File "/data/users/$USER/pytorch/torch/_dynamo/convert_frame.py", line 803, in run_tracer
    tracer.run()
  File "/data/users/$USER/pytorch/torch/_dynamo/symbolic_convert.py", line 1472, in run
    while self.step():
          ^^^^^^^^^^^
  File "/data/users/$USER/pytorch/torch/_dynamo/symbolic_convert.py", line 1342, in step
    self.dispatch_table[inst.opcode](self, inst)
  File "/data/users/$USER/pytorch/torch/_dynamo/symbolic_convert.py", line 902, in wrapper
    return inner_fn(self, inst)
           ^^^^^^^^^^^^^^^^^^^^
  File "/data/users/$USER/pytorch/torch/_dynamo/symbolic_convert.py", line 3364, in CALL
    self._call(inst)
  File "/data/users/$USER/pytorch/torch/_dynamo/symbolic_convert.py", line 3358, in _call
    self.call_function(fn, args, kwargs)
  File "/data/users/$USER/pytorch/torch/_dynamo/symbolic_convert.py", line 1260, in call_function
    self.push(fn.call_function(self, args, kwargs))  # type: ignore[arg-type]
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/users/$USER/pytorch/torch/_dynamo/variables/lazy.py", line 212, in realize_and_forward
    return getattr(self.realize(), name)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/users/$USER/pytorch/torch/_dynamo/variables/functions.py", line 1513, in call_function
    unimplemented_v2(
  File "/data/users/$USER/pytorch/torch/_dynamo/exc.py", line 596, in unimplemented_v2
    raise Unsupported(msg)
torch._dynamo.exc.Unsupported: Call to `torch._dynamo.graph_break()`
  Explanation: User-inserted graph break. Message: None
  Hint: Remove the `torch._dynamo.graph_break()` call.

  Developer debug context: Called `torch._dynamo.graph_break()` with args `[]`, kwargs `{}`

 For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0025.html
```
after
```
Traceback (most recent call last):
  File "/data/users/$USER/vllm-tests/graph-break.py", line 15, in <module>
    aot_compiled_fn = compiled.aot_compile((example_inputs, {}))
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/users/$USER/pytorch/torch/_dynamo/eval_frame.py", line 737, in aot_compile
    raise e.with_traceback(None) from e.__cause__  # User compiler error
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch._dynamo.exc.Unsupported: Call to `torch._dynamo.graph_break()`
  Explanation: User-inserted graph break. Message: None
  Hint: Remove the `torch._dynamo.graph_break()` call.

  Developer debug context: Called `torch._dynamo.graph_break()` with args `[]`, kwargs `{}`

 For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0025.html

from user code:
   File "/data/users/$USER/vllm-tests/graph-break.py", line 5, in foo
    torch._dynamo.graph_break()

Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
```
consistent w/ std torch.compile
```
Traceback (most recent call last):
  File "/data/users/$USER/vllm-tests/graph-break.py", line 16, in <module>
    res = compiled(*example_inputs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/users/$USER/pytorch/torch/_dynamo/eval_frame.py", line 850, in compile_wrapper
    raise e.with_traceback(None) from e.__cause__  # User compiler error
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch._dynamo.exc.Unsupported: Call to `torch._dynamo.graph_break()`
  Explanation: User-inserted graph break. Message: None
  Hint: Remove the `torch._dynamo.graph_break()` call.

  Developer debug context: Called `torch._dynamo.graph_break()` with args `[]`, kwargs `{}`

 For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0025.html

from user code:
   File "/data/users/$USER/vllm-tests/graph-break.py", line 5, in foo
    torch._dynamo.graph_break()

Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162005
Approved by: https://github.com/zhxchen17, https://github.com/tugsbayasgalan
2025-09-03 23:19:47 +00:00
e4bd0ff4f8 [aot precompile] Handle closure variables. (#161990)
We previously assumed aot precompile should only work on non-closures. This is hard to enforce in practice because we see a lot of cases with decorators (e.g. Hugging Face models):
```python
def check_inputs(fn):
    def _fn(*args, **kwargs):
        for arg in args:
            assert arg.shape[0] > 1

        return fn(*args, **kwargs)
    return _fn

@check_inputs
def foo(x, y):
    a = x + x
    b = y + y
    c = a + b
    return c
```
It doesn't make sense not to support these cases, since they are straightforward to handle.

This PR adds the logic to handle closures and makes sure they can be precompiled properly.

Differential Revision: [D81509535](https://our.internmc.facebook.com/intern/diff/D81509535/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161990
Approved by: https://github.com/angelayi
2025-09-02 22:26:04 +00:00
5edc3d814f Add option for TorchDispatchMode to ignore torch.compile internals (#161648)
If TorchDispatchMode.ignore_compile_internals() is True, then we turn
off the TorchDispatchMode during the compilation process and instead
turn it back on during runtime of the compiled artifact.
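
A hedged sketch of how the option could be used, assuming ignore_compile_internals() is a classmethod hook on TorchDispatchMode as described:
```python
import torch
from torch.utils._python_dispatch import TorchDispatchMode

class LoggingMode(TorchDispatchMode):
    @classmethod
    def ignore_compile_internals(cls) -> bool:
        return True  # turn the mode off while torch.compile is compiling

    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        print("dispatch:", func)
        return func(*args, **(kwargs or {}))

@torch.compile(backend="eager")
def f(x):
    return x.sin() + 1

with LoggingMode():
    f(torch.randn(4))  # ops are logged when the compiled artifact runs,
                       # not during dynamo/inductor's own tracing
```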

Test Plan:
- new test

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161648
Approved by: https://github.com/bdhirsh
2025-08-28 02:41:33 +00:00
97a548b640 [PGO] skip allowlist logging for empty graphs (#161530)
Summary: reduces spurious logging

Test Plan:
test_pgo

Differential Revision: D81060182

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161530
Approved by: https://github.com/bobrenjc93, https://github.com/mlazos
2025-08-28 00:12:13 +00:00
10d93325b1 [dynamo, nested graph breaks] support very simple nested graph breaks (#159329)
e.g. this graph breaks once now:
```python
import torch

torch._dynamo.config.nested_graph_breaks = True

def inner(x):
    x = x + 1
    torch._dynamo.graph_break()
    return x + 2

@torch.compile(backend="eager")
def outer(x):
    return inner(x)

print(outer(torch.ones(3)))
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159329
Approved by: https://github.com/anijain2305
2025-08-27 21:53:37 +00:00
a4fb65701b Revert "[dynamo, nested graph breaks] support very simple nested graph breaks (#159329)"
This reverts commit 8dab6d4c414bf997297804008c3da893e69cd51f.

Reverted https://github.com/pytorch/pytorch/pull/159329 on behalf of https://github.com/atalman due to failing internally ([comment](https://github.com/pytorch/pytorch/pull/159329#issuecomment-3225617445))
2025-08-26 20:24:10 +00:00
74124d1b46 [reland] [dynamo] Refactor convert_frame.compile_frame to be self contained function. [5/n] (#161514)
Summary:
convert_frame.compile_frame used to take a callback transform function that captured the frame object, but the frame information was not passed directly into the compile_frame function.

This PR changes the signature of compile_frame so that frame information is passed directly into the function without taking a callback. This makes it easier to build a fullgraph capture API on top of compile_frame.

Test Plan:
CI

Differential Revision: D81041296

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161514
Approved by: https://github.com/tugsbayasgalan
2025-08-26 19:16:05 +00:00
e795450a35 Revert "[dynamo] Refactor convert_frame.compile_frame to be self contained function. [5/n] (#160900)"
This reverts commit 447d34b5f80fb7350f79decd855cb599cab39083.

Reverted https://github.com/pytorch/pytorch/pull/160900 on behalf of https://github.com/atalman due to reverting since can't land existing diff internally, will need to reland it ([comment](https://github.com/pytorch/pytorch/pull/160900#issuecomment-3224029031))
2025-08-26 12:45:59 +00:00
8dab6d4c41 [dynamo, nested graph breaks] support very simple nested graph breaks (#159329)
e.g. this graph breaks once now:
```python
import torch

torch._dynamo.config.nested_graph_breaks = True

def inner(x):
    x = x + 1
    torch._dynamo.graph_break()
    return x + 2

@torch.compile(backend="eager")
def outer(x):
    return inner(x)

print(outer(torch.ones(3)))
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159329
Approved by: https://github.com/anijain2305
ghstack dependencies: #157971, #159281, #144516
2025-08-26 00:58:07 +00:00
447d34b5f8 [dynamo] Refactor convert_frame.compile_frame to be self contained function. [5/n] (#160900)
convert_frame.compile_frame used to take a callback transform function that captured the frame object, but the frame information was not passed directly into the compile_frame function.

This PR changes the signature of compile_frame so that frame information is passed directly into the function without taking a callback. This makes it easier to build a fullgraph capture API on top of compile_frame.
@exported-using-ghexport

Differential Revision: [D80469801](https://our.internmc.facebook.com/intern/diff/D80469801/)

Differential Revision: [D80469801](https://our.internmc.facebook.com/intern/diff/D80469801)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160900
Approved by: https://github.com/tugsbayasgalan, https://github.com/anijain2305
2025-08-25 23:16:21 +00:00
3e210f90c2 Revert "[dynamo] Refactor convert_frame.compile_frame to be self contained function. [5/n] (#160900)"
This reverts commit 1113e7de30da95973c1eac7921601f9a0e94f2db.

Reverted https://github.com/pytorch/pytorch/pull/160900 on behalf of https://github.com/atalman due to executorch failure ([comment](https://github.com/pytorch/pytorch/pull/160900#issuecomment-3221372096))
2025-08-25 18:56:18 +00:00
1113e7de30 [dynamo] Refactor convert_frame.compile_frame to be self contained function. [5/n] (#160900)
convert_frame.compile_frame used to take a callback transform function that captured the frame object, but the frame information was not passed directly into the compile_frame function.

This PR changes the signature of compile_frame so that frame information is passed directly into the function without taking a callback. This makes it easier to build a fullgraph capture API on top of compile_frame.
@exported-using-ghexport

Differential Revision: [D80469801](https://our.internmc.facebook.com/intern/diff/D80469801/)

Differential Revision: [D80469801](https://our.internmc.facebook.com/intern/diff/D80469801)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160900
Approved by: https://github.com/tugsbayasgalan, https://github.com/anijain2305
2025-08-25 14:53:54 +00:00
2fdd4f918c Log exception_stack_trace to dynamo_compile (#161096)
Note: Adding a unit test for this is tricky, as having errors in the specific unit test would cause test_utils.py to crash altogether.

Tested as follows:
1. Added x = 1/0 after guarded_code = compile_inner(code, one_graph, hooks, transform) in convert_frame.py
2. Printed exception_stack_trace and got: ['Traceback (most recent call last):\n  File "/data/users/jovian/pytorch/torch/_dynamo/convert_frame.py", line 1207, in _compile\n    x = 1/0\n        ~^~\nZeroDivisionError: division by zero\n']

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161096
Approved by: https://github.com/c00w
2025-08-22 03:29:15 +00:00
c02e26bf31 Fix filename showing up as ints in dynamo_compile stack_trace column. (#160916)
Test plan:
$ python -m test_utils

Note:
Another way is adding the actual file_name to from_traceback, but since it's referenced in multiple places and may have associated tests this seems safer. Lmk if changes are needed @c00w

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160916
Approved by: https://github.com/c00w, https://github.com/masnesral
2025-08-20 18:38:38 +00:00
5255e65c01 [dynamo] Refactor convert_frame to remove usage of nonlocal tracer output return. [4/n] (#160899)
Today convert_frame is implemented like the following:
```
def _compile():
    tracer_output = None
    def transform():
        nonlocal tracer_output
        ...
    def _compile_inner():
        transform(...)

    compile_inner(...)
```

The code uses an unconventional nonlocal variable as the return value. This is not ideal for two reasons:
1. Reasoning about the code, especially together with the error handling code, becomes harder.
2. More importantly, this makes it harder to extract common code pieces into a shared library, because everything must depend on a central global state.

In this diff we remove the use of the nonlocal return and just use a conventional function return to output the compilation data.
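
A minimal sketch, with hypothetical names, of the conventional-return style that replaces the nonlocal pattern above:
```python
def transform(code):
    ...  # run the instruction translator
    return "bytecode", "tracer_output"

def _compile_inner(code):
    bytecode, tracer_output = transform(code)   # results are returned, not smuggled out
    return bytecode, tracer_output

def _compile(code):
    bytecode, tracer_output = _compile_inner(code)
    return tracer_output                         # no nonlocal state needed
```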

Differential Revision: [D80461258](https://our.internmc.facebook.com/intern/diff/D80461258/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160899
Approved by: https://github.com/tugsbayasgalan
ghstack dependencies: #160814, #160815, #160855
2025-08-20 17:37:26 +00:00
9e050b6339 [dynamo] Refactor convert_frame._compile_inner to return compiled bytecode + output graph. [3/n] (#160855)
We are refactoring dynamo code for convert frame so that we can have modularized pieces sharable between different compiler frontends (e.g. torch.compile, precompile and torch.export).

This PR adds a new helper function compile_frame() which takes bytecode and a transform function and returns the compiled bytecode + output graph as a DynamoOutput type.

Differential Revision: [D80430802](https://our.internmc.facebook.com/intern/diff/D80430802/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160855
Approved by: https://github.com/tugsbayasgalan
ghstack dependencies: #160814, #160815
2025-08-20 17:37:26 +00:00
599f639ddb [dynamo] Refactor transform() so that instruction translator can be used as a tracing function. [2/n] (#160815)
We are refactoring dynamo code for convert frame so that we can have modularized pieces sharable between different compiler frontends (e.g. torch.compile, precompile and torch.export).

This PR follows the last one, separating out the part that runs the instruction translator on a given frame and returns a DynamoTracerOutput.

The end result is a free function that runs the instruction translator independently. A follow-up diff will wrap the low-level function.

Differential Revision: [D80388694](https://our.internmc.facebook.com/intern/diff/D80388694/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160815
Approved by: https://github.com/anijain2305
ghstack dependencies: #160814
2025-08-20 01:16:35 +00:00
e9209e0854 [dynamo] Refactor tracer logic in convert_frame so that it doesn't leak to outer layer. [1/n] (#160814)
We are refactoring dynamo code for convert frame so that we can have modularized pieces sharable between different compiler frontends (e.g. torch.compile, precompile and torch.export).

One incremental step we can take is to refactor out InstructionTranslator as a functional piece providing bytecode tracing.

To separate out this part, we note that the tracer object is currently passed around throughout the entire convert_frame compile function. This is not ideal because we want to build a boundary between tracing and the downstream compiler stack. Ideally, we should extract all the relevant information out of the tracer object and return a new data structure that is free of the internal state of InstructionTranslator.

Luckily, not much data is used from the tracer after tracing is finished. The major piece is OutputGraph; other than that, we only need to record two boolean flags for error handling purposes.

The new type we're adding is called DynamoTracerOutput, which contains all the information needed by torch.compile internals after symbolic convert is finished. To simplify the current PR, we leave out the part that reduces OutputGraph to a minimal set, since this can be done in a separate PR.
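
A rough sketch of the container described above; the field names are assumptions, not the actual definition:
```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DynamoTracerOutput:
    output_graph: Optional[object]            # the OutputGraph produced by tracing, if any
    error_on_graph_break: bool = False        # boolean flags kept only for error handling
    is_tracing_resume_prologue: bool = False
```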

Differential Revision: [D80388693](https://our.internmc.facebook.com/intern/diff/D80388693/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160814
Approved by: https://github.com/tugsbayasgalan
2025-08-19 01:46:24 +00:00
4014672b30 Replace guard_serialization_mode with save_guards, remove load cases (#160531)
This PR replaces "guard_serialization_mode" with `save_guards`. All cases where we care about whether or not we're *loading* guards can be inferred automatically from the existing inputs.

The only case that's special here is whether or not to check guards. We don't want to check guards on guard load in CheckFnManager, because these guards have already been checked on save. Therefore, we put the setting in OutputGraphGuardsState, so that when we save, we bypass the guards check.

Because of this change, it is *technically* possible to do a load and a save in the *same* CheckFunctionManager.__init__() by passing all the necessary parts, and also passing `save_guards=True`. This should just work out of the box, but so far no callsites need it, so not super important.

Next up, we'll work on removing save_guards from GuardBuilder, and putting it into its own phase.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160531
Approved by: https://github.com/zhxchen17
2025-08-18 17:04:17 +00:00
b82aa3df20 Revert "Remove guard_size_oblivious from default contiguity python check, and add aten.sym_is_contiguous. (#159197)"
This reverts commit e444cd24d48b3a46f067974f2cc157f5ed27709f.

Reverted https://github.com/pytorch/pytorch/pull/159197 on behalf of https://github.com/laithsakka due to internal build failures ([comment](https://github.com/pytorch/pytorch/pull/159197#issuecomment-3195436668))
2025-08-18 07:22:13 +00:00
e444cd24d4 Remove guard_size_oblivious from default contiguity python check, and add aten.sym_is_contiguous. (#159197)
This might cause some new DDEs on call sites that do not use is_contiguous_or_false() or sym_is_contiguous(),
but we want to find those call sites and handle them properly by calling is_contiguous_or_false() and not is_contiguous() explicitly when appropriate.
I had to fix one issue after removing the implicit size-oblivious reasoning; here is the context.

In https://github.com/pytorch/pytorch/pull/157472 we defined sym_is_contiguous to be the function that computes contiguity for dynamic shapes in C++. It returns a symbolic expression that represents contiguity and is guaranteed not to throw a DDE.

When people call is_contiguous we do sym_is_contiguous().guard_bool().
When people call is_contiguous_or_false we do sym_is_contiguous().guard_or_false().

One path was not handled well:
```
c10::SymBool TensorImpl::sym_is_contiguous_custom(
    at::MemoryFormat memory_format) const {
  if (C10_UNLIKELY(matches_python_custom(SizesStridesPolicy::CustomStrides))) {
    return pyobj_slot_.load_pyobj_interpreter()->is_contiguous(
        this, memory_format);
  }

  return sym_is_contiguous_default(memory_format);
}
```
Namely, if we call sym_is_contiguous_custom and matches_python_custom(SizesStridesPolicy::CustomStrides) returns true, then we used to call is_contiguous(this, memory_format);

This used to go through load_pyobj_interpreter and end up calling the Python is_contiguous, which used implicit size-oblivious reasoning.
Once we removed that implicit size-oblivious reasoning, the right thing is to call
return pyobj_slot_.load_pyobj_interpreter()->sym_is_contiguous(this, memory_format);
otherwise we would get a DDE even when the caller is doing sym_is_contiguous.

So I had to define it for the pyinterpreter, and then override it for nested tensors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159197
Approved by: https://github.com/ezyang
2025-08-16 09:15:58 +00:00
0242d40fa5 Enable trace through the collections module (#159365)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159365
Approved by: https://github.com/zou3519
2025-08-15 19:08:21 +00:00
052c441cf4 Add logging for when inbuilt_inline_nn_modules will help with ID_MATCH guard triggered recompiles (#160592)
We add logging for when an ID_MATCH guard is added at a place where inbuilt_inline_nn_modules would inline it. This is done with the aim of tagging recompiles that could be avoided by setting the inbuilt_inline_nn_modules flag.
It will help us log and track the flag's adoption and potentially quantify the savings in the number of recompiles.

Differential Revision: D80075975

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160592
Approved by: https://github.com/anijain2305
2025-08-15 17:09:39 +00:00
cd8d8c18f5 [pytorch][dynamo_compile] Log graph_node_shape to dynamo_compile (#160556)
This PR adds dynamo graph node shape logging to dynamo_compile. Also added unit tests to check that the correct graph node shape is being logged.

Test Plan:
$ python -m test_utils
Ran 12 tests in 36.447s
OK

Note: Will merge after D80185628 lands.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160556
Approved by: https://github.com/masnesral, https://github.com/jingsh
2025-08-14 16:42:35 +00:00
9a0f7a3bb0 [retry-land][pytorch][dynamo_compile] Log stack_trace to dynamo_compile (#160348)
refer: https://github.com/pytorch/pytorch/pull/159655

Earlier PR failed on dynamo/test_utils.py::TestDynamoTimed::test_dynamo_timed.
Updated test_dynamo_timed + re-ran locally to test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160348
Approved by: https://github.com/masnesral
2025-08-12 06:24:54 +00:00