pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 12:54:11 +08:00

Author	SHA1	Message	Date
Yuanyuan Chen	3255e7872b	Enable all flake8-logging-format rules (#164655 ) These rules are enabled by removing existing suppressions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164655 Approved by: https://github.com/janeyx99, https://github.com/mlazos	2025-10-19 00:59:28 +00:00
Zhengxu Chen	86ebce1766	[precompile] Pass tensor_to_context to backend. (#165702 ) Summary: Fixing a VLLM issue https://github.com/vllm-project/vllm/issues/27040 where aot precompile fails on some models using symbolic shapes in inductor. Test Plan: pp HF_HUB_DISABLE_XET=1 VLLM_ENABLE_V1_MULTIPROCESSING=0 VLLM_USE_AOT_COMPILE=1 vllm bench latency --model microsoft/DialoGPT-small --input-len 128 --output-len 256 --num-iters 50 --dtype float16 Reviewers: Subscribers: Tasks: Tags: Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/165702 Approved by: https://github.com/tugsbayasgalan	2025-10-17 21:52:04 +00:00
Zhengxu Chen	839f6facdb	[precompile] Fix frame construction for wrapped model. (#165454 ) Summary: If a function is wrapped with functools, we should not look at the wrapped function signature but rather the wrapper, since we need to construct the frame for the top level function here. Test Plan: test_decorated_function_with_functools_wrap_aot Differential Revision: D84626752 Pull Request resolved: https://github.com/pytorch/pytorch/pull/165454 Approved by: https://github.com/yiming0416	2025-10-15 02:01:46 +00:00
Michael Lazos	04e36611bb	[user-cuda-streams] Pass streams/events to the graph via lookup table (#162899 ) Stores streams in a global object look table that maps a dynamo selected index to objects. This index is generated during tracing, and at runtime, a helper function is called from the bytecode to populate this map. This differs from the previous implementation that simply mapped IDs to the associated objects. This required specialization on the IDs of the specific objects, while this new approach does not. Pull Request resolved: https://github.com/pytorch/pytorch/pull/162899 Approved by: https://github.com/anijain2305 ghstack dependencies: #163027	2025-10-14 05:43:19 +00:00
Pian Pawakapan	1f73b96668	[PGO] log missing sources in allowlist (#164881 ) Summary: - logs missing dynamic sources - emits MLHub insight only on size mismatch recompiles Test Plan: test_pgo Differential Revision: D84098898 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164881 Approved by: https://github.com/bobrenjc93	2025-10-09 04:39:09 +00:00
Animesh Jain	4308b8a28f	[dynamo] Support torch.fx.traceback.annotate (#164678 ) Builds on top of https://github.com/pytorch/pytorch/pull/163673 and https://github.com/pytorch/pytorch/pull/164174. This will be used in the followup PRs to apply regional inductor compilation. The existing implementation let Dynamo trace into the `torch.fx.traceback.annotate`, but thats not what we want. We want Dynamo to essentially run the torch.fx.traceback.annotate function in eager, so that every Fx node created in Dynamo Fx graph has the custom meta node. What does not work? * We still have to set the context manager `torch.fx.traceback.preserve_node_meta()` in the user code because CI was unhappy. This can be fixed but with some perseverance. * This does not work with graph breaks yet. But we can solve that problem, if needed, in a separate PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164678 Approved by: https://github.com/SherlockNoMad, https://github.com/jansel, https://github.com/xmfan	2025-10-08 22:41:00 +00:00
Maggie Moss	c855f8632e	Pyrefly suppressions 7/n (#164913 ) Adds suppressions to pyrefly will typecheck clean: https://github.com/pytorch/pytorch/issues/163283 Almost there! Test plan: dmypy restart && python3 scripts/lintrunner.py -a pyrefly check step 1: delete lines in the pyrefly.toml file from the project-excludes field step 2: run pyrefly check step 3: add suppressions, clean up unused suppressions before: https://gist.github.com/maggiemoss/4b3bf2037014e116bc00706a16aef199 after: INFO 0 errors (6,884 ignored) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164913 Approved by: https://github.com/oulgen	2025-10-08 07:27:17 +00:00
PyTorch MergeBot	3040a5d294	Revert "[dynamo] Support torch.fx.traceback.annotate (#164678 )" This reverts commit 801e282f39e9ef4424dfd3ecfd2b550a44595229. Reverted https://github.com/pytorch/pytorch/pull/164678 on behalf of https://github.com/izaitsevfb due to breaks executorch internally, see [D84068062](https://www.internalfb.com/diff/D84068062?entry_point=16) ([comment](https://github.com/pytorch/pytorch/pull/164678#issuecomment-3379281844))	2025-10-08 01:49:34 +00:00
Animesh Jain	801e282f39	[dynamo] Support torch.fx.traceback.annotate (#164678 ) Builds on top of https://github.com/pytorch/pytorch/pull/163673 and https://github.com/pytorch/pytorch/pull/164174. This will be used in the followup PRs to apply regional inductor compilation. The existing implementation let Dynamo trace into the `torch.fx.traceback.annotate`, but thats not what we want. We want Dynamo to essentially run the torch.fx.traceback.annotate function in eager, so that every Fx node created in Dynamo Fx graph has the custom meta node. This does not work with graph breaks yet. But we can solve that problem, if needed, in a separate PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164678 Approved by: https://github.com/SherlockNoMad, https://github.com/jansel, https://github.com/xmfan	2025-10-07 14:54:26 +00:00
PyTorch MergeBot	cfc5cc17dc	Revert "[dynamo] Support torch.fx.traceback.annotate (#164678 )" This reverts commit 2883b5ab773daf5861d43ff0b65be49a441ab3f9. Reverted https://github.com/pytorch/pytorch/pull/164678 on behalf of https://github.com/izaitsevfb due to fails inductor:max_autotune tests internally, see D83948169 ([comment](https://github.com/pytorch/pytorch/pull/164678#issuecomment-3374407009))	2025-10-06 22:03:42 +00:00
Animesh Jain	2883b5ab77	[dynamo] Support torch.fx.traceback.annotate (#164678 ) Builds on top of https://github.com/pytorch/pytorch/pull/163673 and https://github.com/pytorch/pytorch/pull/164174. This will be used in the followup PRs to apply regional inductor compilation. The existing implementation let Dynamo trace into the `torch.fx.traceback.annotate`, but thats not what we want. We want Dynamo to essentially run the torch.fx.traceback.annotate function in eager, so that every Fx node created in Dynamo Fx graph has the custom meta node. This does not work with graph breaks yet. But we can solve that problem, if needed, in a separate PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164678 Approved by: https://github.com/SherlockNoMad, https://github.com/jansel, https://github.com/xmfan	2025-10-06 02:59:24 +00:00
Avik Chaudhuri	1e754d5a80	docs and optional kwargs for full graph capture (#163550 ) Test Plan: existing tests Differential Revision: D82995546 Pull Request resolved: https://github.com/pytorch/pytorch/pull/163550 Approved by: https://github.com/tugsbayasgalan	2025-09-24 01:20:27 +00:00
Jason Ansel	6ef74879f6	[dynamo] Fix TorchFunctionMode handling with get_rng_state (#163412 ) Fixes #162624 Fixes #162586 Pull Request resolved: https://github.com/pytorch/pytorch/pull/163412 Approved by: https://github.com/eellison ghstack dependencies: #163386, #163398, #163387, #163414, #163415, #163419, #163434, #163393	2025-09-23 15:37:45 +00:00
Avik Chaudhuri	9e3725e8e5	make fullgraph_capture work on mod, args, kwargs (#162849 ) Summary: Today `fullgraph_capture` takes a frame, but clients usually take a callable (`nn.Module`, function, or method) and example inputs (args and kwargs) and then explicitly set up the frame to pass. This is boilerplate—and potentially tricky to get right—that can be hidden inside the API. The original `fullgraph_capture` now becomes `_fullgraph_capture_frame`. Test Plan: existing tests Rollback Plan: Differential Revision: D82339400 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162849 Approved by: https://github.com/zhxchen17	2025-09-20 22:48:06 +00:00
Avik Chaudhuri	d70c0babf5	minimize graph capture output (#162211 ) Currently OutputGraphGuardsState is separated out as a serializable interface for OutputGraph, but some of the typing around it is incorrect in dynamo's guards.py and output_graph.py: more fields are used by code than claimed by OutputGraphGuardsState, and it works because either the full OutputGraph is passed in or the parts that use those fields are dead when OutputGraphGuardsState is passed in. In this PR we try to further separate the necessary fields of OutputGraph that should be retained by a full graph capture mechanism, not just limited to dynamo (as it is currently) but also something like make_fx (in the future). Since these fields do not need to be serialized, the result is an intermediate "common" data structure that is between OutputGraphGuardsState and OutputGraph in the inheritance hierarchy. Differential Revision: D81718791 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162211 Approved by: https://github.com/zhxchen17	2025-09-20 15:52:28 +00:00
Zhengxu Chen	6189a5f731	[dynamo][ez] Initialize tracer_output to None by default. (#163169 ) Summary: In edge cases, tracer_output can be left unset if there's double exception raised which causes the following issue: ``` UnboundLocalError: local variable 'tracer_output' referenced before assignment ``` Default initialize this variable so that it's always present. Test Plan: CI Rollback Plan: Differential Revision: D82652815 Pull Request resolved: https://github.com/pytorch/pytorch/pull/163169 Approved by: https://github.com/tugsbayasgalan	2025-09-18 01:30:23 +00:00
xinan.lin	e93706c2c8	[Intel GPU][pre_compile] Add XPU toolkit version and hardware info in compiled model check. (#162951 ) Following #162438, this PR generalized the origin CUDA only check, and add XPU check. Fixes #162939, Fixes #162938, Fixes #163032，Fixes #163045 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162951 Approved by: https://github.com/EikanWang, https://github.com/jansel	2025-09-18 00:04:22 +00:00
Shangdi Yu	ccb450b190	[pre_compile] Add check for cuda and hardware version (#162438 ) if we detect compiled model is using cuda in meaningful way, we should store information about cuda + hardware Example: `SystemInfo(python_version='3.12.9', torch_version='2.9.0a0+gite02b0e6', cuda_version='12.6', triton_version=(3, 4), gpu_name='NVIDIA PG509-210')` Pull Request resolved: https://github.com/pytorch/pytorch/pull/162438 Approved by: https://github.com/zhxchen17	2025-09-12 01:42:07 +00:00
Tugsbayasgalan Manlaibaatar	6d65737aee	testing infra and some fixes (#162183 ) This PR is quite large in that it covers most of rough edges in the new strict export flow: 1. Handle nn_module_stack correctly now that we are tracing wrapper module 2. module_call_spec needs to get queried from source directly because we are not running the bytecode anymore. 3. Correct input and output handling. @diff-train-skip-merge Pull Request resolved: https://github.com/pytorch/pytorch/pull/162183 Approved by: https://github.com/zhxchen17	2025-09-10 20:48:12 +00:00
PyTorch MergeBot	60d009267e	Revert "testing infra and some fixes (#162183 )" This reverts commit d8b6622bb6a3879d3832ab6cdc26ff4188ea4a2d. Reverted https://github.com/pytorch/pytorch/pull/162183 on behalf of https://github.com/huydhn due to Failing a test on macos ([comment](https://github.com/pytorch/pytorch/pull/162183#issuecomment-3268922096))	2025-09-09 05:26:32 +00:00
Tugsbayasgalan Manlaibaatar	d8b6622bb6	testing infra and some fixes (#162183 ) This PR is quite large in that it covers most of rough edges in the new strict export flow: 1. Handle nn_module_stack correctly now that we are tracing wrapper module 2. module_call_spec needs to get queried from source directly because we are not running the bytecode anymore. 3. Correct input and output handling. Pull Request resolved: https://github.com/pytorch/pytorch/pull/162183 Approved by: https://github.com/zhxchen17 ghstack dependencies: #162167	2025-09-09 02:42:11 +00:00
Laith Sakka	189a054cfb	Remove guard_size_oblivious from default contiguity python check, and add aten.sym_is_contiguous. [attempt2] (#160869 ) [relanding again after fixing internal build] Summary: This might cause some new DDEs on call sites that do not use is_contiguous_or_false() or sym_is_contiguous() but want to find those call sites to handle this properly by calling is_contiguous_or_false() and not is_contiguous() explitly when appropriate. I had to fix one issue after removing the implicit size oblivious reasoning. here is context we defined in this https://github.com/pytorch/pytorch/pull/157472 sym_is_contiguous to be the function computing contiguity for dynamic shapes in c++. It returns a symbolic expression that represents contiguity and guaranteed not to throw a DDE. when people call is_contiguous we do sym_is_contiguous().guard_bool() when people call is_contiguous_or_false we do sym_is_contiguous().guard_or_false() one issue not handled well was this path ``` c10::SymBool TensorImpl::sym_is_contiguous_custom( at::MemoryFormat memory_format) const { if (C10_UNLIKELY(matches_python_custom(SizesStridesPolicy::CustomStrides))) { return pyobj_slot_.load_pyobj_interpreter()->is_contiguous( this, memory_format); } return sym_is_contiguous_default(memory_format); } ``` namely if we call sym_is_contiguous_custom but we have matches_python_custom(SizesStridesPolicy::CustomStrides) return true , then we used to call is_contiguous(this, memory_format); This used to go through the load_pyobj_interpreter and end up calling the python is_contiguous call which used implicit size oblivious reasoning. once we removed that implicit size oblivious reasoning, the right thing we want is to call return pyobj_slot_.load_pyobj_interpreter()->sym_is_contiguous(this, memory_format); otherwise we would get DDE even if the caller is doing sym_is_contiguous. so I had to define it for pyinterpreter, and then I had to override it for nested tensors. Approved by: https://github.com/ezyang Test Plan: contbuild & OSS CI, see `e444cd24d4` Rollback Plan: Differential Revision: D80435179 Pull Request resolved: https://github.com/pytorch/pytorch/pull/160869 Approved by: https://github.com/ezyang	2025-09-08 22:59:13 +00:00
Tugsbayasgalan Manlaibaatar	047603d35b	New export implementation with flat inp/out (#162167 ) This is my first attempt of building new export API. The main thing it addresses is correctly getting input and output relations. Subsequent diffs willl add functionality for dynamic shapes, nn_module_stack etc. Differential Revision: [D81793205](https://our.internmc.facebook.com/intern/diff/D81793205) Pull Request resolved: https://github.com/pytorch/pytorch/pull/162167 Approved by: https://github.com/zhxchen17, https://github.com/avikchaudhuri	2025-09-06 20:03:52 +00:00
William Wen	f36f285953	[dynamo] change error_on_graph_break/fullgraph semantics (#161747 ) This PR implements the semantics change to `torch._dynamo.error_on_graph_break`: - ~`torch.compile` now has a new `error_on_graph_break` kwarg that serves as a lower-priority toggle for erroring/continuing on graph breaks~ - `error_on_graph_break` is a new internal `torch.compile `setting that is lower-priority than `fullgraph`. It allows the user to toggle erroring/continuing on graph breaks. - `error_on_graph_break` does nothing when `fullgraph=True` - `error_on_graph_break` does NOT guarantee a single graph Followup [DONE]: need to change the programming model docs to reflect the 3 graph break modes for compilation: - `fullgraph=True`: enforce one graph, no graph breaks, cannot be toggled - `fullgraph=False, error_on_graph_break=True`: errors on graph breaks, latter can be toggled during compile time - `fullgraph=False, error_on_graph_break=False`: resumes tracing on graph breaks, latter can be toggled during compile time Pull Request resolved: https://github.com/pytorch/pytorch/pull/161747 Approved by: https://github.com/mlazos ghstack dependencies: #161739	2025-09-04 17:10:17 +00:00
William Wen	8678d831c4	[dynamo] rename set_fullgraph to error_on_graph_break (#161739 ) Renaming `set_fullgraph` to `error_on_graph_break` for now. There are no semantic differences yet. In a followup PR, we will introduce a new `torch.compile` option `error_on_graph_break` that has lower priority than `fullgraph` so that `fullgraph` really returns 1 graph. I could keep `set_fullgraph` as a deprecated alias for `error_on_graph_break` for now, but I'm hoping that won't be necessary since it's still private API (there are no internal callsites yet, and there are no significant OSS callsites yet). cc @albanD @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames @Lucaskabela @mlazos @guilhermeleobas @xmfan as primary users for `set_fullgraph` Pull Request resolved: https://github.com/pytorch/pytorch/pull/161739 Approved by: https://github.com/xmfan, https://github.com/Lucaskabela, https://github.com/anijain2305, https://github.com/mlazos	2025-09-04 01:15:06 +00:00
dolpm	8ec551bb35	[aot-compile] strip internal tracebacks for non-verbose graph breaks + include user file/lineno (#162005 ) pytest test/dynamo/test_aot_compile.py -k test_aot_compile_graph_break_error_fmt before ``` Traceback (most recent call last): File "/data/users/$USER/vllm-tests/graph-break.py", line 15, in <module> aot_compiled_fn = compiled.aot_compile((example_inputs, {})) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/users/$USER/pytorch/torch/_dynamo/eval_frame.py", line 717, in aot_compile return aot_compile_fullgraph( ^^^^^^^^^^^^^^^^^^^^^^ File "/data/users/$USER/pytorch/torch/_dynamo/aot_compile.py", line 132, in aot_compile_fullgraph capture_output = convert_frame.fullgraph_capture( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/users/$USER/pytorch/torch/_dynamo/convert_frame.py", line 947, in fullgraph_capture dynamo_output = compile_frame( ^^^^^^^^^^^^^^ File "/data/users/$USER/pytorch/torch/_dynamo/convert_frame.py", line 1020, in compile_frame bytecode, tracer_output = transform_code_object(code, transform) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/users/$USER/pytorch/torch/_dynamo/bytecode_transformation.py", line 1592, in transform_code_object tracer_output = transformations(instructions, code_options) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/users/$USER/pytorch/torch/_dynamo/convert_frame.py", line 992, in transform tracer_output = trace_frame( ^^^^^^^^^^^^ File "/data/users/$USER/pytorch/torch/_dynamo/convert_frame.py", line 312, in _fn return fn(args, kwargs) ^^^^^^^^^^^^^^^^^^^ File "/data/users/$USER/pytorch/torch/_dynamo/convert_frame.py", line 821, in trace_frame run_tracer() File "/data/users/$USER/pytorch/torch/_dynamo/convert_frame.py", line 803, in run_tracer tracer.run() File "/data/users/$USER/pytorch/torch/_dynamo/symbolic_convert.py", line 1472, in run while self.step(): ^^^^^^^^^^^ File "/data/users/$USER/pytorch/torch/_dynamo/symbolic_convert.py", line 1342, in step self.dispatch_table[inst.opcode](self, inst) File "/data/users/$USER/pytorch/torch/_dynamo/symbolic_convert.py", line 902, in wrapper return inner_fn(self, inst) ^^^^^^^^^^^^^^^^^^^^ File "/data/users/$USER/pytorch/torch/_dynamo/symbolic_convert.py", line 3364, in CALL self._call(inst) File "/data/users/$USER/pytorch/torch/_dynamo/symbolic_convert.py", line 3358, in _call self.call_function(fn, args, kwargs) File "/data/users/$USER/pytorch/torch/_dynamo/symbolic_convert.py", line 1260, in call_function self.push(fn.call_function(self, args, kwargs)) # type: ignore[arg-type] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/users/$USER/pytorch/torch/_dynamo/variables/lazy.py", line 212, in realize_and_forward return getattr(self.realize(), name)(args, *kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/users/$USER/pytorch/torch/_dynamo/variables/functions.py", line 1513, in call_function unimplemented_v2( File "/data/users/$USER/pytorch/torch/_dynamo/exc.py", line 596, in unimplemented_v2 raise Unsupported(msg) torch._dynamo.exc.Unsupported: Call to `torch._dynamo.graph_break()` Explanation: User-inserted graph break. Message: None Hint: Remove the `torch._dynamo.graph_break()` call. Developer debug context: Called `torch._dynamo.graph_break()` with args `[]`, kwargs `{}` For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0025.html ``` after ``` Traceback (most recent call last): File "/data/users/$USER/vllm-tests/graph-break.py", line 15, in <module> aot_compiled_fn = compiled.aot_compile((example_inputs, {})) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/users/$USER/pytorch/torch/_dynamo/eval_frame.py", line 737, in aot_compile raise e.with_traceback(None) from e.__cause__ # User compiler error ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ torch._dynamo.exc.Unsupported: Call to `torch._dynamo.graph_break()` Explanation: User-inserted graph break. Message: None Hint: Remove the `torch._dynamo.graph_break()` call. Developer debug context: Called `torch._dynamo.graph_break()` with args `[]`, kwargs `{}` For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0025.html from user code: File "/data/users/$USER/vllm-tests/graph-break.py", line 5, in foo torch._dynamo.graph_break() Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" ``` consistent w/ std torch.compile ``` Traceback (most recent call last): File "/data/users/$USER/vllm-tests/graph-break.py", line 16, in <module> res = compiled(example_inputs) ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/users/$USER/pytorch/torch/_dynamo/eval_frame.py", line 850, in compile_wrapper raise e.with_traceback(None) from e.__cause__ # User compiler error ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ torch._dynamo.exc.Unsupported: Call to `torch._dynamo.graph_break()` Explanation: User-inserted graph break. Message: None Hint: Remove the `torch._dynamo.graph_break()` call. Developer debug context: Called `torch._dynamo.graph_break()` with args `[]`, kwargs `{}` For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0025.html from user code: File "/data/users/$USER/vllm-tests/graph-break.py", line 5, in foo torch._dynamo.graph_break() Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/162005 Approved by: https://github.com/zhxchen17, https://github.com/tugsbayasgalan	2025-09-03 23:19:47 +00:00
zhxchen17	e4bd0ff4f8	[aot precompile] Handle closure variables. (#161990 ) We previously assume aot precompile should only work on non closures. This is hard to enforce in practice because we will see a lot of cases with decorater (e.g. hugging face models) ``` def check_inputs(fn): def _fn(self, args, kwargs): for arg in args: assert arg.shape[0] > 1 return fn(args, **kwargs) return _fn @check_inputs def foo(x, y): a = x + x b = y + y c = a + b return c ``` It doesn't make sense to not support these cases since they are straightfowrad to do. This PR adds the logic to handle closure and make sure they can be precompiled properly. Differential Revision: [D81509535](https://our.internmc.facebook.com/intern/diff/D81509535/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/161990 Approved by: https://github.com/angelayi	2025-09-02 22:26:04 +00:00
rzou	5edc3d814f	Add option for TorchDispatchMode to ignore torch.compile internals (#161648 ) If TorchDispatchMode.ignore_compile_internals() is True, then we turn off the TorchDispatchMode during the compilation process, instead turning it back on during runtime of the compiled artifact. Test Plan: - new test Pull Request resolved: https://github.com/pytorch/pytorch/pull/161648 Approved by: https://github.com/bdhirsh	2025-08-28 02:41:33 +00:00
Pian Pawakapan	97a548b640	[PGO] skip allowlist logging for empty graphs (#161530 ) Summary: reduces spurious logging Test Plan: test_pgo Rollback Plan: Differential Revision: D81060182 Pull Request resolved: https://github.com/pytorch/pytorch/pull/161530 Approved by: https://github.com/bobrenjc93, https://github.com/mlazos	2025-08-28 00:12:13 +00:00
William Wen	10d93325b1	[dynamo, nested graph breaks] support very simple nested graph breaks (#159329 ) e.g. this graph breaks once now: ```python import torch torch._dynamo.config.nested_graph_breaks = True def inner(x): x = x + 1 torch._dynamo.graph_break() return x + 2 @torch.compile(backend="eager") def outer(x): return inner(x) print(outer(torch.ones(3))) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/159329 Approved by: https://github.com/anijain2305	2025-08-27 21:53:37 +00:00
PyTorch MergeBot	a4fb65701b	Revert "[dynamo, nested graph breaks] support very simple nested graph breaks (#159329 )" This reverts commit 8dab6d4c414bf997297804008c3da893e69cd51f. Reverted https://github.com/pytorch/pytorch/pull/159329 on behalf of https://github.com/atalman due to failing internally ([comment](https://github.com/pytorch/pytorch/pull/159329#issuecomment-3225617445))	2025-08-26 20:24:10 +00:00
Zhengxu Chen	74124d1b46	[reland] [dynamo] Refactor convert_frame.compile_frame to be self contained function. [5/n] (#161514 ) Summary: convert_frame.compile_frame used to take a callback transform function which will capture the frame object it has, but the frame information is not passed directly into compile_frame function. This PR changes the signature of compile_frame so that frame information is directly passed in the function without taking a callback. This makes it easier to build fullgraph capture API on top of compile_frame. Test Plan: CI Rollback Plan: Differential Revision: D81041296 Pull Request resolved: https://github.com/pytorch/pytorch/pull/161514 Approved by: https://github.com/tugsbayasgalan	2025-08-26 19:16:05 +00:00
PyTorch MergeBot	e795450a35	Revert "[dynamo] Refactor convert_frame.compile_frame to be self contained function. [5/n] (#160900 )" This reverts commit 447d34b5f80fb7350f79decd855cb599cab39083. Reverted https://github.com/pytorch/pytorch/pull/160900 on behalf of https://github.com/atalman due to reverting since can't land existing diff internally, will need to reland it ([comment](https://github.com/pytorch/pytorch/pull/160900#issuecomment-3224029031))	2025-08-26 12:45:59 +00:00
William Wen	8dab6d4c41	[dynamo, nested graph breaks] support very simple nested graph breaks (#159329 ) e.g. this graph breaks once now: ```python import torch torch._dynamo.config.nested_graph_breaks = True def inner(x): x = x + 1 torch._dynamo.graph_break() return x + 2 @torch.compile(backend="eager") def outer(x): return inner(x) print(outer(torch.ones(3))) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/159329 Approved by: https://github.com/anijain2305 ghstack dependencies: #157971, #159281, #144516	2025-08-26 00:58:07 +00:00
zhxchen17	447d34b5f8	[dynamo] Refactor convert_frame.compile_frame to be self contained function. [5/n] (#160900 ) convert_frame.compile_frame used to take a callback transform function which will capture the frame object it has, but the frame information is not passed directly into compile_frame function. This PR changes the signature of compile_frame so that frame information is directly passed in the function without taking a callback. This makes it easier to build fullgraph capture API on top of compile_frame. @exported-using-ghexport Differential Revision: [D80469801](https://our.internmc.facebook.com/intern/diff/D80469801/) Differential Revision: [D80469801](https://our.internmc.facebook.com/intern/diff/D80469801) Pull Request resolved: https://github.com/pytorch/pytorch/pull/160900 Approved by: https://github.com/tugsbayasgalan, https://github.com/anijain2305	2025-08-25 23:16:21 +00:00
PyTorch MergeBot	3e210f90c2	Revert "[dynamo] Refactor convert_frame.compile_frame to be self contained function. [5/n] (#160900 )" This reverts commit 1113e7de30da95973c1eac7921601f9a0e94f2db. Reverted https://github.com/pytorch/pytorch/pull/160900 on behalf of https://github.com/atalman due to executorch failure ([comment](https://github.com/pytorch/pytorch/pull/160900#issuecomment-3221372096))	2025-08-25 18:56:18 +00:00
zhxchen17	1113e7de30	[dynamo] Refactor convert_frame.compile_frame to be self contained function. [5/n] (#160900 ) convert_frame.compile_frame used to take a callback transform function which will capture the frame object it has, but the frame information is not passed directly into compile_frame function. This PR changes the signature of compile_frame so that frame information is directly passed in the function without taking a callback. This makes it easier to build fullgraph capture API on top of compile_frame. @exported-using-ghexport Differential Revision: [D80469801](https://our.internmc.facebook.com/intern/diff/D80469801/) Differential Revision: [D80469801](https://our.internmc.facebook.com/intern/diff/D80469801) Pull Request resolved: https://github.com/pytorch/pytorch/pull/160900 Approved by: https://github.com/tugsbayasgalan, https://github.com/anijain2305	2025-08-25 14:53:54 +00:00
Jovian Anthony Jaison	2fdd4f918c	Log exception_stack_trace to dynamo_compile (#161096 ) Note: Adding unit test for this is tricky as having errors in the specific unit test would cause test_utils.py to crash all together. Tested as follows: 1. Added x = 1/0 after guarded_code = compile_inner(code, one_graph, hooks, transform) in convert_frame.py 2. Printed exception_stack_trace and got: ['Traceback (most recent call last):\n File "/data/users/jovian/pytorch/torch/_dynamo/convert_frame.py", line 1207, in _compile\n x = 1/0\n ~^~\nZeroDivisionError: division by zero\n'] Pull Request resolved: https://github.com/pytorch/pytorch/pull/161096 Approved by: https://github.com/c00w	2025-08-22 03:29:15 +00:00
Jovian Anthony Jaison	c02e26bf31	Fix filename showing up as ints in dynamo_compile stack_trace column. (#160916 ) Test plan: $ python -m test_utils Note: Another way is adding the actual file_name to from_traceback, but since it's referenced in multiple places and may have associated tests this seems safer. Lmk if changes are needed @c00w Pull Request resolved: https://github.com/pytorch/pytorch/pull/160916 Approved by: https://github.com/c00w, https://github.com/masnesral	2025-08-20 18:38:38 +00:00
zhxchen17	5255e65c01	[dynamo] Refactor convert_frame to remove usage of nonlocal tracer output return. [4/n] (#160899 ) Today convert_frame is implemented like the following: ``` def _compile(): tracer_output = None def transform(): nonlocal tracer_output ... def _compile_inner(): transform(...) compile_inner(...) ``` The code is using unconventional nonlocal variable as the return value. This is not ideal for 2 reasons: 1. Reasoning about the code, especially together with error handling code becomes harder. 2. more importantly, this makes it harder to extract out common code pieces into a shared library because everything must depend on a central global state. In this diff we remove the usage of nonlocal return and just use the conventional function return to output the compilation data. Differential Revision: [D80461258](https://our.internmc.facebook.com/intern/diff/D80461258/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/160899 Approved by: https://github.com/tugsbayasgalan ghstack dependencies: #160814, #160815, #160855	2025-08-20 17:37:26 +00:00
zhxchen17	9e050b6339	[dynamo] Refactor convert_frame._compile_inner to return compiled bytecode + output graph. [3/n] (#160855 ) We are refactoring dynamo code for convert frame so that we can have modularized pieces sharable between different compiler frontends (e.g. torch.compile, precompile and torch.export). This PR adds a new helper function compile_frame() which takes a bytecode and a transform function and return compiled bytecode + output graph as DynamoOutput type. Differential Revision: [D80430802](https://our.internmc.facebook.com/intern/diff/D80430802/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/160855 Approved by: https://github.com/tugsbayasgalan ghstack dependencies: #160814, #160815	2025-08-20 17:37:26 +00:00
zhxchen17	599f639ddb	[dynamo] Refactor transform() so that instruction translator can be used as a tracing function. [2/n] (#160815 ) We are refactoring dynamo code for convert frame so that we can have modularized pieces sharable between different compiler frontends (e.g. torch.compile, precompile and torch.export). This PR follows the last one which separate out the part to run instruction translator on a given frame and return a DynamoTracerOutput. The end result is a free function that runs instruction translator indepedently. A follow up diff will wrap the low level function. Differential Revision: [D80388694](https://our.internmc.facebook.com/intern/diff/D80388694/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/160815 Approved by: https://github.com/anijain2305 ghstack dependencies: #160814	2025-08-20 01:16:35 +00:00
zhxchen17	e9209e0854	[dynamo] Refactor tracer logic in convert_frame so that it doesn't leak to outer layer. [1/n] (#160814 ) We are refactoring dynamo code for convert frame so that we can have modularized pieces sharable between different compiler frontends (e.g. torch.compile, precompile and torch.export). One incremental step we can take is to refactor out InstructionTranslator as a functional piece providing bytecode tracing. To separate out this part, we notice currently the tracer object is being passed around in the entire convert frame compile function. This is not very ideal because we want to build a boundary between the tracing and downstream compiler stack. Ideally, we should extract all the relevant information out of the tracer object and return a new data structure that is free of internal states of InstructionTranslator. Luckily, there aren't many data used from tracer, after tracing is finished. The major one is OutputGraph, other than that, we only need to record two boolean flags for error handling purposes. The new type we're adding is called DynamoTracerOutput, which contains all the information needed by torch.compile internal after symbolic convert is finished. To simplify the current PR, we leave out the part which reduce OutputGraph into a minimal set, since this can be done in a separate PR. Differential Revision: [D80388693](https://our.internmc.facebook.com/intern/diff/D80388693/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/160814 Approved by: https://github.com/tugsbayasgalan	2025-08-19 01:46:24 +00:00
James Wu	4014672b30	Replace guard_serialization_mode with save_guards, remove load cases (#160531 ) This PR replaces "guard_serialization_mode" into `save_guards`. All cases where we care about whether or not we're loading guards can be inferred automatically from the existing inputs. The only case that's special here is whether or not to check guards. We don't want to check guards on guard load in CheckFnManager, because these guards have already been checked on save. Therefore, we put the setting in OutputGraphGuardsState, so that when we save, we bypass the guards check. Because of this change, it is technically possible to do a load and a save in the same CheckFunctionManager.__init__() by passing all the necessary parts, and also passing `save_guards=True`. This should just work out of the box, but so far no callsites need it, so not super important. Next up, we'll work on removing save_guards from GuardBuilder, and putting it into its own phase. Pull Request resolved: https://github.com/pytorch/pytorch/pull/160531 Approved by: https://github.com/zhxchen17	2025-08-18 17:04:17 +00:00
PyTorch MergeBot	b82aa3df20	Revert "Remove guard_size_oblivious from default contiguity python check, and add aten.sym_is_contiguous. (#159197 )" This reverts commit e444cd24d48b3a46f067974f2cc157f5ed27709f. Reverted https://github.com/pytorch/pytorch/pull/159197 on behalf of https://github.com/laithsakka due to internal build failures ([comment](https://github.com/pytorch/pytorch/pull/159197#issuecomment-3195436668))	2025-08-18 07:22:13 +00:00
Laith Sakka	e444cd24d4	Remove guard_size_oblivious from default contiguity python check, and add aten.sym_is_contiguous. (#159197 ) This might cause some new DDEs on call sites that do not use is_contiguous_or_false() or sym_is_contiguous() but want to find those call sites to handle this properly by calling is_contiguous_or_false() and not is_contiguous() explitly when appropriate. I had to fix one issue after removing the implicit size oblivious reasoning. here is context we defined in this https://github.com/pytorch/pytorch/pull/157472 sym_is_contiguous to be the function computing contiguity for dynamic shapes in c++. It returns a symbolic expression that represents contiguity and guaranteed not to throw a DDE. when people call is_contiguous we do sym_is_contiguous().guard_bool() when people call is_contiguous_or_false we do sym_is_contiguous().guard_or_false() one issue not handled well was this path ``` c10::SymBool TensorImpl::sym_is_contiguous_custom( at::MemoryFormat memory_format) const { if (C10_UNLIKELY(matches_python_custom(SizesStridesPolicy::CustomStrides))) { return pyobj_slot_.load_pyobj_interpreter()->is_contiguous( this, memory_format); } return sym_is_contiguous_default(memory_format); } ``` namely if we call sym_is_contiguous_custom but we have matches_python_custom(SizesStridesPolicy::CustomStrides) return true , then we used to call is_contiguous(this, memory_format); This used to go through the load_pyobj_interpreter and end up calling the python is_contiguous call which used implicit size oblivious reasoning. once we removed that implicit size oblivious reasoning, the right thing we want is to call return pyobj_slot_.load_pyobj_interpreter()->sym_is_contiguous(this, memory_format); otherwise we would get DDE even if the caller is doing sym_is_contiguous. so I had to define it for pyinterpreter, and then I had to override it for nested tensors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/159197 Approved by: https://github.com/ezyang	2025-08-16 09:15:58 +00:00
Guilherme Leobas	0242d40fa5	Enable trace through the collections module (#159365 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/159365 Approved by: https://github.com/zou3519	2025-08-15 19:08:21 +00:00
Prajesh Praveen Anchalia	052c441cf4	Add logging for when inbuilt_inline_nn_modules will help with ID_MATCH guard triggered recompiles (#160592 ) We add a logging around when an ID_MATCH guard is added at a place where inbuilt_inline_nn_modules would inline it. This is done with the aim of tagging recompiles that could be avoided by setting inbuilt_inline_nn_modules flag. It will help us log and track the flag's adoption and potentially quantify saving in the the number of recompiles. Differential Revision: D80075975 Pull Request resolved: https://github.com/pytorch/pytorch/pull/160592 Approved by: https://github.com/anijain2305	2025-08-15 17:09:39 +00:00
Jovian Anthony Jaison	cd8d8c18f5	[pytorch][dynamo_compile] Log graph_node_shape to dynamo_compile (#160556 ) This PR adds the dynamo graph node shape logging to dynamo compile. Also added unit tests to check if correct graph node shape is being logged. Test Plan: $ python -m test_utils Ran 12 tests in 36.447s OK Note: Will merge after D80185628 lands. Pull Request resolved: https://github.com/pytorch/pytorch/pull/160556 Approved by: https://github.com/masnesral, https://github.com/jingsh	2025-08-14 16:42:35 +00:00
Jovian Anthony Jaison	9a0f7a3bb0	[retry-land][pytorch][dynamo_compile] Log stack_trace to dynamo_compile (#160348 ) refer: https://github.com/pytorch/pytorch/pull/159655 Earlier pr failed on dynamo/test_utils.py::TestDynamoTimed::test_dynamo_timed. Updated test_dynamo_timed + re-ran locally to test. Pull Request resolved: https://github.com/pytorch/pytorch/pull/160348 Approved by: https://github.com/masnesral	2025-08-12 06:24:54 +00:00

1 2 3 4 5 ...

421 Commits