pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 21:14:14 +08:00

Author	SHA1	Message	Date
can-gaa-hou	eb4361a801	[Fix] Adding missing `f` prefixes to formatted strings [1/N] (#164065 ) As stated in the title. * #164068 * #164067 * #164066 * __->__ #164065 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164065 Approved by: https://github.com/Skylion007	2025-09-29 04:53:00 +00:00
Arsh Zahed	4c45090cf7	[DTensor] Check if tracing for sharding propagation to handle unhashable keys (#160798 ) Fixes #159590 This is similar to the reverted commit #156868, except it resolves an issue with two caches becoming misaligned, leading to incorrect objects for stateful placements (i.e. `_MaskPartial`) as in issue #159601. This adds little to no overhead in eager ([see past benchmarks](https://github.com/pytorch/pytorch/pull/156868#issuecomment-3047831149)). This also handles cases such as #159590 where dynamo is disabled during tracing by entering the Python Dispatcher ahead of the sharding propagation during compile. Tests are added/modified to handle these, and the list/tuple inputs with the cat op. Pull Request resolved: https://github.com/pytorch/pytorch/pull/160798 Approved by: https://github.com/bdhirsh	2025-09-09 03:52:05 +00:00
Guilherme Leobas	789d494212	Defer loading hipify until it is needed (#160824 ) Saves a few milliseconds when running a test case: Before: ``` $ PYTORCH_TEST_WITH_DYNAMO=1 python test/dynamo/cpython/3_13/test_float.py GeneralFloatCases.test_float_pow frames [('total', 1), ('ok', 1)] inline_call [] . ---------------------------------------------------------------------- Ran 1 test in 1.497s ``` After: ``` $ PYTORCH_TEST_WITH_DYNAMO=1 python test/dynamo/cpython/3_13/test_float.py GeneralFloatCases.test_float_pow frames [('total', 1), ('ok', 1)] inline_call [] . ---------------------------------------------------------------------- Ran 1 test in 0.909s ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/160824 Approved by: https://github.com/zou3519	2025-09-02 15:27:37 +00:00
PyTorch MergeBot	13b65196db	Revert "Defer loading hipify until it is needed (#160824 )" This reverts commit 403a3a393cda7e60f503f3b04b8805a845dcf45d. Reverted https://github.com/pytorch/pytorch/pull/160824 on behalf of https://github.com/atalman due to Broke slow tests test_utils.py::TestHipifyTrie::test_special_char_export_trie_to_regex [GH job link](https://github.com/pytorch/pytorch/actions/runs/17387051351/job/49355619371) [HUD commit link](`403a3a393c`) ([comment](https://github.com/pytorch/pytorch/pull/160824#issuecomment-3243281628))	2025-09-01 21:34:13 +00:00
Guilherme Leobas	403a3a393c	Defer loading hipify until it is needed (#160824 ) Saves a few milliseconds when running a test case: Before: ``` $ PYTORCH_TEST_WITH_DYNAMO=1 python test/dynamo/cpython/3_13/test_float.py GeneralFloatCases.test_float_pow frames [('total', 1), ('ok', 1)] inline_call [] . ---------------------------------------------------------------------- Ran 1 test in 1.497s ``` After: ``` $ PYTORCH_TEST_WITH_DYNAMO=1 python test/dynamo/cpython/3_13/test_float.py GeneralFloatCases.test_float_pow frames [('total', 1), ('ok', 1)] inline_call [] . ---------------------------------------------------------------------- Ran 1 test in 0.909s ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/160824 Approved by: https://github.com/zou3519	2025-09-01 20:57:41 +00:00
Arsh Zahed	7ea789ccfb	Revert #156868 : Bring back symint check for sharding propagation cache (#159671 ) Fixes #159601 Unfortunately #156868 introduced a couple regressions (see #159590 and #159601). This reverts the commit while I am working on a permanent fix. This means the `in_compiled_autograd_initial_trace` global flag will be removed and the `_are_we_tracing()` will instead be replaced with the symint preprocessing step during sharding prop post init. Pull Request resolved: https://github.com/pytorch/pytorch/pull/159671 Approved by: https://github.com/xmfan	2025-08-04 19:58:48 +00:00
Lucas Kabela	c137f9da0b	[Dynamo][Better Engineering] Add type coverage to dynamo/compiled_autograd.py (#159518 ) As part of better engineering effort, we would like to improve our type support to improve dev experience in dynamo. This PR adds strict typing support to `torch/_dynamo/compiled_autograd.py` Running ``` mypy torch/_dynamo/compiled_autograd.py --linecount-report /tmp/coverage_log ``` \| -------- \| Lines Annotated \| Lines Total \| % lines covered \| Funcs Annotated \| Funcs Total \| % funcs covered \| \| -------- \| ------- \| -------- \| ------- \| ------- \| ------- \| ------- \| \| Main \| 425 \| 1553 \| 27.37% \| 17 \| 62 \| 27.42% \| \| This PR \| 1623 \| 1623 \| 100.00% \| 62 \| 62 \| 100.00% \| \| Delta \| +1198\| +0 \| +72.63% \| +45 \| 0 \| +72.58% \| Pull Request resolved: https://github.com/pytorch/pytorch/pull/159518 Approved by: https://github.com/xmfan	2025-08-01 20:24:58 +00:00
Arsh Zahed	f6d138807f	Always disable ShardingPropagation cache if compiling (#156868 ) Fixes #151106 Addresses issue (2) in #152963 for the DTensor sharding propagation cache being brittle under compile. The existing `_are_we_tracing` from `distributed._functional_collectives`, which mostly determines if currently tracing based on Fake Tensor dispatch mode, is reused here. Test Plan: There are already tests for DTensor + Compile with dynamic shape ([test_dtensor_dynamic](https://github.com/pytorch/pytorch/blob/main/test/distributed/tensor/test_dtensor_compile.py#L260), [test_dynamo_dtensor_from_local_dynamic_shapes](https://github.com/pytorch/pytorch/blob/main/test/distributed/tensor/test_dtensor_compile.py#L402)) that cover the change. Pull Request resolved: https://github.com/pytorch/pytorch/pull/156868 Approved by: https://github.com/xmfan	2025-07-17 01:33:53 +00:00
Simon Fan	5f2f343e1e	[ca] suggest to disable compiled autograd for trace-time NotImplementedErrors (#156509 ) Example: ```python File "/home/xmfan/core/a/pytorch/torch/autograd/graph.py", line 829, in _engine_run_backward return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ NotImplementedError: TorchDispatchMode not yet implemented for compiled autograd. You can disable compiled autograd for this operation by: 1. Relocating the unsupported autograd call outside the compiled region. 2. Wrapping the unsupported autograd call within a scope that disables compiled autograd. 3. Configuring the specific compilation unit to disable compiled autograd. 4. Globally disabling compiled autograd at the application's initialization. ``` No duplicate error messages for python side trace-time errors ```python ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/xmfan/core/a/pytorch/torch/_dynamo/compiled_autograd.py", line 344, in begin_capture raise NotImplementedError( NotImplementedError: Found tensor of type <class 'torch.nn.utils._expanded_weights.expanded_weights_impl.ExpandedWeight'>, which is not supported by FakeTensorMode. You can turn off compiled autograd by either: 1. Moving the unsupported autograd call outside of the torch.compile'd region. 2. Wrapping the unsupported autograd call in the torch._dynamo.compiled_autograd._disable() context manager. 3. Setting torch._dynamo.config.compiled_autograd=False for the torch.compile call containing the unsupported autograd call. 4. Setting torch._dynamo.config.compiled_autograd=False at the start of the program. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/156509 Approved by: https://github.com/jansel ghstack dependencies: #156374	2025-06-21 18:33:46 +00:00
Simon Fan	3bec588bf5	[aot][ca] save bw_module in AOTAutogradCache (#151860 ) Compiled Autograd retraces AOT's bw_module at backward runtime into a larger graph, and today this runs into an issue on warm cache runs because the bw_module is not restored. This PR adds it to the cache, by first stripping it bare from unserializable metadata. I also intentionally differentiate the cached and non-cached versions to avoid accidental attempts of AOT compilation with a restored bw_module (would probably crash). The bw_module's generated code is then serialized, and at compiled autograd runtime, it is restored via symbolic_trace. This also means that presence of tensor constructors will be lifted as constants. Something we will address separately. Note that since the cache entry may be used by runs that use compiled autograd and runs that do not, we need to cache both the lowered backward and the bw_module. Pull Request resolved: https://github.com/pytorch/pytorch/pull/151860 Approved by: https://github.com/jamesjwu ghstack dependencies: #156120	2025-06-19 03:47:41 +00:00
Simon Fan	17b38b850e	[ca] Allow using compiled autograd context managers during backward runtime (#156120 ) Added an invariant that nested compiled autograd context managers must exit before their parent context manager. This allows us to defer the thread check. FIXES https://github.com/pytorch/pytorch/issues/152219 Pull Request resolved: https://github.com/pytorch/pytorch/pull/156120 Approved by: https://github.com/jansel ghstack dependencies: #155521, #155480	2025-06-18 03:01:15 +00:00
Simon Fan	9ff9c28fe8	[ca] Functionalize AccumulateGrad (#155521 ) This PR changes compiled autograd's handling of gradient accumulation, by proxying it as a `call_accumulate_grad`, which does the .grad mutation in python bytecode for dynamo to see. For eager, the only change is the leaf invariant check was moved up. Before: - Compiled Autograd Engine: proxies call to inductor accumulate_grad op - Dynamo: polyfills the inductor accumulate_grad op (not respecting all of the accumulateGrad implementation e.g. sparse, gradient layout contract) ```python new_grad_strided: "f32[s21]" = torch.empty_like(getitem_1); getitem_1 = None copy_: "f32[s21]" = new_grad_strided.copy_(aot3_tangents_1); copy_ = None ``` - AOTAutograd: functionalizes the copy_ After: - Compiled Autograd Engine: proxies call to `call_accumulate_grad`, which calls `torch._dynamo.compiled_autograd.ops.AccumulateGrad`/`AccumulateGrad_apply_functional_no_hooks_ivalue`, similar to other functional autograd implementations, but also sets .grad from python. Hooks are still handled separately from this call. - Dynamo: `torch._dynamo.compiled_autograd.ops.AccumulateGrad` was allow_in_graph'd - AOTAutograd: traces into the op, with FunctionalTensors. While functionalizing the tensors, we insert an autograd Error node to ensure that we don't use the autograd meta from tracing. This clashes with the "leaf variable has been moved into the graph interior" error check, I could not find a way to identify a FunctionalTensor subclass from C++, so I bypass that for Error nodes in the compiled case. In the CI PR, this fixes 19 tests relating to sparse tensors, and more are hidden by an earlier failure in dynamo Pull Request resolved: https://github.com/pytorch/pytorch/pull/155521 Approved by: https://github.com/jansel	2025-06-16 18:45:02 +00:00
Simon Fan	6dfada220e	[ca] better error message for subclasses not supported by FakeTensor (#155481 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/155481 Approved by: https://github.com/jansel ghstack dependencies: #155473, #155570	2025-06-11 19:09:29 +00:00
Simon Fan	87b002b6fb	[ca] make torch.compile API respect ambient disable contexts (#155473 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/155473 Approved by: https://github.com/jansel	2025-06-11 19:09:29 +00:00
Aaron Gokaslan	83d22256f8	[BE][Ez]: Improve typing in torch._logging (#155345 ) Add a few missing returns in torch._logging and use ruff to infer the obvious ones. LazyStr now properly checks the return type of the Callable and the args and kwargs passed to it Pull Request resolved: https://github.com/pytorch/pytorch/pull/155345 Approved by: https://github.com/ezyang	2025-06-07 00:04:39 +00:00
mieshkiwrk	ef4d57329b	[CAG] Support for call_module at copy paste aot bwd graph (#153827 ) Support for `call_module` in `copy_paste_aot_backward_graph` added recently with PT2.7 Problem is being observed with HPU backend in example repro due to creating fused modules. ``` import torch device = 'cpu' #'hpu' backend = 'inductor' #'hpu_backend' def fn(t1): t1 = t1 * 1 t1_grad = torch.ones_like(t1, device=device) t1.backward(t1_grad, retain_graph=True) return t1 t1 = torch.ones(1, requires_grad=True, device=device) #.squeeze() compiled_fn = torch.compile(fn, backend=backend) result = compiled_fn(t1) with torch._dynamo.compiled_autograd._enable(torch.compile(backend=backend)): result_grad = torch.ones_like(result, device=device) result.backward(result_grad) print(f'{result_grad=}') print(f'{t1.grad=}') ``` With this change I'm getting same results like on CPU, however I'm facing below problem when running with scalar (t1 tensor after squeeze): `torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_function <built-in function getitem>((FakeTensor(..., device='hpu:0', size=()), 0), *{}): got IndexError('invalid index of a 0-dim tensor. Use `tensor.item()` in Python or `tensor.item<T>()` in C++ to convert a 0-dim tensor to a number')` While on CPU there's following warning and None returned: `repro.py:23: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at pytorch/build/aten/src/ATen/core/TensorBody.h:489.) 
print(f'{t1.grad=}') t1.grad=None` Pull Request resolved: https://github.com/pytorch/pytorch/pull/153827 Approved by: https://github.com/xmfan	2025-05-28 22:52:40 +00:00
Simon Fan	a80eb84a5f	[ca] support higher order gradients (create_graph=True) (#153222 ) Adds create_graph support if you don't compile or compile only with torch.compile(backend="eager"). Using a backend that uses AOTDispatch produces a post-dispatch AOT backward, where its double backward will be silently incorrect if the forward trace involved any ops that are not composite implicit. Pull Request resolved: https://github.com/pytorch/pytorch/pull/153222 Approved by: https://github.com/jansel ghstack dependencies: #153193	2025-05-13 16:42:09 +00:00
Simon Fan	6dea8ef555	[ca] hide unused scalar int sizes from dynamo (#151962 ) together with https://github.com/pytorch/pytorch/pull/151731, FIXES https://github.com/pytorch/pytorch/issues/113129 https://github.com/pytorch/pytorch/issues/146168 Pull Request resolved: https://github.com/pytorch/pytorch/pull/151962 Approved by: https://github.com/jansel ghstack dependencies: #151731	2025-05-08 15:12:16 +00:00
Simon Fan	8f380b239f	[ca] mark scalar int sizes as dynamic via tensor wrapping (#151731 ) This is the only way to support dynamic shapes on scalars right now. Pull Request resolved: https://github.com/pytorch/pytorch/pull/151731 Approved by: https://github.com/jansel	2025-05-08 15:12:08 +00:00
PyTorch MergeBot	a28dcdba2c	Revert "[aot][ca] save bw_module in AOTAutogradCache (#151860 )" This reverts commit 613bd462721f3246888030de0a3f6932d52f515a. Reverted https://github.com/pytorch/pytorch/pull/151860 on behalf of https://github.com/huydhn due to Chatting with @xmfan and decide to revert and reland this instead ([comment](https://github.com/pytorch/pytorch/pull/151860#issuecomment-2856709646))	2025-05-07 00:56:54 +00:00
PyTorch MergeBot	f6db749e60	Revert "[ca] mark scalar int sizes as dynamic via tensor wrapping (#151731 )" This reverts commit 18229a5300a61b2d76ca95bee8ae8d4f4d5fa938. Reverted https://github.com/pytorch/pytorch/pull/151731 on behalf of https://github.com/huydhn due to Chatting with @xmfan and decide to revert and reland this instead ([comment](https://github.com/pytorch/pytorch/pull/151860#issuecomment-2856709646))	2025-05-07 00:56:54 +00:00
PyTorch MergeBot	8f208dc75a	Revert "[ca] hide unused scalar int sizes from dynamo (#151962 )" This reverts commit 4555ed8c83b47c450e31f1192e1f0fc4147d435f. Reverted https://github.com/pytorch/pytorch/pull/151962 on behalf of https://github.com/huydhn due to Chatting with @xmfan and decide to revert and reland this instead ([comment](https://github.com/pytorch/pytorch/pull/151860#issuecomment-2856709646))	2025-05-07 00:56:53 +00:00
Simon Fan	4555ed8c83	[ca] hide unused scalar int sizes from dynamo (#151962 ) together with https://github.com/pytorch/pytorch/pull/151731, FIXES https://github.com/pytorch/pytorch/issues/113129 https://github.com/pytorch/pytorch/issues/146168 Pull Request resolved: https://github.com/pytorch/pytorch/pull/151962 Approved by: https://github.com/jansel ghstack dependencies: #149707, #151860, #151731	2025-05-01 21:59:55 +00:00
Simon Fan	18229a5300	[ca] mark scalar int sizes as dynamic via tensor wrapping (#151731 ) This is the only way to support dynamic shapes on scalars right now. Pull Request resolved: https://github.com/pytorch/pytorch/pull/151731 Approved by: https://github.com/jansel ghstack dependencies: #149707, #151860	2025-05-01 21:59:49 +00:00
Simon Fan	613bd46272	[aot][ca] save bw_module in AOTAutogradCache (#151860 ) Compiled Autograd retraces AOT's bw_module at backward runtime into a larger graph, and today this runs into an issue on warm cache runs because the bw_module is not restored. This PR adds it to the cache, by first stripping it bare from unserializable metadata. I also intentionally differentiate the cached and non-cached versions to avoid accidental attempts of AOT compilation with a restored bw_module (would probably crash). Note that since the cache entry may be used by runs that use compiled autograd and runs that do not, we need to cache both the lowered backward and the bw_module. Pull Request resolved: https://github.com/pytorch/pytorch/pull/151860 Approved by: https://github.com/jamesjwu ghstack dependencies: #149707	2025-05-01 21:59:43 +00:00
Simon Fan	748252378d	[ca] introduce RuntimeState to support c++ hooks via graph breaks (#149987 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/149987 Approved by: https://github.com/jansel ghstack dependencies: #149647, #149709, #149651, #149897	2025-03-27 05:05:34 +00:00
Simon Fan	dcb378cff2	[ca] support anomaly mode nan checks with different semantics than eager (#149897 ) see note in code Pull Request resolved: https://github.com/pytorch/pytorch/pull/149897 Approved by: https://github.com/jansel ghstack dependencies: #149647, #149709, #149651	2025-03-27 05:05:34 +00:00
Simon Fan	754875e237	[ca] API comments and support dynamic shapes via configs (#149709 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/149709 Approved by: https://github.com/jansel ghstack dependencies: #149647	2025-03-24 19:06:45 +00:00
Simon Fan	f123f2c077	[ca] fix dce for side-effects (#149336 ) The AOT backward could have contained side effectful ops, so we can't DCE them. Have CA also call the default fx.Node.is_impure which will cover some of the existing cases Pull Request resolved: https://github.com/pytorch/pytorch/pull/149336 Approved by: https://github.com/jansel	2025-03-19 05:56:47 +00:00
Lirong	4482a65fef	Add side_effect to avoid dce custom op in CA graph (#149181 ) We found that in compiled_autograd, when a custom op is defined, it gets removed by dead code elimination (DCE) from the backward graph. We added a side-effect condition to the DCE function so that custom ops with side effects are not eliminated from the CA graph. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149181 Approved by: https://github.com/xmfan	2025-03-15 04:15:49 +00:00
Simon Fan	578160c875	[ca] don't inline accumulate grad op (#149014 ) we use dummy tensors in our initial trace, so we should never inline. the subclass dispatch might not support the dummy tensor, e.g. DTensor accumulate grad will check that both param and grad are DTensors Pull Request resolved: https://github.com/pytorch/pytorch/pull/149014 Approved by: https://github.com/jansel ghstack dependencies: #149064	2025-03-15 01:10:54 +00:00
Simon Fan	f4368d8872	[ca] clean up aot node deduping (#149064 ) rename the AOT nodes as we copy paste them into the CA graph Pull Request resolved: https://github.com/pytorch/pytorch/pull/149064 Approved by: https://github.com/jansel	2025-03-15 01:10:54 +00:00
Simon Fan	7c87ec1b50	[ca] always do initial trace with dynamic shapes (#148801 ) HUD: https://fburl.com/wzvx6tax no regressions (ignore the pass rate improvements, those come from #149030) Pull Request resolved: https://github.com/pytorch/pytorch/pull/148801 Approved by: https://github.com/jansel ghstack dependencies: #148799, #149030	2025-03-13 17:30:29 +00:00
Simon Fan	b263b272fa	[ca] fix lazily compiled aot bwd (#149030 ) FIXES https://github.com/pytorch/pytorch/issues/137372 sometimes, the aot bwd is lowered lazily. so the bw_module we saved in CompiledFunction._lazy_backward_info hasn't gone through post grad passes, specifically the view_to_reshape pass. Running that directly will then sometimes error, because the AOT forward has already changed its views to reshapes, and it is reflected in the gradients we see in CA. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149030 Approved by: https://github.com/bdhirsh ghstack dependencies: #148799	2025-03-13 17:30:29 +00:00
Simon Fan	e6f560a262	[ca] support for dynamic shapes CopySlices (#148799 ) i'm changing CA initial trace to always trace as dynamic, fixes these errors: ```python This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 FAILED [0.2139s] test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_autograd_python_custom_function_inplace - RuntimeError: !has_symbolic_sizes_strides_ INTERNAL ASSERT FAILED at "/home/xmfan/core/a/pytorch/aten/src/ATen/TensorGeometry.h":63, please report a bug to PyTorch. To execute this test, run the following from the base repo dir: python test/test_autograd.py TestAutogradWithCompiledAutograd.test_autograd_python_custom_function_inplace This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 FAILED [0.0057s] test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_copy_slices_graph_task_updates - RuntimeError: !has_symbolic_sizes_strides_ INTERNAL ASSERT FAILED at "/home/xmfan/core/a/pytorch/aten/src/ATen/TensorGeometry.h":63, please report a bug to PyTorch. To execute this test, run the following from the base repo dir: python test/test_autograd.py TestAutogradWithCompiledAutograd.test_copy_slices_graph_task_updates This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 FAILED [0.9662s] test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_inplace_on_view_weak_grad_fn - RuntimeError: !has_symbolic_sizes_strides_ INTERNAL ASSERT FAILED at "/home/xmfan/core/a/pytorch/aten/src/ATen/TensorGeometry.h":63, please report a bug to PyTorch. 
To execute this test, run the following from the base repo dir: python test/test_autograd.py TestAutogradWithCompiledAutograd.test_inplace_on_view_weak_grad_fn This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 FAILED [0.0077s] test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_leaf_assignment - RuntimeError: !has_symbolic_sizes_strides_ INTERNAL ASSERT FAILED at "/home/xmfan/core/a/pytorch/aten/src/ATen/TensorGeometry.h":63, please report a bug to PyTorch. To execute this test, run the following from the base repo dir: python test/test_autograd.py TestAutogradWithCompiledAutograd.test_leaf_assignment This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 FAILED [5.0485s] test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_setitem_mask - RuntimeError: !has_symbolic_sizes_strides_ INTERNAL ASSERT FAILED at "/home/xmfan/core/a/pytorch/aten/src/ATen/TensorGeometry.h":63, please report a bug to PyTorch. To execute this test, run the following from the base repo dir: python test/test_autograd.py TestAutogradWithCompiledAutograd.test_setitem_mask This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 FAILED [0.0102s] test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_tensor_hooks_inplace_over_view - RuntimeError: !has_symbolic_sizes_strides_ INTERNAL ASSERT FAILED at "/home/xmfan/core/a/pytorch/aten/src/ATen/TensorGeometry.h":63, please report a bug to PyTorch. To execute this test, run the following from the base repo dir: python test/test_autograd.py TestAutogradWithCompiledAutograd.test_tensor_hooks_inplace_over_view ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/148799 Approved by: https://github.com/jansel, https://github.com/zou3519	2025-03-13 17:30:20 +00:00
Xuehai Pan	3ce352e389	[BE][PYFMT] migrate PYFMT for `torch._dynamo` to `ruff format` (#144549 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144549 Approved by: https://github.com/jansel	2025-02-28 03:03:53 +00:00
Simon Fan	0a2da008f8	[ca] trace saved variable unpacking (#147242 ) ## Before Previously, CA will always unpack all saved variables stored in the autograd graph before executing it. This meant that we can't capture unpack hooks as part of the CA graph, and they would fire out of order wrt to other backward hooks. For memory saving APIs built on top of saved tensor hooks like non-reentrant checkpointing and offloading, we couldn't achieve any savings because all activations would be recomputed/loaded and active at the same time, resulting in no-op. ## After We add unpack hooks into the CA graph so that they can be executed progressively. The python hook and hook input themselves are wrapped by non-traceable code, so CA polyfills the wrapping as: ```python # pseudocode class SavedVariable: def unpack(self): if self.hook: return self.hook(self.packed_data) else: return self.packed_data # This approach won't directly work when we add support for Forward AD or double-backward. ``` Directly executing the CA graph (without torch.compiling it) under checkpointing/offloading, memory profile is expected to stay the same as when using the eager autograd engine. If AOT backward is in the autograd graph, memory profile is expected to be better than the eager autograd engine, since we can now delay saved activations unpacking into the AOT backward's execution. All tests pass when running the CA graph directly, the remaining issues are in Dynamo. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147242 Approved by: https://github.com/jansel	2025-02-26 16:37:17 +00:00
PyTorch MergeBot	90e3a3d86d	Revert "[ca] trace saved variable unpacking (#147242 )" This reverts commit 68ddca94498fd7961cc5ebcb0dffafb8c2f4baca. Reverted https://github.com/pytorch/pytorch/pull/147242 on behalf of https://github.com/wdvr due to failing tests in the slow workflow, see below ([comment](https://github.com/pytorch/pytorch/pull/147242#issuecomment-2683604547))	2025-02-26 00:40:16 +00:00
Simon Fan	68ddca9449	[ca] trace saved variable unpacking (#147242 ) ## Before Previously, CA will always unpack all saved variables stored in the autograd graph before executing it. This meant that we can't capture unpack hooks as part of the CA graph, and they would fire out of order wrt to other backward hooks. For memory saving APIs built on top of saved tensor hooks like non-reentrant checkpointing and offloading, we couldn't achieve any savings because all activations would be recomputed/loaded and active at the same time, resulting in no-op. ## After We add unpack hooks into the CA graph so that they can be executed progressively. The python hook and hook input themselves are wrapped by non-traceable code, so CA polyfills the wrapping as: ```python # pseudocode class SavedVariable: def unpack(self): if self.hook: return self.hook(self.packed_data) else: return self.packed_data # This approach won't directly work when we add support for Forward AD or double-backward. ``` Directly executing the CA graph (without torch.compiling it) under checkpointing/offloading, memory profile is expected to stay the same as when using the eager autograd engine. If AOT backward is in the autograd graph, memory profile is expected to be better than the eager autograd engine, since we can now delay saved activations unpacking into the AOT backward's execution. All tests pass when running the CA graph directly, the remaining issues are in Dynamo. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147242 Approved by: https://github.com/jansel	2025-02-25 20:38:51 +00:00
Simon Fan	057bcd3a45	[ca] eliminate duplicate getitem graph nodes for shape inputs (#146875 ) should reuse existing proxies instead of creating new ones before: https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpL7hmHe/0_-_-_0/compiled_autograd_graph_3.txt?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=100 ```python class CompiledAutograd0(torch.nn.Module): def forward(self, inputs, sizes, scalars, hooks): # No stacktrace found for following nodes getitem = inputs[0] getitem_1 = inputs[1] getitem_2 = inputs[2]; inputs = None getitem_3 = sizes[0]; getitem_3 = None getitem_4 = sizes[1]; getitem_4 = None getitem_5 = sizes[2]; getitem_5 = None getitem_6 = sizes[3]; getitem_6 = None getitem_7 = sizes[4]; getitem_7 = None getitem_8 = sizes[5]; getitem_8 = None getitem_9 = sizes[6]; getitem_9 = None getitem_10 = sizes[7]; getitem_10 = None getitem_11 = sizes[8]; getitem_11 = None getitem_12 = sizes[9]; getitem_12 = None getitem_13 = sizes[10]; getitem_13 = None getitem_14 = sizes[11]; getitem_14 = None getitem_15 = sizes[12]; getitem_15 = None getitem_16 = sizes[13]; getitem_16 = None getitem_17 = sizes[14]; getitem_17 = None getitem_18 = sizes[15]; getitem_18 = None getitem_19 = sizes[0] getitem_20 = sizes[1] getitem_21 = sizes[2] getitem_22 = sizes[3] getitem_23 = sizes[4] getitem_24 = sizes[5] getitem_25 = sizes[6] getitem_26 = sizes[7] getitem_27 = sizes[8] getitem_28 = sizes[9] getitem_29 = sizes[10] getitem_30 = sizes[11] getitem_31 = sizes[12] getitem_32 = sizes[13] getitem_33 = sizes[14] getitem_34 = sizes[15]; sizes = None ``` after: https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpCo5T6B/0_-_-_0/compiled_autograd_graph_1.txt?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=100 ```python class CompiledAutograd0(torch.nn.Module): def forward(self, inputs, sizes, scalars, hooks): # No stacktrace found for following nodes getitem = inputs[0] getitem_1 = inputs[1] getitem_2 = 
inputs[2]; inputs = None getitem_3 = sizes[0] getitem_4 = sizes[1] getitem_5 = sizes[2] getitem_6 = sizes[3] getitem_7 = sizes[4] getitem_8 = sizes[5] getitem_9 = sizes[6] getitem_10 = sizes[7] getitem_11 = sizes[8] getitem_12 = sizes[9] getitem_13 = sizes[10] getitem_14 = sizes[11] getitem_15 = sizes[12] getitem_16 = sizes[13] getitem_17 = sizes[14] getitem_18 = sizes[15]; sizes = None ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/146875 Approved by: https://github.com/jansel ghstack dependencies: #146720, #146735	2025-02-13 21:41:33 +00:00
Simon Fan	76dacd5fc7	[ca] log graph before reordering passes (#146735 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146735 Approved by: https://github.com/jansel ghstack dependencies: #146720	2025-02-13 21:41:33 +00:00
Raymond Li	21c2565f35	Document dynamo (#146736 ) Many files in dynamo are currently lacking file/module-level documentation, which makes it hard to know what they do at a glance and without digging into the code. This fixes that. Note: documentation was AI-generated and could be incorrect, please review carefully. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146736 Approved by: https://github.com/jansel, https://github.com/StrongerXi, https://github.com/anijain2305, https://github.com/zou3519	2025-02-13 00:02:21 +00:00
Simon Fan	72405b0c0f	[ca] refactor compile reasons and log to tlparse (#146386 ) This PR accumulates compile reasons inside each CacheNode, and logs them to tlparse on each CA compile. This defines a compile as an autograd structure change, and a recompile as a dynamic shape change. sample tlparse: https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpdbo7gt/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=100 for compiles: ```python [ "!0: Cache miss due to new autograd node: torch::autograd::GraphRoot (NodeCall 0) with key size 39, previous key sizes=[]" ] ``` for recompiles: ```python [ "!0: Cache miss due to new autograd node: torch::autograd::GraphRoot (NodeCall 0) with key size 39, previous key sizes=[]", "!1: Cache miss due to 7 changed tensor shapes (total of 7): sizes[0], sizes[1], sizes[2], sizes[3], sizes[4], sizes[5], sizes[6]" ] ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/146386 Approved by: https://github.com/jansel ghstack dependencies: #146229	2025-02-05 23:33:21 +00:00
Simon Fan	e20b0c82d1	[ca] no longer require is_traceable annotations for c++ autograd functions (#146229 ) This PR removes the CA compile-time error for C++ autograd functions, and supports them by having dynamo graph break on them (instead of allow_in_graph). The CppNode's collects are kept as is for now. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146229 Approved by: https://github.com/jansel, https://github.com/zou3519	2025-02-05 08:49:17 +00:00
rzou	ea141d8134	functional compiled autograd (#144707 ) This PR squashes together the following commits: https://github.com/pytorch/pytorch/pull/144115 https://github.com/pytorch/pytorch/pull/143417 https://github.com/pytorch/pytorch/pull/143405 https://github.com/pytorch/pytorch/pull/143387 https://github.com/pytorch/pytorch/pull/143304 https://github.com/pytorch/pytorch/pull/143296 This is a refactor of compiled autograd to use "functional autograd". The end goal is that it gets compiled autograd's initial capture to stop specializing on Tensor metadata, therefore allowing compiled autograd to better handle Tensor subclasses. For more information, please read the commit messages for each PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144707 Approved by: https://github.com/bdhirsh, https://github.com/xmfan, https://github.com/jansel	2025-01-27 05:20:56 +00:00
PyTorch MergeBot	6dd8283381	Revert "[compiled autograd] Proxy opaque nodes for built-in autograd nodes (#143296 )" This reverts commit 5531fafffefc45cd894040b2b07b0d5227430082. Reverted https://github.com/pytorch/pytorch/pull/143296 on behalf of https://github.com/izaitsevfb due to breaking internal tests T213390054 ([comment](https://github.com/pytorch/pytorch/pull/143296#issuecomment-2611224926))	2025-01-23 23:34:13 +00:00
PyTorch MergeBot	9553301ade	Revert "[compiled autograd] Proxy nodes for user-defined C++ torch::autograd::Function (#143387 )" This reverts commit 784bb2127ca9729c646f1650ecc2cf946a583da8. Reverted https://github.com/pytorch/pytorch/pull/143387 on behalf of https://github.com/izaitsevfb due to breaking internal tests T213390054 ([comment](https://github.com/pytorch/pytorch/pull/143296#issuecomment-2611224926))	2025-01-23 23:34:13 +00:00
PyTorch MergeBot	16c4f8c395	Revert "[compiled autograd] Always proxy autograd.Function nodes; handle AOT backwards (#143405 )" This reverts commit ec820fe57c2d6a2847569a107856e7fcff87dc5c. Reverted https://github.com/pytorch/pytorch/pull/143405 on behalf of https://github.com/izaitsevfb due to breaking internal tests T213390054 ([comment](https://github.com/pytorch/pytorch/pull/143296#issuecomment-2611224926))	2025-01-23 23:34:13 +00:00
PyTorch MergeBot	3f6cfd0156	Revert "[compiled autograd] stop specializing on metadata during initial trace (#143417 )" This reverts commit 99dd1bf1b93bc26080e611af54497a73a618e02a. Reverted https://github.com/pytorch/pytorch/pull/143417 on behalf of https://github.com/izaitsevfb due to breaking internal tests T213390054 ([comment](https://github.com/pytorch/pytorch/pull/143296#issuecomment-2611224926))	2025-01-23 23:34:12 +00:00
PyTorch MergeBot	ab082863a1	Revert "[compiled autograd] support Tensor Subclasses in AOTBackward (#144115 )" This reverts commit 082c28c3c655984ce65c13336cff822db95ee470. Reverted https://github.com/pytorch/pytorch/pull/144115 on behalf of https://github.com/izaitsevfb due to breaking internal tests T213390054 ([comment](https://github.com/pytorch/pytorch/pull/143296#issuecomment-2611224926))	2025-01-23 23:34:12 +00:00


139 Commits