pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-27 00:54:52 +08:00

Author	SHA1	Message	Date
Yidi Wu	6713b457ae	[hop][dynamo] support torch.SymInt inputs (#141524 ) Fixes https://github.com/pytorch/pytorch/issues/141305.
```python
class M(torch.nn.Module):
    def forward(self, x, y, z):
        a = y.shape[0]
        b = z.shape[0]

        def true_fn(x):
            return x + a

        def false_fn(x):
            return x + b * z

        # When exporting with non-strict: a and b are symints,
        # so torch.compile needs to wrap and trace symint inputs.
        return torch.cond(x.shape[0] > 5, true_fn, false_fn, (x,))
```
In non-strict export, when inputs are annotated with dynamic shapes, a and b in the above example are of type torch.SymInt, so true_fn and false_fn have closures of type torch.SymInt. The error was triggered because we didn't handle SymInt inputs in dynamo and ended up using a UserDefinedObjectVariable for them, which doesn't have a proxy. We added support by following how we previously handled SymBool inputs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141524 Approved by: https://github.com/zou3519 ghstack dependencies: #141610, #142185	2024-12-10 17:33:57 +00:00
Thomas Bohnstingl	871b93bc59	[associative_scan] Fixing shape checks (#141698 ) This PR fixes the shape checks that are done in the associative_scan operation. Previously, all input leaves were required to have the same shape. With this PR, only the output of the combine_fn and the input leaves need to have matching shapes; the input leaves no longer need to match one another. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141698 Approved by: https://github.com/ydwu4	2024-12-03 03:49:11 +00:00
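The relaxed check described in #141698 can be sketched in plain Python, with shapes represented as tuples. This is only an illustration under stated assumptions: `check_combine_fn_shapes` is a hypothetical name and not a function from PyTorch, and the real check operates on tensor metadata rather than tuples.

```python
def check_combine_fn_shapes(input_shapes, output_shapes):
    """Hypothetical helper: compare each combine_fn output with its input leaf.

    Leaves are compared pairwise (leaf i against output i). Different input
    leaves are allowed to have different shapes from one another.
    """
    if len(input_shapes) != len(output_shapes):
        raise ValueError(
            f"expected {len(input_shapes)} outputs, got {len(output_shapes)}"
        )
    for index, (inp, out) in enumerate(zip(input_shapes, output_shapes)):
        if tuple(inp) != tuple(out):
            raise ValueError(
                f"leaf {index}: output shape {tuple(out)} != input shape {tuple(inp)}"
            )


# Two leaves with different shapes pass, because each output matches its own leaf.
check_combine_fn_shapes([(3, 4), (3, 2, 5)], [(3, 4), (3, 2, 5)])
```

A stricter check that also compared the leaves against each other would reject the call above, which is the behaviour the PR removes.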
Yidi Wu	45bc9165fe	[hop] add discard_graph_changes to remove the empty calls before hop (#140334 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/140334 Approved by: https://github.com/zou3519	2024-11-26 17:32:43 +00:00
Yidi Wu	4b3ce62946	[while_loop] support pytree inputs (#140059 ) Previously, we only supported carries that were a tuple of tensors. This PR enables us to support pytrees of tensors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/140059 Approved by: https://github.com/zou3519	2024-11-20 21:12:29 +00:00
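To illustrate what a pytree carry means for a loop of this kind, here is a pure-Python reference loop. It is a sketch, not the PyTorch implementation: `reference_while_loop` is a hypothetical name, plain integers and lists stand in for tensors, and the only assumption taken from the operator is the calling convention `cond_fn(*carried)` / `body_fn(*carried)`.

```python
def reference_while_loop(cond_fn, body_fn, carried_inputs):
    """Hypothetical eager reference: loop while cond_fn holds.

    Each element of carried_inputs may itself be a nested structure
    (dict, list, tuple); body_fn must return the same structure.
    """
    carried = tuple(carried_inputs)
    while cond_fn(*carried):
        carried = tuple(body_fn(*carried))
    return carried


def cond_fn(i, state):
    return i < 3


def body_fn(i, state):
    # The second carry is a dict, i.e. a nested (pytree-like) value.
    return i + 1, {"total": state["total"] + i, "history": state["history"] + [i]}


final_i, final_state = reference_while_loop(
    cond_fn, body_fn, (0, {"total": 0, "history": []})
)
```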
William Wen	72943ba823	[3.13] deal with exec() semantic change in test_cond_no_dynamo_cache_limit (#140401 ) https://peps.python.org/pep-0667/ changed the semantics of `eval/exec` in 3.13 so that changes to locals no longer propagate (but globals do). This is to make the behavior predictable since in the past, the locals may or may not update based on various mysterious conditions. Other test sites may need updating too. Pull Request resolved: https://github.com/pytorch/pytorch/pull/140401 Approved by: https://github.com/ydwu4, https://github.com/zou3519	2024-11-18 22:06:47 +00:00
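One way to make code that relies on `exec` independent of this change is to pass an explicit namespace dictionary and read results back from it, instead of expecting the caller's locals to be updated. The snippet below is a minimal sketch of that pattern; `run_snippet` is a hypothetical helper and is not taken from the test that this commit modifies.

```python
def run_snippet(source):
    """Hypothetical helper: execute source in an explicit namespace.

    Names assigned by the executed code land in the returned dict,
    regardless of how locals() behaves in the calling function.
    """
    namespace = {}
    exec(source, namespace)
    return namespace


result = run_snippet("value = 1 + 2")
value = result["value"]
```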
PyTorch MergeBot	3483f7809e	Revert "Fix typo in associative_scan tests (#139929 )" This reverts commit 7fa94f03635709a30ef85c6955dcdd5051e72e71. Reverted https://github.com/pytorch/pytorch/pull/139929 on behalf of https://github.com/ZainRizvi due to This test is breaking in trunk somehow, which is really weird. functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cuda [GH job link](https://github.com/pytorch/pytorch/actions/runs/11747748990/job/32732254909) [HUD commit link](`7fa94f0363`) ([comment](https://github.com/pytorch/pytorch/pull/139929#issuecomment-2465773366))	2024-11-08 21:26:41 +00:00
Thomas Bohnstingl	7fa94f0363	Fix typo in associative_scan tests (#139929 ) Fix typo with Associative_Scan tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/139929 Approved by: https://github.com/ydwu4	2024-11-08 18:42:26 +00:00
Yidi Wu	ab42967238	[hop free symbols] lift free symbols in example_value when create_graph_input (#138363 ) There are 5 parts (they are hard to break into smaller ones because they're highly coupled) in this PR: 1. Whenever we call create_graph_input, we try to bind the symbols in the graph input. Since we've enforced the invariant that all create_graph_input calls must provide an example value, we can intercept at the create_graph_input calls (this PR only handles free symbols in tensors). 2. We cache the bound_symbols to avoid lifting the same symbol repeatedly. 3. For lifted symbols, we re-use lifted_freevars, i.e. the mapping from the symbol proxy in the parent graph to the lifted phs in the current subgraph, which is also how we handle lifted tensors. In this way, all hops that support lifted tensors should be able to handle lifted_symints automatically (at least in the dynamo part). 4. For unbacked symbols created during tracing, we also need to bind these symbols to their proxies. This is to support the test cases where we want to lift unbacked symbols as inputs. We need the proxy of the unbacked symbol in the parent graph in order to properly create the args to the hop. 5. We change all the tests after free symbols are lifted in subgraphs, and also support the lifted symbols in existing higher order ops. The interaction of nested tracers: The previous design for lifting tensor closures is that, when we're in nested tracers, whenever we see a new proxy that's not created by the current tracer, we recursively look for the proxy in the parent tracer until we find the tracer that created this proxy (either a placeholder or some intermediate result). More detail is in Note [Nested SubgraphTracer and free_variable handling]. Given the above design, the plan for lifting the free symbols is: whenever we lift a free tensor to be an input of the current subgraph, we look at the symbols in it and bind the symbols at the same time. 
For example, suppose we have the following function: ```python def f(x: [s1, s2]): def true_f(): def true_f_inner(): return x.sin() ``` What happens, in time order: 1. We create subtracer 1 and start to speculate the outer cond's true_f. 2. We create another subtracer 2 and start to speculate the inner cond's true_f_inner. 3. Dynamo realizes the tensor input x by calling wrap_tensor at top level to create graph input x (tracer 0), and we bind the symbols s1, s2 after the ph for x is created. So the graph now looks like: ```python def gm(s1, s2, x): ``` 4. When seeing TensorVariable.call_method of x, tracer 2 wants to create a call_function(sin, proxy_of_x), but it finds that proxy_of_x is not created by the current tracer. So it recursively looks up its parent tracer 1, finds that tracer 1 also doesn't track this proxy_of_x, and then finds the root tracer 0, which is the creator of it and tracks it as a ph. Then tracer 1 calls create_graph_input to lift the closure to its input ph1 and adds the (proxy_of_x: ph1) key-value pair to lifted_freevars of tracer 1. Now the graph looks like: ```python def gm(s1, s2, x): def true_gm(x): ``` 5. Since there are free symbols inside this new tensor input, tracer 1 also binds the symbols (maybe_bind_symbol), which calls create_graph_input for s1 and s2. Now the graph looks like: ```python def gm(s1, s2, x): def true_gm(s1, s2, x): ``` 6. Then it goes back to tracer 2, which calls create_graph_input for x and gets ph2; tracer 2's lifted_freevars records (ph1, ph2), and tracer 2 also binds the symbols in this new tensor input. Now the graph looks like: ```python def gm(s1, s2, x): def true_gm(s1, s2, x): def true_gm_inner(s1, s2, x): ``` 7. Finally, the sin call_function node is created by tracer 2. This PR also handles the following cases: - What if we lift two tensors that share the same symbol? e.g. x1 [s1, s2], x2 [s2, s3]? Each subtracer maintains bound_symbols as a cache that maps a symbol.expr to its proxy in the current tracer. 
So when we see x1, we track s1 and s2 as inputs and bind s1 to ph1 and s2 to ph2. When we then try to bind the symbols of x2, s2 is already tracked, so no graph input is created for it. - What if a subgraph closes over a symint? e.g. ```python def f(x): def true_f(): c = x.size(0) def true_fn_inner(): return c ``` When we speculate true_fn_inner, we find proxy_of_c is not tracked by tracer 2, so it recursively looks up its parent. At this point, x and its symbols have been lifted as inputs of true_f (as a result of lifting x while tracing true_f in tracer 1). Specifically, the graph looks like: ```python def gm(s1, s2, x): def true_gm(s1, s2, x): def true_gm_inner(): ``` So tracer 2 is able to find that s1 has been tracked as a ph in tracer 1, so it returns back to gm and calls create_graph_input on s1. The graph now looks like: ```python def gm(s1, s2, x): def true_gm(s1, s2, x): def true_gm_inner(s1): return s1 ``` - What if a subgraph closes over an unbacked symint? e.g. ```python def f(x): def true_f(): c = x.item() def true_f_inner(): return c ``` When x.item() is called, proxy_of_c and its symnode variable are created for tracer 1, and we also call track_unbacked_symbols to record this relationship. So when tracer 2 finds proxy_of_c is not created by the current tracer, it recursively looks up its parent tracer and finds that the expression u0 has been tracked as a result of track_unbacked_symbol in tracer 1. So it stops the recursion and calls create_graph_input for u0 in tracer 2. The graph looks like: ```python def f(x): def true_f(s1, s2, x): c = x.item() def true_gm_inner(u0): return u0 cond(pred, true_gm_inner, false_gm_inner, (c,)) ``` - What if a subgraph closes over a tensor with an unbacked symint shape? ```python def f(x): def true_f(): c = x.item() r = torch.randn((c,)) def true_f_inner(): return r + 1 ``` This is the same as the case of closing over tensors with backed shapes. 
where we first lift r, then bind u0 in it, which recursively bind_symint of u0 in its parent and found u0 is tracked in parent tracer as a result of .item() call. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138363 Approved by: https://github.com/zou3519	2024-11-07 04:44:32 +00:00
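The bound_symbols cache described in this commit message (the x1 [s1, s2], x2 [s2, s3] case) can be sketched with a toy tracer. Everything here is a simplified stand-in: `ToyTracer` and `lift_tensor` are hypothetical names, strings stand in for proxies and symbols, and the placement of symbol inputs before the tensor input merely mirrors the `gm(s1, s2, x)` signatures shown above.

```python
class ToyTracer:
    """Hypothetical, heavily simplified stand-in for a subgraph tracer."""

    def __init__(self):
        self.inputs = []          # names of graph inputs, in creation order
        self.bound_symbols = {}   # symbol name -> placeholder for that symbol

    def create_graph_input(self, name):
        self.inputs.append(name)
        return f"ph_{name}"

    def lift_tensor(self, name, shape_symbols):
        # Bind each free symbol once; later tensors reuse the cached binding.
        for symbol in shape_symbols:
            if symbol not in self.bound_symbols:
                self.bound_symbols[symbol] = self.create_graph_input(symbol)
        return self.create_graph_input(name)


tracer = ToyTracer()
tracer.lift_tensor("x1", ["s1", "s2"])
tracer.lift_tensor("x2", ["s2", "s3"])  # s2 is already bound, so it is not lifted again
```

After both calls the toy graph has five inputs, with `s2` appearing only once.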
Thomas Bohnstingl	d1c26b0781	Improvements for associative_scan - slicing of xs (#138858 ) In this PR, the combine_fn is consistently called with a slice along the scan dim. It implements part of https://github.com/pytorch/pytorch/pull/136966 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138858 Approved by: https://github.com/ydwu4	2024-11-05 23:38:21 +00:00
Yidi Wu	dc3a6a9d08	[hop free symbols][refactor] make create_graph_input always take example_value (#138428 ) Code refactoring only. We move the wrap_to_fake_tensor_logic out of wrap_fx_proxy for placeholders to provide the invariant that all graph inputs must set their example values when creating the inputs. This invariant helps us to identify all the free symbols in the graph in top-level and sub-graphs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138428 Approved by: https://github.com/ezyang, https://github.com/zou3519 ghstack dependencies: #138345	2024-11-04 22:47:49 +00:00
Yidi Wu	0ac9a663ec	[hop] always trace subgraph with fake to support .item in eager mode (#138771 ) Fixes https://github.com/pytorch/pytorch/issues/138664 When we eagerly run torch.cond with autograd keys set, we'll create_fw_bw_graph using real tensors. This PR forces fakification when we cannot detect a fake mode, so that the .item calls can be traced. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138771 Approved by: https://github.com/zou3519, https://github.com/malfet	2024-10-26 02:17:17 +00:00
Yidi Wu	c6bb9b53f4	[scan] better error handling and remove redundant tests (#137967 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137967 Approved by: https://github.com/zou3519	2024-10-25 19:01:25 +00:00
Yidi Wu	3087b5e431	[cond] support lifted symint inputs in subgraph (#137519 ) As titled. Pull Request resolved: https://github.com/pytorch/pytorch/pull/137519 Approved by: https://github.com/eellison	2024-10-17 16:09:06 +00:00
Brian Hirsh	a682194a11	inductor: use previous guards to know if a size is 1 for broadcasting (#136670 ) Fixes https://github.com/pytorch/pytorch/issues/136640 Today, inductor has some logic to figure out when it needs to do broadcasting during lowering, which just checks if any of the input shapes have sizes equal to 1. In particular: we should already have this information by the time we get to inductor, because our FakeTensor compute will have branched/guarded on whether any ops performed broadcasting, appropriately. In particular, if we have a tensor with a size value of `(64//((2048//(s3((s2//s3)))))))`, and it happens to be equal to one (and it is used in an op that requires this dim to be broadcasted), FakeTensorProp will have generated a guard: ``` Eq((64//((2048//(s3((s2//s3))))))), 1) ``` I chose the simplest possible way to beef up inductor's checks to know when a given size is equal to 1: loop over the existing shape env guards, and if our current size is a sympy expression on the LHS of one of our `Eq(LHS, 1)` guards, then return True. I'm hoping for feedback on whether or not this approach is reasonable. One better option I could imagine is that our symbolic reasoning should have automatically simplified the size of our tensor down to a constant as part of evaluating that guard. I was originally going to try to do this directly in the shape env, but I ran into a few issues: (1) I wanted to call some version of `set_replacement(expr, 1)`. But `set_replacement()` only accepts plain symbols on the LHS, not expressions (2) in theory I could get this to work if I could rework the above expression to move everything that is not a free variable to the RHS, e.g. `Eq(s2, 32)`. It looks like our existing `try_solve()` logic is... [not quite able](https://github.com/pytorch/pytorch/blob/main/torch/utils/_sympy/solve.py#L27) to do this generally though. Checking the guards feels pretty simple-and-easy. 
Are we worried that it is too slow to iterate over all the guards? I could also cache the lookup so we only need to iterate over guards that are of the form `Eq(LHS, 1)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/136670 Approved by: https://github.com/ezyang	2024-10-16 22:41:39 +00:00
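The guard lookup described in this commit message, including the caching idea from its last sentence, can be sketched without sympy. This is an illustration under loud assumptions: guards are modelled as `(op, lhs, rhs)` tuples with string expressions, and `is_size_one` and `build_size_one_index` are hypothetical names rather than inductor or ShapeEnv APIs.

```python
def is_size_one(size_expr, guards):
    """Hypothetical linear scan: is there a guard of the form Eq(size_expr, 1)?"""
    return any(
        op == "Eq" and lhs == size_expr and rhs == 1 for op, lhs, rhs in guards
    )


def build_size_one_index(guards):
    """Hypothetical cache: collect every LHS that some guard equates to 1."""
    return {lhs for op, lhs, rhs in guards if op == "Eq" and rhs == 1}


# Toy guards; strings stand in for symbolic expressions.
guards = [
    ("Eq", "64//(2048//s2)", 1),
    ("Eq", "s0", 8),
    ("Ne", "s1", 1),
]

size_one_index = build_size_one_index(guards)
needs_broadcast = is_size_one("64//(2048//s2)", guards)
```

With the index built once, each later query is a set membership test instead of a pass over all guards.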
Yidi Wu	0bfa1bf21d	[scan] support closure (#135602 ) This PR adds an additional_inputs argument to support closures similar to what we've done for while_loop. Pull Request resolved: https://github.com/pytorch/pytorch/pull/135602 Approved by: https://github.com/zou3519 ghstack dependencies: #135600, #135601	2024-10-16 22:28:03 +00:00
Yidi Wu	819d6b139c	[scan] flatten subgraph output and make subgraph inputs to be a slice (#135601 ) This PR introduces the following changes: 1. Before this PR, the subgraph's output was ([], []); in this PR, we change it to a flattened list for easier codegen and consistency with other control flow operators. 2. Before this PR, the combine_fn of scan took a sliced input but kept the sliced dimension. For example, suppose xs = torch.randn(3, 4, 5) and we scan over dim 0; the combine_fn looked like:
```
# x.shape = (1, 4, 5) instead of (4, 5)
def combine_fn(carry, x):
    ...
```
In this PR, we fix this and also simplify some of the slicing logic. 3. This diff also makes sure we always stack ys on the first dimension. Pull Request resolved: https://github.com/pytorch/pytorch/pull/135601 Approved by: https://github.com/zou3519 ghstack dependencies: #135600	2024-10-16 22:28:03 +00:00
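The slicing semantics described above (the scanned dimension is dropped from each `x`, and the per-step outputs are stacked along the first dimension) can be shown with a pure-Python reference scan over nested lists. This is a sketch only: `reference_scan` is a hypothetical name, lists stand in for tensors, and the `(carry, x) -> (next_carry, y)` convention is an assumption of this illustration.

```python
def reference_scan(combine_fn, init, xs):
    """Hypothetical reference: scan over the first dimension of xs.

    Each x handed to combine_fn is one slice of xs with the scanned
    dimension removed. Outputs are collected along a new first dimension.
    """
    carry = init
    ys = []
    for x in xs:
        carry, y = combine_fn(carry, x)
        ys.append(y)
    return carry, ys


def combine_fn(carry, x):
    # carry and x both have "shape" (2,): a running elementwise sum.
    next_carry = [c + v for c, v in zip(carry, x)]
    return next_carry, next_carry


xs = [[1, 2], [3, 4], [5, 6]]  # "shape" (3, 2); scanned over dim 0
final_carry, ys = reference_scan(combine_fn, [0, 0], xs)
```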
Michael Lazos	27dee935af	[Dynamo] Ensure torch function modes are dispatched on builtin ops (#137117 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137117 Approved by: https://github.com/yanboliang, https://github.com/williamwen42 ghstack dependencies: #137114, #137115, #137116	2024-10-09 02:29:40 +00:00
PyTorch MergeBot	2d18c2d5e7	Revert "[Dynamo] Ensure torch function modes are dispatched on builtin ops (#137117 )" This reverts commit 941be418d8ec3290d0e3bae0e16a443be26b3075. Reverted https://github.com/pytorch/pytorch/pull/137117 on behalf of https://github.com/huydhn due to The top of the stack has been reverted but it leaves trunk in a broken state, so I try to revert the rest of the stack ([comment](https://github.com/pytorch/pytorch/pull/137114#issuecomment-2400765603))	2024-10-08 20:33:17 +00:00
Michael Lazos	941be418d8	[Dynamo] Ensure torch function modes are dispatched on builtin ops (#137117 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137117 Approved by: https://github.com/yanboliang, https://github.com/williamwen42 ghstack dependencies: #137114, #137115, #137116	2024-10-07 18:55:26 +00:00
Vincent Moens	bd9517c1ee	cond_batch_rule with boolean pred (#135009 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/135009 Approved by: https://github.com/guilhermeleobas, https://github.com/jansel, https://github.com/zou3519	2024-10-03 07:43:30 +00:00
Michael Lazos	5c5c33ac32	[Dynamo] Trace torch function modes entered outside of torch.compile (#133137 ) This PR adds initial tracing for torch function modes. Details: In essence, this adds tracing into the torch function of modes entered outside of the torch.compile call. This does not yet support tracing enter/exit of a torch function mode/ tracing set_default_device properly using the new mode infra (this will be a very good stress test for modes). I am adding more PRs to this stack to support these. The overall plan is to support tracing enter/exit and handling graph breaks like we do other torch.* context managers. Previously landed: https://github.com/pytorch/pytorch/pull/133135 https://github.com/pytorch/pytorch/pull/133136 https://github.com/pytorch/pytorch/pull/133134 https://github.com/pytorch/pytorch/pull/133133 https://github.com/pytorch/pytorch/pull/133132 https://github.com/pytorch/pytorch/pull/133131 https://github.com/pytorch/pytorch/pull/133729 https://github.com/pytorch/pytorch/pull/133130 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133137 Approved by: https://github.com/jansel, https://github.com/zou3519 ghstack dependencies: #134732	2024-09-14 18:52:22 +00:00
PyTorch MergeBot	8c8a3086a7	Revert "[Dynamo] Trace torch function modes entered outside of torch.compile (#133137 )" This reverts commit 4528777e034b157a8329d1879daf52290eea199a. Reverted https://github.com/pytorch/pytorch/pull/133137 on behalf of https://github.com/mlazos due to broke python test/quantization/pt2e/test_numeric_debugger.py TestNumericDebugger.test_re_export_preserve_handle modified yesterday ([comment](https://github.com/pytorch/pytorch/pull/134732#issuecomment-2350937008))	2024-09-14 10:02:55 +00:00
Jack Taylor	b9b6094793	[ROCm] Skip pointwise associative scan tests due to regression (#135995 ) https://github.com/pytorch/pytorch/pull/133012 caused a regression on ROCm causing pointwise scan tests to fail ``` ERROR: test_pointwise_associative_scan_tuple_reverse_True_combine_mode_pointwise_cuda ERROR: test_pointwise_associative_scan_tuple_reverse_False_combine_mode_pointwise_cuda ERROR: test_pointwise_associative_scan_complex_pytree_reverse_True_combine_mode_pointwise_cuda ERROR: test_pointwise_associative_scan_complex_pytree_reverse_False_combine_mode_pointwise_cuda ERROR: test_pointwise_associative_scan_binary_operator_reverse_True_combine_mode_pointwise_cuda ERROR: test_pointwise_associative_scan_binary_operator_reverse_False_combine_mode_pointwise_cuda ``` Skipping temporarily while triage is underway. Full log: https://ossci-raw-job-status.s3.amazonaws.com/log/30067645445 ``` File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_inductor/graph.py", line 1020, in call_function out = lowerings[target](args, kwargs) # type: ignore[index] File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_inductor/lowering.py", line 363, in wrapped out = decomp_fn(args, **kwargs) File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_inductor/lowering.py", line 6245, in associative_scan raise RuntimeError("Unable to generate code for associative_scan op") torch._inductor.exc.LoweringException: RuntimeError: Unable to generate code for associative_scan op ``` NOTE: even "eager" backend fails ``` File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_higher_order_ops/associative_scan.py", line 338, in associative_scan_op_dense raise NotImplementedError("associative_scan is not implemented for eager") NotImplementedError: associative_scan is not implemented for eager ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/135995 Approved by: https://github.com/malfet	2024-09-14 05:40:10 +00:00
Michael Lazos	4528777e03	[Dynamo] Trace torch function modes entered outside of torch.compile (#133137 ) This PR adds initial tracing for torch function modes. Details: In essence, this adds tracing into the torch function of modes entered outside of the torch.compile call. This does not yet support tracing enter/exit of a torch function mode/ tracing set_default_device properly using the new mode infra (this will be a very good stress test for modes). I am adding more PRs to this stack to support these. The overall plan is to support tracing enter/exit and handling graph breaks like we do other torch.* context managers. Previously landed: https://github.com/pytorch/pytorch/pull/133135 https://github.com/pytorch/pytorch/pull/133136 https://github.com/pytorch/pytorch/pull/133134 https://github.com/pytorch/pytorch/pull/133133 https://github.com/pytorch/pytorch/pull/133132 https://github.com/pytorch/pytorch/pull/133131 https://github.com/pytorch/pytorch/pull/133729 https://github.com/pytorch/pytorch/pull/133130 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133137 Approved by: https://github.com/jansel, https://github.com/zou3519 ghstack dependencies: #134732	2024-09-14 02:40:43 +00:00
PyTorch MergeBot	eb7dd91dd1	Revert "[Dynamo] Trace torch function modes entered outside of torch.compile (#133137 )" This reverts commit fafdd588f27e1d56090c6d260d0382c255eaf9eb. Reverted https://github.com/pytorch/pytorch/pull/133137 on behalf of https://github.com/albanD due to Broke tests on main ([comment](https://github.com/pytorch/pytorch/pull/134732#issuecomment-2348886378))	2024-09-13 12:52:58 +00:00
Michael Lazos	fafdd588f2	[Dynamo] Trace torch function modes entered outside of torch.compile (#133137 ) This PR adds initial tracing for torch function modes. Details: In essence, this adds tracing into the torch function of modes entered outside of the torch.compile call. This does not yet support tracing enter/exit of a torch function mode/ tracing set_default_device properly using the new mode infra (this will be a very good stress test for modes). I am adding more PRs to this stack to support these. The overall plan is to support tracing enter/exit and handling graph breaks like we do other torch.* context managers. Previously landed: https://github.com/pytorch/pytorch/pull/133135 https://github.com/pytorch/pytorch/pull/133136 https://github.com/pytorch/pytorch/pull/133134 https://github.com/pytorch/pytorch/pull/133133 https://github.com/pytorch/pytorch/pull/133132 https://github.com/pytorch/pytorch/pull/133131 https://github.com/pytorch/pytorch/pull/133729 https://github.com/pytorch/pytorch/pull/133130 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133137 Approved by: https://github.com/jansel, https://github.com/zou3519 ghstack dependencies: #134732	2024-09-13 08:41:00 +00:00
Xinya Zhang	74fd1bf965	[ROCm] Update to AOTriton 0.7b (#134498 ) Notable changes: 1. Enable CudaGraph related tests 2. Fix UT problems 3. EXPERIMENTAL Navi31 support. Users should enable Navi31 support with the environment variable `TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1` Known problem: 1. `test/test_transformers.py` will have massive failures and/or NaN outputs with `--use-pytest` + Update: Confirmed that skipping `class TestSDPAPrivateUse1Only` fixes the problem with `--use-pytest` Note: AOTriton 0.7b adds support for nestedtensors+SDPA but needs more work (and consequently a separate PR) to enable it. Fixes #133540 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134498 Approved by: https://github.com/pruthvistony, https://github.com/jeffdaily, https://github.com/malfet	2024-09-11 20:34:01 +00:00
PyTorch MergeBot	183c32fd3b	Revert "[Dynamo] Trace torch function modes entered outside of torch.compile (#133137 )" This reverts commit 0d15122092c27fec1143b800bab7c996d126b547. Reverted https://github.com/pytorch/pytorch/pull/133137 on behalf of https://github.com/clee2000 due to something in this stack broke functorch/test_control_flow.py::TestControlFlow::test_scan_simple_graph [GH job link](https://github.com/pytorch/pytorch/actions/runs/10804912306/job/29980571390) [HUD commit link](`444b52ff40`), newly added test yesterday ([comment](https://github.com/pytorch/pytorch/pull/133137#issuecomment-2344054339))	2024-09-11 15:57:00 +00:00
Michael Lazos	0d15122092	[Dynamo] Trace torch function modes entered outside of torch.compile (#133137 ) This PR adds initial tracing for torch function modes. Details: In essence, this adds tracing into the torch function of modes entered outside of the torch.compile call. This does not yet support tracing enter/exit of a torch function mode/ tracing set_default_device properly using the new mode infra (this will be a very good stress test for modes). I am adding more PRs to this stack to support these. The overall plan is to support tracing enter/exit and handling graph breaks like we do other torch.* context managers. Previously landed: https://github.com/pytorch/pytorch/pull/133135 https://github.com/pytorch/pytorch/pull/133136 https://github.com/pytorch/pytorch/pull/133134 https://github.com/pytorch/pytorch/pull/133133 https://github.com/pytorch/pytorch/pull/133132 https://github.com/pytorch/pytorch/pull/133131 https://github.com/pytorch/pytorch/pull/133729 https://github.com/pytorch/pytorch/pull/133130 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133137 Approved by: https://github.com/jansel, https://github.com/zou3519 ghstack dependencies: #134732	2024-09-11 04:18:22 +00:00
Thomas Bohnstingl	e889252493	Implementation of scan (#134102 ) This operation is intended to be the counterpart to `associative_scan`, but can operate with non-associative functions. @ydwu4 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134102 Approved by: https://github.com/ydwu4	2024-09-10 04:51:16 +00:00
Thomas Bohnstingl	994438040c	Improvements for associative_scan - combine_mode (#133012 ) This is part of a series of PRs to improve the `associative_scan` functionality. This specific PR introduces a `combine_mode`, which can be either `pointwise` (default) or `generic`. With `generic`, `associative_scan` is more flexible and also allows non-pointwise functions. This PR has been derived from https://github.com/pytorch/pytorch/pull/129307. @ydwu4 @Chillee @zou3519 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133012 Approved by: https://github.com/ydwu4	2024-08-30 16:09:53 +00:00
Yidi Wu	b07d0a22f5	[hop] require hops to override __call__. (#134352 ) Fixes https://github.com/pytorch/pytorch/issues/133719 by making `__call__` of hops an abstractmethod. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134352 Approved by: https://github.com/zou3519	2024-08-28 19:56:40 +00:00
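The effect of making `__call__` an abstract method can be illustrated with the standard `abc` module. The classes below are toy stand-ins with hypothetical names (`ToyHigherOrderOperator`, `ToyCond`, `ForgotCall`); they are not PyTorch's operator classes and only demonstrate the mechanism.

```python
import abc


class ToyHigherOrderOperator(abc.ABC):
    """Hypothetical base class: subclasses must define __call__."""

    def __init__(self, name):
        self.name = name

    @abc.abstractmethod
    def __call__(self, *args, **kwargs):
        ...


class ToyCond(ToyHigherOrderOperator):
    def __call__(self, pred, true_fn, false_fn, operands):
        # Pick a branch eagerly; a real hop would dispatch instead.
        return true_fn(*operands) if pred else false_fn(*operands)


class ForgotCall(ToyHigherOrderOperator):
    pass  # no __call__ override, so instantiation raises TypeError


toy_cond = ToyCond("toy_cond")
picked = toy_cond(True, lambda x: x + 1, lambda x: x - 1, (1,))
```

Because the check happens at instantiation time, a subclass that forgets the override fails early rather than at its first call.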
Yidi Wu	6835f20d20	[HOP] support generating schema for hop (#133521 ) Add a way of generating a FunctionSchema from example values because hop's schema varies even for the same hop. We didn't use torch._C.FunctionSchema because we cannot construct the classes directly (e.g. "__init__" cannot be used for torch._C.FunctionSchema). Also extending the Basic types in c++ seems not that easy. Pull Request resolved: https://github.com/pytorch/pytorch/pull/133521 Approved by: https://github.com/zou3519	2024-08-21 17:34:21 +00:00
Yidi Wu	2ec95ffe57	[cond] support unbacked symbool inputs (#133589 ) Fixes https://github.com/pytorch/pytorch/issues/133577. In dynamo, when we receive an unbacked symbool input, we create an unbacked symint to replace it. The alternative approach of `not realizing the pred LazyVariable in cond` doesn't work because we need to get the proxy of the symbool input. Pull Request resolved: https://github.com/pytorch/pytorch/pull/133589 Approved by: https://github.com/ezyang	2024-08-19 23:36:48 +00:00
Thomas Bohnstingl	d04cd7f3ba	Improvements for associative_scan - Reverse feature (#133011 ) This is part of a series of PRs to improve the `associative_scan` functionality. This specific PR introduces a `reverse` flag to `associative_scan` to establish an interface similar to that of `jax.lax.associative_scan`. This PR has been derived from https://github.com/pytorch/pytorch/pull/129307. @ydwu4 @Chillee @zou3519 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133011 Approved by: https://github.com/ydwu4	2024-08-16 23:06:31 +00:00
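One common way to think about a `reverse` flag is "flip, scan, flip back". The sketch below shows that idea on Python lists using `itertools.accumulate`. It is an assumption about the semantics, not the PR's implementation; `reference_associative_scan` is a hypothetical name, and for non-commutative operators the argument order seen by `combine_fn` in reverse mode may differ from the real operator.

```python
from itertools import accumulate


def reference_associative_scan(combine_fn, xs, reverse=False):
    """Hypothetical reference: inclusive scan, optionally from the end."""
    items = list(xs)
    if reverse:
        items.reverse()
    out = list(accumulate(items, combine_fn))
    if reverse:
        out.reverse()
    return out


def add(a, b):
    return a + b


forward = reference_associative_scan(add, [1, 2, 3, 4])
backward = reference_associative_scan(add, [1, 2, 3, 4], reverse=True)
```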
Yidi Wu	a4ed8eeb33	[hop] makes compiled hops not share code objects (#132427 ) Fixes the code object sharing issue in https://github.com/pytorch/pytorch/issues/132417. Before this PR, compiled hops such as cond and flex_attention were wrapped by _dynamo/external_utils.py:wrap_inline, which caused them to share the same code object. There is a condition guarding the wrap_inline call, and hops currently pass it. We make hops fail the check, so that they don't share code objects, by adding them to LEGACY_MOD_INLINELIST. Adding them to MOD_INLINELIST doesn't work because trace_rules.check(fn) doesn't check MOD_INLINELIST by default. Pull Request resolved: https://github.com/pytorch/pytorch/pull/132427 Approved by: https://github.com/jansel, https://github.com/anijain2305	2024-08-05 22:59:05 +00:00
Oguz Ulgen	221350e3a4	Add None return type to init -- tests (#132352 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132352 Approved by: https://github.com/ezyang ghstack dependencies: #132335, #132351	2024-08-01 15:44:51 +00:00
YangQun1	589aef4bb0	Fix py codegen to delete values that don't have any users (#131028 ) Fixes #131025 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131028 Approved by: https://github.com/ezyang	2024-08-01 03:18:37 +00:00
ekamiti	9e473fd868	Make adding Buffers more like adding Parameters (#125971 ) Add similar semantics for creating a buffer object similar to creating a parameter. This is done by introducing a new Buffer class that can be used for type disambiguation. The underlying functionality of registering a buffer remains the same as the register_buffer method has not been changed. The persistent parameter in the Buffer type is to indicate whether a buffer object should be persistent or not. Other non-test changes have to do with getting the new Buffer type recognized by inductor and dynamo. Remaining changes are test changes to make sure that the Buffer type can be used as a drop in replacement for register_buffer as it just leads to register_buffer being called. The addition of this new functionality still allows for normal tensors to be used as buffers so these changes are intended to be backwards compatible. Fixes #35735 Co-authored-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/125971 Approved by: https://github.com/albanD, https://github.com/anijain2305, https://github.com/mlazos	2024-07-31 10:32:40 +00:00
PyTorch MergeBot	c3679bed35	Revert "Fix py codegen to delete values that don't have any users (#131028 )" This reverts commit 91aba7baac3d2a079c0b13db25588842260c98cc. Reverted https://github.com/pytorch/pytorch/pull/131028 on behalf of https://github.com/clee2000 due to broke inductor/test_triton_kernels inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_functionalize [GH job link](https://github.com/pytorch/pytorch/actions/runs/10094659640/job/27915271250) [HUD commit link](`91aba7baac`) ([comment](https://github.com/pytorch/pytorch/pull/131028#issuecomment-2251058374))	2024-07-25 17:42:18 +00:00
YangQun1	91aba7baac	Fix py codegen to delete values that don't have any users (#131028 ) Fixes #131025 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131028 Approved by: https://github.com/ezyang	2024-07-25 13:04:23 +00:00
PyTorch MergeBot	8ffd109a00	Revert "Fix py codegen to delete values that don't have any users (#131028 )" This reverts commit 466c167b71e6021f8eadcfbae1d9156a375663ce. Reverted https://github.com/pytorch/pytorch/pull/131028 on behalf of https://github.com/atalman due to breaks CI ([comment](https://github.com/pytorch/pytorch/pull/131028#issuecomment-2247771530))	2024-07-24 12:21:43 +00:00
YangQun1	466c167b71	Fix py codegen to delete values that don't have any users (#131028 ) Fixes #131025 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131028 Approved by: https://github.com/ezyang	2024-07-24 01:03:56 +00:00
Thomas Ortner	8ae1963a61	[Autograd] Cond Higher-Order Operation (#126911 ) This is an updated PR to equip cond with the autograd feature and replaces the old [PR](https://github.com/pytorch/pytorch/pull/126007) @ydwu4 I tried to incorporate your requests already. Currently there are two problems that I struggle with solving: 1. There seems to be an import issue when trying to import cond in `torch/__init__.py`, see [here](`8a704035c9/torch/__init__.py (L1914-L1916)`). Therefore, I had to comment out those lines, which resolved the import issues, but I believe cond is not properly exposed as torch.cond. 2. I am not entirely sure how to deal with the opinfo test in `hop_db.py` Co-authored-by: Yidi Wu <yidi@meta.com> Co-authored-by: Xuehai Pan <XuehaiPan@outlook.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/126911 Approved by: https://github.com/ydwu4	2024-07-22 23:18:19 +00:00
PyTorch MergeBot	fb3674b1f4	Revert "[Autograd] Cond Higher-Order Operation (#126911 )" This reverts commit f7058b735e52a1d876912f8c96a594673a495007. Reverted https://github.com/pytorch/pytorch/pull/126911 on behalf of https://github.com/clee2000 due to broke lint and functorch/test_aotdispatch `f7058b735e` Probably a landrace since both the test and lint passed on PR ([comment](https://github.com/pytorch/pytorch/pull/126911#issuecomment-2237703182))	2024-07-18 22:06:40 +00:00
Thomas Bohnstingl	f7058b735e	[Autograd] Cond Higher-Order Operation (#126911 ) This is an updated PR to equip cond with the autograd feature and replaces the old [PR](https://github.com/pytorch/pytorch/pull/126007) @ydwu4 I tried to incorporate your requests already. Currently there are two problems that I struggle with solving: 1. There seems to be an import issue when trying to import cond in `torch/__init__.py`, see [here](`8a704035c9/torch/__init__.py (L1914-L1916)`). Therefore, I had to comment out those lines, which resolved the import issues, but I believe cond is not properly exposed as torch.cond. 2. I am not entirely sure how to deal with the opinfo test in `hop_db.py` Co-authored-by: Yidi Wu <yidi@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/126911 Approved by: https://github.com/ydwu4	2024-07-18 21:09:09 +00:00
Xuehai Pan	76169cf691	[BE][Easy][9/19] enforce style for empty lines in import segments in `test/[e-h]*/` (#129760 ) See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter. You can review these PRs via: ```bash git diff --ignore-all-space --ignore-blank-lines HEAD~1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129760 Approved by: https://github.com/ezyang	2024-07-17 14:25:29 +00:00
PyTorch MergeBot	dff9d68f18	Revert "Fix names conflict when lifting (#129817 )" This reverts commit 53cf46b8c602f8512d49a5c30bca7fcf5411e25c. Reverted https://github.com/pytorch/pytorch/pull/129817 on behalf of https://github.com/clee2000 due to Failing inductor/test_flex_attention.py https://github.com/pytorch/pytorch/actions/runs/9940532858/job/27478084137 `74da2a467f` Sorry for the churn, possibly a landrace? ([comment](https://github.com/pytorch/pytorch/pull/129817#issuecomment-2229519886))	2024-07-15 22:08:45 +00:00
Zhanghan Wang	53cf46b8c6	Fix names conflict when lifting (#129817 )

## Bug description

When the pending args that may be lifted [here](`58f346c874/torch/_dynamo/output_graph.py (L1866)`) share the same base name, such as `contiguous` and `contiguous_1`, the call into [create_graph_input](`58f346c874/torch/_dynamo/output_graph.py (L2081)`) can end up creating a name ([here](`58f346c874/torch/fx/graph.py (L1008)`)) that overwrites one of the args to lift, which produces a wrong graph output.

## Reproducing

Below is a reproducible example:

```python
import logging
from typing import List

import torch
from functorch.compile import aot_module_simplified, make_boxed_func


@torch.library.custom_op("mylib::somefunc_forward", mutates_args=())
def somefunc_forward(
    input_: torch.Tensor,
    weight: torch.Tensor,
    shape: List[int],
) -> torch.Tensor:
    return torch.ones_like(input_)


@somefunc_forward.register_fake
def _(input_, shape, weight):
    return torch.empty_like(input_)


@torch.library.custom_op("mylib::somefunc_backward", mutates_args=())
def somefunc_backward(
    grad_output: torch.Tensor,
    input_: torch.Tensor,
    weight: torch.Tensor,
    shape: List[int],
) -> torch.Tensor:
    print(f"backward.{grad_output.shape=}")
    print(f"backward.{input_.shape=}")
    print(f"backward.{weight.shape=}")
    print(f"backward.{shape=}")
    assert list(weight.shape) == shape
    return torch.ones_like(weight)


@somefunc_backward.register_fake
def _(grad_output, input_, weight, shape):
    return torch.empty_like(weight)


def a_func(grad_output, input_, weight_, shape):
    return torch.ones_like(input_.sum() * weight_)


class SomeFunc(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input, weight, normalized_shape):
        ctx.normalized_shape = normalized_shape
        input_ = input.contiguous()
        weight_ = weight.contiguous()
        output = somefunc_forward(input_, weight_, ctx.normalized_shape)
        ctx.save_for_backward(input_, weight_)
        return output

    @staticmethod
    def backward(ctx, grad_output):
        input_, weight_ = ctx.saved_tensors
        # grad_weight = a_func(grad_output, input_, weight_, ctx.normalized_shape)
        grad_weight = somefunc_backward(
            grad_output.contiguous(),
            input_,
            weight_,
            ctx.normalized_shape,
        )
        return None, grad_weight, None


class MyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.ones(7))

    def forward(self, x):
        return SomeFunc.apply(x, self.weight, [7])


model = MyModel()
torch._logging.set_logs(dynamo=logging.DEBUG, aot=logging.DEBUG, graph_code=True)


def aot_print_backend(gm, sample_inputs):
    # Forward compiler capture
    def fw(gm, sample_inputs):
        print(f"----- fw")
        gm.print_readable()
        return make_boxed_func(gm.forward)

    # Backward compiler capture
    def bw(gm, sample_inputs):
        print(f"----- bw")
        gm.print_readable()
        return make_boxed_func(gm.forward)

    # Call AOTAutograd
    gm_forward = aot_module_simplified(
        gm, sample_inputs, fw_compiler=fw, bw_compiler=bw
    )
    return gm_forward


model = torch.compile(
    model,
    backend=aot_print_backend,
    dynamic=False,
)
out = model(torch.rand((128, 4, 7)))
out.mean().backward()
```

The log shows calls into create_graph_input such as:

```log
V0629 02:08:46.839914 8200981504 torch/_dynamo/output_graph.py:2042] [0/0] create_graph_input contiguous (none)
V0629 02:08:46.839998 8200981504 torch/_dynamo/output_graph.py:2042] [0/0] create_graph_input contiguous_1 (none)
```

and the generated backward graph looks like:

```log
class GraphModule(torch.nn.Module):
    def forward(self, function_ctx, somefunc_forward_default: "f32[128, 4, 7]", contiguous: "f32[128, 4, 7]", contiguous_1: "f32[7]"):
        contiguous_1 = contiguous
        contiguous_2 = contiguous_1

        # No stacktrace found for following nodes
        _set_grad_enabled = torch._C._set_grad_enabled(False)

        # File: /Users/bytedance/testtorch/test_custom_op_bug.py:61 in backward, code: grad_output.contiguous(),
        contiguous: "f32[128, 4, 7]" = somefunc_forward_default.contiguous(); somefunc_forward_default = None

        # File: /opt/tiger/pytorch/torch/_library/custom_ops.py:506 in __call__, code: return self._opoverload(*args, **kwargs)
        somefunc_backward_default: "f32[7]" = torch.ops.mylib.somefunc_backward.default(contiguous, contiguous_1, contiguous_2, [7]); contiguous = contiguous_1 = contiguous_2 = None

        # No stacktrace found for following nodes
        _set_grad_enabled_1 = torch._C._set_grad_enabled(True)
        return (None, somefunc_backward_default)
```

The original `somefunc_backward` takes `grad_output`, `input_`, `weight` and `shape` as inputs, where `weight` should have shape `torch.Size([7])`. However, in the graph, `contiguous_1` and `contiguous_2` are both assigned from `contiguous`, which triggers the assertion failure I added in `somefunc_backward`.

## Environment

```log
Collecting environment information...
PyTorch version: 2.5.0a0+git0b7e8df
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: macOS 14.5 (arm64)
GCC version: Could not collect
Clang version: 15.0.0 (clang-1500.3.9.4)
CMake version: version 3.26.4
Libc version: N/A
Python version: 3.9.19 (main, May 6 2024, 14:39:30) [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-14.5-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU: Apple M3 Pro
Versions of relevant libraries:
[pip3] numpy==2.0.0
[pip3] optree==0.11.0
[pip3] torch==2.5.0a0+git0b7e8df
[pip3] torchgraph==0.0.1
[conda] numpy 2.0.0 pypi_0 pypi
[conda] optree 0.11.0 pypi_0 pypi
[conda] torch 2.5.0a0+git0b7e8df dev_0 <develop>
[conda] torchgraph 0.0.1 dev_0 <develop>
```

## How to fix?

I put in a naive fix that adds the potential args to lift to `used_names`. This accesses private variables; I will fix that if this issue makes sense to you.

@zou3519 @oulgen

Co-authored-by: rzou <zou3519@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129817
Approved by: https://github.com/zou3519	2024-07-15 18:49:12 +00:00
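
The idea behind the fix in the commit above (reserving the names of the args still waiting to be lifted before any new name is generated) can be illustrated with a small standalone sketch. `NameAllocator` and `create_name` here are hypothetical stand-ins written for this illustration; they are not the actual torch.fx or dynamo code, and only the `used_names` idea comes from the commit message.

```python
import re


class NameAllocator:
    """Hypothetical stand-in for a graph namespace; not the torch.fx code."""

    def __init__(self, reserved=()):
        # Names reserved up front can never be handed out again.
        self.used_names = set(reserved)

    def create_name(self, candidate):
        # Return the candidate unchanged if it is still free.
        if candidate not in self.used_names:
            self.used_names.add(candidate)
            return candidate
        # Otherwise strip any numeric suffix and search for the next free one.
        base = re.sub(r"_\d+$", "", candidate)
        i = 1
        while f"{base}_{i}" in self.used_names:
            i += 1
        name = f"{base}_{i}"
        self.used_names.add(name)
        return name


pending = ["contiguous", "contiguous_1"]  # args waiting to be lifted

# Without reservation: a second request for "contiguous" is given
# "contiguous_1", which collides with the pending arg of the same name.
naive = NameAllocator()
naive.create_name("contiguous")
clash = naive.create_name("contiguous")

# With reservation: pending names are marked as used before any new name is made,
# so the allocator skips past them.
safe = NameAllocator(reserved=pending)
fresh = safe.create_name("contiguous")

print(clash, fresh)
```

In this sketch the unreserved allocator hands out a name that is already claimed by a pending arg, while the reserved one moves on to the next free suffix, which is the behavior the commit aims for.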
PyTorch MergeBot	1e897a0ca4	Revert "Fix names conflict when lifting (#129817 )" This reverts commit 74da2a467f166e00316aee82ba24835ca563ed87. Reverted https://github.com/pytorch/pytorch/pull/129817 on behalf of https://github.com/clee2000 due to broke dynamo/test_inline_inbuilt_nn_modules.py https://github.com/pytorch/pytorch/actions/runs/9940532858/job/27461141919 `74da2a467f`. Test passed on PR, possibly a landrace? ([comment](https://github.com/pytorch/pytorch/pull/129817#issuecomment-2228993570))	2024-07-15 17:09:52 +00:00

243 Commits