pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-21 05:34:18 +08:00

Author	SHA1	Message	Date
Edward Z. Yang	0c6f1ca064	Introduce torch._dynamo.config.enable_compiler_collectives for syncing compilation across ranks (#130935 ) This PR implements an opt-in configuration option for synchronizing compilation across all ranks at the end of Dynamo tracing (and potentially, other places in the future). There are two pieces to this PR: 1. Implementing infrastructure for compiler collectives (DistributedState/LocalState, the actual collective) 2. Using this infrastructure to synchronize automatic dynamic choices across all ranks The infrastructure in part one can be used for other purposes, just add more (serializable) fields to LocalState. Here is how automatic dynamic synchronization works: 1. Preflight in "torch/_dynamo/variables/builder.py": On the first Dynamo trace run, we trace without automatic dynamic at all; we assume all Tensor inputs that are not otherwise marked are static. This run is purely to collect all Tensor input sizes in the program. 2. torch/_dynamo/output_graph.py: At the end of the first Dynamo trace run, we perform a compiler collective to distribute all Tensor input sizes to all ranks. Then, we restart Dynamo 3. Apply the updates in "torch/_dynamo/variables/builder.py": Now that we have all sizes for every rank, we now update frame state with the observed sizes for all ranks, in rank order. Under the assumption that frame state is consistent on all ranks, this series of updates will preserve consistency. For future work, it would be safer if we force a consistent hint on all ranks; this is more involved as we have to interpose in fakification. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/130935 Approved by: https://github.com/jansel	2024-07-24 11:24:11 +00:00
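The three-step automatic dynamic synchronization described in this commit can be sketched in plain Python. This is a toy model, not Dynamo's implementation: `FrameState`, `update_frame_state`, and `sync_across_ranks` are hypothetical names, and the gathered list stands in for the result of the compiler collective.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for per-input frame state: for each dimension, an int
# means "static with this size" and None means "treat as dynamic".
FrameState = Dict[str, List[Optional[int]]]


def update_frame_state(state: FrameState, name: str, size: Tuple[int, ...]) -> None:
    """Merge one observed size into the frame state (automatic-dynamic style)."""
    if name not in state:
        state[name] = list(size)
        return
    state[name] = [
        old if old == new else None  # a mismatch makes the dimension dynamic
        for old, new in zip(state[name], size)
    ]


def sync_across_ranks(per_rank_sizes: List[Dict[str, Tuple[int, ...]]]) -> FrameState:
    """Replay every rank's observed sizes in rank order.

    Every rank replays the same gathered list in the same order, so every
    rank ends up with the same frame state.
    """
    state: FrameState = {}
    for sizes in per_rank_sizes:  # rank 0, rank 1, ...
        for name, size in sizes.items():
            update_frame_state(state, name, size)
    return state


gathered = [
    {"x": (8, 16)},   # sizes seen on rank 0 during the preflight trace
    {"x": (12, 16)},  # sizes seen on rank 1 during the preflight trace
]
synced = sync_across_ranks(gathered)
print(synced)  # {'x': [None, 16]}
```

In the real feature the option is opted into via `torch._dynamo.config.enable_compiler_collectives`, as the commit title states; the sketch only illustrates why replaying updates in rank order keeps all ranks consistent.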
Aaron Orenstein	0e780a7d69	[BE] Remove some mypy allow-untyped-decorators that are no longer needed (#131564 ) See #131429 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131564 Approved by: https://github.com/oulgen	2024-07-24 02:00:08 +00:00
Aaron Orenstein	5a0068cc69	[BE] mypy: disallow untyped decorators (#131428 ) Untyped decorators strip the types from their decorated function so even if the underlying function is fully typed then callers to it don't get any benefit from type annotations. Step 1 - Enable the error and override in all the offending files. #131429 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131428 Approved by: https://github.com/justinchuby, https://github.com/oulgen	2024-07-23 21:50:55 +00:00
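To illustrate why untyped decorators matter, here is a minimal sketch contrasting an untyped decorator with a typed one. The names `untyped_passthrough`, `typed_passthrough`, and `scale` are made up for this example and do not come from the PyTorch code base.

```python
import functools
from typing import Any, Callable, TypeVar, cast

F = TypeVar("F", bound=Callable[..., Any])


def untyped_passthrough(fn):
    # No annotations: a type checker sees the decorated function as untyped,
    # which is what mypy's disallow_untyped_decorators reports.
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)

    return wrapper


def typed_passthrough(fn: F) -> F:
    # Annotated: the decorated function keeps its original signature for callers.
    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        return fn(*args, **kwargs)

    return cast(F, wrapper)


@typed_passthrough
def scale(x: float, factor: float = 2.0) -> float:
    return x * factor


print(scale(3.0))  # 6.0
```

Both decorators behave identically at runtime; the difference is visible only to the type checker.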
Adria Orenstein	f75d724482	Updating Types in torch/_dynamo/utils.py (#131001 ) Adds some type annotations to the torch/_dynamo/utils.py file. Pull Request resolved: https://github.com/pytorch/pytorch/pull/131001 Approved by: https://github.com/aorenste	2024-07-23 18:25:52 +00:00
Michael Lazos	d31f2ae904	Ensure invariant that all inputs have tensor dict (#131249 ) There was a path with freezing enabled that violated the invariant that all inputs have the "tensor_dict" meta. This ensures that `register_attr_or_module` also sets tensor_dict meta. Pull Request resolved: https://github.com/pytorch/pytorch/pull/131249 Approved by: https://github.com/anijain2305	2024-07-20 04:40:58 +00:00
PyTorch MergeBot	dff9d68f18	Revert "Fix names conflict when lifting (#129817 )" This reverts commit 53cf46b8c602f8512d49a5c30bca7fcf5411e25c. Reverted https://github.com/pytorch/pytorch/pull/129817 on behalf of https://github.com/clee2000 due to Failing inductor/test_flex_attention.py https://github.com/pytorch/pytorch/actions/runs/9940532858/job/27478084137 `74da2a467f` Sorry for the churn, possibly a landrace? ([comment](https://github.com/pytorch/pytorch/pull/129817#issuecomment-2229519886))	2024-07-15 22:08:45 +00:00
Zhanghan Wang	53cf46b8c6	Fix names conflict when lifting (#129817 ) ## Bug description When pending args that may be lifted [here](`58f346c874/torch/_dynamo/output_graph.py (L1866)`) share the same base name, like `contiguous` and `contiguous_1`, the call into [create_graph_input](`58f346c874/torch/_dynamo/output_graph.py (L2081)`) can end up creating a name ([here](`58f346c874/torch/fx/graph.py (L1008)`)) that overwrites one of the args to lift, which produces a wrong graph output. ## Reproducing Below is a reproducible example: ```python import logging from typing import List import torch from functorch.compile import aot_module_simplified, make_boxed_func @torch.library.custom_op("mylib::somefunc_forward", mutates_args=()) def somefunc_forward( input_: torch.Tensor, weight: torch.Tensor, shape: List[int], ) -> torch.Tensor: return torch.ones_like(input_) @somefunc_forward.register_fake def _(input_, weight, shape): return torch.empty_like(input_) @torch.library.custom_op("mylib::somefunc_backward", mutates_args=()) def somefunc_backward( grad_output: torch.Tensor, input_: torch.Tensor, weight: torch.Tensor, shape: List[int], ) -> torch.Tensor: print(f"backward.{grad_output.shape=}") print(f"backward.{input_.shape=}") print(f"backward.{weight.shape=}") print(f"backward.{shape=}") assert list(weight.shape) == shape return torch.ones_like(weight) @somefunc_backward.register_fake def _(grad_output, input_, weight, shape): return torch.empty_like(weight) def a_func(grad_output, input_, weight_, shape): return torch.ones_like(input_.sum() * weight_) class SomeFunc(torch.autograd.Function): @staticmethod def forward(ctx, input, weight, normalized_shape): ctx.normalized_shape = normalized_shape input_ = input.contiguous() weight_ = weight.contiguous() output = somefunc_forward(input_, weight_, ctx.normalized_shape) ctx.save_for_backward(input_, weight_) return output @staticmethod def backward(ctx, grad_output): input_, weight_ = ctx.saved_tensors # grad_weight = 
a_func(grad_output, input_, weight_, ctx.normalized_shape) grad_weight = somefunc_backward( grad_output.contiguous(), input_, weight_, ctx.normalized_shape, ) return None, grad_weight, None class MyModel(torch.nn.Module): def __init__(self): super().__init__() self.weight = torch.nn.Parameter(torch.ones(7)) def forward(self, x): return SomeFunc.apply(x, self.weight, [7]) model = MyModel() torch._logging.set_logs(dynamo=logging.DEBUG, aot=logging.DEBUG, graph_code=True) def aot_print_backend(gm, sample_inputs): # Forward compiler capture def fw(gm, sample_inputs): print(f"----- fw") gm.print_readable() return make_boxed_func(gm.forward) # Backward compiler capture def bw(gm, sample_inputs): print(f"----- bw") gm.print_readable() return make_boxed_func(gm.forward) # Call AOTAutograd gm_forward = aot_module_simplified( gm, sample_inputs, fw_compiler=fw, bw_compiler=bw ) return gm_forward model = torch.compile( model, backend=aot_print_backend, dynamic=False, ) out = model(torch.rand((128, 4, 7))) out.mean().backward() ``` I can see log that showing calling into create_graph_input like ```log V0629 02:08:46.839914 8200981504 torch/_dynamo/output_graph.py:2042] [0/0] create_graph_input contiguous (none) V0629 02:08:46.839998 8200981504 torch/_dynamo/output_graph.py:2042] [0/0] create_graph_input contiguous_1 (none) ``` And the backward graph generate will be like ```log class GraphModule(torch.nn.Module): def forward(self, function_ctx, somefunc_forward_default: "f32[128, 4, 7]", contiguous: "f32[128, 4, 7]", contiguous_1: "f32[7]"): contiguous_1 = contiguous contiguous_2 = contiguous_1 # No stacktrace found for following nodes _set_grad_enabled = torch._C._set_grad_enabled(False) # File: /Users/bytedance/testtorch/test_custom_op_bug.py:61 in backward, code: grad_output.contiguous(), contiguous: "f32[128, 4, 7]" = somefunc_forward_default.contiguous(); somefunc_forward_default = None # File: /opt/tiger/pytorch/torch/_library/custom_ops.py:506 in __call__, code: return 
self._opoverload(*args, **kwargs) somefunc_backward_default: "f32[7]" = torch.ops.mylib.somefunc_backward.default(contiguous, contiguous_1, contiguous_2, [7]); contiguous = contiguous_1 = contiguous_2 = None # No stacktrace found for following nodes _set_grad_enabled_1 = torch._C._set_grad_enabled(True) return (None, somefunc_backward_default) ``` The original code of `somefunc_backward` takes the arguments `grad_output`, `input_`, `weight` and `shape`, where `weight` should have shape `torch.Size([7])`. However, in the graph, `contiguous_1` and `contiguous_2` are both assigned from `contiguous`, which triggers the assertion I added in `somefunc_backward`. ## Environment ```log Collecting environment information... PyTorch version: 2.5.0a0+git0b7e8df Is debug build: False CUDA used to build PyTorch: None ROCM used to build PyTorch: N/A OS: macOS 14.5 (arm64) GCC version: Could not collect Clang version: 15.0.0 (clang-1500.3.9.4) CMake version: version 3.26.4 Libc version: N/A Python version: 3.9.19 (main, May 6 2024, 14:39:30) [Clang 14.0.6 ] (64-bit runtime) Python platform: macOS-14.5-arm64-arm-64bit Is CUDA available: False CUDA runtime version: No CUDA CUDA_MODULE_LOADING set to: N/A GPU models and configuration: No CUDA Nvidia driver version: No CUDA cuDNN version: No CUDA HIP runtime version: N/A MIOpen runtime version: N/A Is XNNPACK available: True CPU: Apple M3 Pro Versions of relevant libraries: [pip3] numpy==2.0.0 [pip3] optree==0.11.0 [pip3] torch==2.5.0a0+git0b7e8df [pip3] torchgraph==0.0.1 [conda] numpy 2.0.0 pypi_0 pypi [conda] optree 0.11.0 pypi_0 pypi [conda] torch 2.5.0a0+git0b7e8df dev_0 <develop> [conda] torchgraph 0.0.1 dev_0 <develop> ``` ## How to fix? I put in a naive fix that adds the potential args to lift to the used_names. This accesses private variables; I will fix that if this issue makes sense to you. 
@zou3519 @oulgen Co-authored-by: rzou <zou3519@gmail.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/129817 Approved by: https://github.com/zou3519	2024-07-15 18:49:12 +00:00
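The naming clash and the "add pending args to used_names" fix can be sketched with a toy name allocator. This is an illustration under stated assumptions, not the `torch.fx` implementation: `create_name` and `name_new_node` are hypothetical helpers that only mimic the "base, base_1, base_2, ..." scheme visible in the logs above.

```python
from typing import List, Set


def create_name(base: str, used_names: Set[str]) -> str:
    """Return `base`, or `base_N` with the smallest N >= 1 that is unused."""
    candidate, n = base, 0
    while candidate in used_names:
        n += 1
        candidate = f"{base}_{n}"
    used_names.add(candidate)
    return candidate


def name_new_node(base: str, pending_args: List[str], reserve_pending: bool) -> str:
    """Name a new node while some args are still waiting to be lifted.

    With reserve_pending=False the allocator does not know about the pending
    args, so it can hand out a name that one of them will use later.
    """
    used: Set[str] = set(pending_args) if reserve_pending else set()
    return create_name(base, used)


pending = ["contiguous", "contiguous_1"]  # args that will be lifted as graph inputs
clashing = name_new_node("contiguous", pending, reserve_pending=False)
safe = name_new_node("contiguous", pending, reserve_pending=True)
print(clashing, safe)  # contiguous contiguous_2
```

Reserving the pending names up front means the new node receives `contiguous_2` instead of shadowing an input that has not been created yet.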
PyTorch MergeBot	1e897a0ca4	Revert "Fix names conflict when lifting (#129817 )" This reverts commit 74da2a467f166e00316aee82ba24835ca563ed87. Reverted https://github.com/pytorch/pytorch/pull/129817 on behalf of https://github.com/clee2000 due to broke dynamo/test_inline_inbuilt_nn_modules.py https://github.com/pytorch/pytorch/actions/runs/9940532858/job/27461141919 `74da2a467f`. Test passed on PR, possibly a landrace? ([comment](https://github.com/pytorch/pytorch/pull/129817#issuecomment-2228993570))	2024-07-15 17:09:52 +00:00
Zhanghan Wang	74da2a467f	Fix names conflict when lifting (#129817 ) ## Bug description When pending args that may be lifted [here](`58f346c874/torch/_dynamo/output_graph.py (L1866)`) share the same base name, like `contiguous` and `contiguous_1`, the call into [create_graph_input](`58f346c874/torch/_dynamo/output_graph.py (L2081)`) can end up creating a name ([here](`58f346c874/torch/fx/graph.py (L1008)`)) that overwrites one of the args to lift, which produces a wrong graph output. ## Reproducing Below is a reproducible example: ```python import logging from typing import List import torch from functorch.compile import aot_module_simplified, make_boxed_func @torch.library.custom_op("mylib::somefunc_forward", mutates_args=()) def somefunc_forward( input_: torch.Tensor, weight: torch.Tensor, shape: List[int], ) -> torch.Tensor: return torch.ones_like(input_) @somefunc_forward.register_fake def _(input_, weight, shape): return torch.empty_like(input_) @torch.library.custom_op("mylib::somefunc_backward", mutates_args=()) def somefunc_backward( grad_output: torch.Tensor, input_: torch.Tensor, weight: torch.Tensor, shape: List[int], ) -> torch.Tensor: print(f"backward.{grad_output.shape=}") print(f"backward.{input_.shape=}") print(f"backward.{weight.shape=}") print(f"backward.{shape=}") assert list(weight.shape) == shape return torch.ones_like(weight) @somefunc_backward.register_fake def _(grad_output, input_, weight, shape): return torch.empty_like(weight) def a_func(grad_output, input_, weight_, shape): return torch.ones_like(input_.sum() * weight_) class SomeFunc(torch.autograd.Function): @staticmethod def forward(ctx, input, weight, normalized_shape): ctx.normalized_shape = normalized_shape input_ = input.contiguous() weight_ = weight.contiguous() output = somefunc_forward(input_, weight_, ctx.normalized_shape) ctx.save_for_backward(input_, weight_) return output @staticmethod def backward(ctx, grad_output): input_, weight_ = ctx.saved_tensors # grad_weight = 
a_func(grad_output, input_, weight_, ctx.normalized_shape) grad_weight = somefunc_backward( grad_output.contiguous(), input_, weight_, ctx.normalized_shape, ) return None, grad_weight, None class MyModel(torch.nn.Module): def __init__(self): super().__init__() self.weight = torch.nn.Parameter(torch.ones(7)) def forward(self, x): return SomeFunc.apply(x, self.weight, [7]) model = MyModel() torch._logging.set_logs(dynamo=logging.DEBUG, aot=logging.DEBUG, graph_code=True) def aot_print_backend(gm, sample_inputs): # Forward compiler capture def fw(gm, sample_inputs): print(f"----- fw") gm.print_readable() return make_boxed_func(gm.forward) # Backward compiler capture def bw(gm, sample_inputs): print(f"----- bw") gm.print_readable() return make_boxed_func(gm.forward) # Call AOTAutograd gm_forward = aot_module_simplified( gm, sample_inputs, fw_compiler=fw, bw_compiler=bw ) return gm_forward model = torch.compile( model, backend=aot_print_backend, dynamic=False, ) out = model(torch.rand((128, 4, 7))) out.mean().backward() ``` I can see log that showing calling into create_graph_input like ```log V0629 02:08:46.839914 8200981504 torch/_dynamo/output_graph.py:2042] [0/0] create_graph_input contiguous (none) V0629 02:08:46.839998 8200981504 torch/_dynamo/output_graph.py:2042] [0/0] create_graph_input contiguous_1 (none) ``` And the backward graph generate will be like ```log class GraphModule(torch.nn.Module): def forward(self, function_ctx, somefunc_forward_default: "f32[128, 4, 7]", contiguous: "f32[128, 4, 7]", contiguous_1: "f32[7]"): contiguous_1 = contiguous contiguous_2 = contiguous_1 # No stacktrace found for following nodes _set_grad_enabled = torch._C._set_grad_enabled(False) # File: /Users/bytedance/testtorch/test_custom_op_bug.py:61 in backward, code: grad_output.contiguous(), contiguous: "f32[128, 4, 7]" = somefunc_forward_default.contiguous(); somefunc_forward_default = None # File: /opt/tiger/pytorch/torch/_library/custom_ops.py:506 in __call__, code: return 
self._opoverload(*args, **kwargs) somefunc_backward_default: "f32[7]" = torch.ops.mylib.somefunc_backward.default(contiguous, contiguous_1, contiguous_2, [7]); contiguous = contiguous_1 = contiguous_2 = None # No stacktrace found for following nodes _set_grad_enabled_1 = torch._C._set_grad_enabled(True) return (None, somefunc_backward_default) ``` The original code of `somefunc_backward` takes the arguments `grad_output`, `input_`, `weight` and `shape`, where `weight` should have shape `torch.Size([7])`. However, in the graph, `contiguous_1` and `contiguous_2` are both assigned from `contiguous`, which triggers the assertion I added in `somefunc_backward`. ## Environment ```log Collecting environment information... PyTorch version: 2.5.0a0+git0b7e8df Is debug build: False CUDA used to build PyTorch: None ROCM used to build PyTorch: N/A OS: macOS 14.5 (arm64) GCC version: Could not collect Clang version: 15.0.0 (clang-1500.3.9.4) CMake version: version 3.26.4 Libc version: N/A Python version: 3.9.19 (main, May 6 2024, 14:39:30) [Clang 14.0.6 ] (64-bit runtime) Python platform: macOS-14.5-arm64-arm-64bit Is CUDA available: False CUDA runtime version: No CUDA CUDA_MODULE_LOADING set to: N/A GPU models and configuration: No CUDA Nvidia driver version: No CUDA cuDNN version: No CUDA HIP runtime version: N/A MIOpen runtime version: N/A Is XNNPACK available: True CPU: Apple M3 Pro Versions of relevant libraries: [pip3] numpy==2.0.0 [pip3] optree==0.11.0 [pip3] torch==2.5.0a0+git0b7e8df [pip3] torchgraph==0.0.1 [conda] numpy 2.0.0 pypi_0 pypi [conda] optree 0.11.0 pypi_0 pypi [conda] torch 2.5.0a0+git0b7e8df dev_0 <develop> [conda] torchgraph 0.0.1 dev_0 <develop> ``` ## How to fix? I put in a naive fix that adds the potential args to lift to the used_names. This accesses private variables; I will fix that if this issue makes sense to you. 
@zou3519 @oulgen Co-authored-by: rzou <zou3519@gmail.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/129817 Approved by: https://github.com/zou3519	2024-07-15 13:41:46 +00:00
Yidi Wu	0bf9a091ec	[torchbind] add tracing_mode support (#129586 ) Sometimes it can be difficult to write a fake class, e.g. when the original implementation uses third-party libraries, or when users are certain that the class is safe to trace with the real object. This PR allows users to state that intention by implementing a "safe_to_trace_with_real_obj" method on their script class. Test Plan: `pytest test/export/test_torchbind.py -k safe` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129586 Approved by: https://github.com/zou3519	2024-07-12 18:01:47 +00:00
Pian Pawakapan	988ed4d5db	[export] clean up allow_complex_guards_as_runtime_asserts flag (#130596 ) Summary: removes underscore, cleans up dead code in DimConstraints Test Plan: existing export tests Reviewed By: angelayi Differential Revision: D59612746 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130596 Approved by: https://github.com/angelayi	2024-07-12 17:17:11 +00:00
Xuehai Pan	973037be6a	[BE][Easy] apply autofix for ruff rules unnecessary-collection-call (C408): `list()` / `tuple()` / `dict()` (#130199 ) This PR changes the empty collection factory call to Python literals: - `list()` -> `[]` - `tuple()` -> `()` - `dict()` -> `{}` The Python literals are more performant and safer. For example, the bytecode for building an empty dictionary: ```bash $ python3 -m dis - <<EOS import collections d1 = {} d2 = dict() dict = collections.OrderedDict d3 = dict() EOS ``` ```text 0 0 RESUME 0 1 2 LOAD_CONST 0 (0) 4 LOAD_CONST 1 (None) 6 IMPORT_NAME 0 (collections) 8 STORE_NAME 0 (collections) 3 10 BUILD_MAP 0 12 STORE_NAME 1 (d1) 4 14 PUSH_NULL 16 LOAD_NAME 2 (dict) 18 CALL 0 26 STORE_NAME 3 (d2) 6 28 LOAD_NAME 0 (collections) 30 LOAD_ATTR 8 (OrderedDict) 50 STORE_NAME 2 (dict) 7 52 PUSH_NULL 54 LOAD_NAME 2 (dict) 56 CALL 0 64 STORE_NAME 5 (d3) 66 RETURN_CONST 1 (None) ``` The dict literal `{}` only has one bytecode `BUILD_MAP`, while the factory call `dict()` has three `PUSH_NULL + LOAD_NAME + CALL`. Also, the factory call is not safe if users override the `dict` name in `locals` or `globals` (see the example of replacing with `OrderedDict` above). Pull Request resolved: https://github.com/pytorch/pytorch/pull/130199 Approved by: https://github.com/malfet	2024-07-11 17:30:28 +00:00
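The two claims in this commit message (the literal compiles to `BUILD_MAP`, and the factory call depends on whatever `dict` currently names) can be checked with a short sketch. The helper names `opnames` and `make_empty_dicts` are made up for this example, and the exact opcodes around the call vary between CPython versions, so only the presence of `BUILD_MAP` is inspected.

```python
import collections
import dis
from typing import List


def opnames(source: str) -> List[str]:
    """Compile a snippet and list the opcode names it uses."""
    code = compile(source, "<sketch>", "exec")
    return [ins.opname for ins in dis.get_instructions(code)]


literal_ops = opnames("d = {}")
call_ops = opnames("d = dict()")
# The literal builds the map directly; the call has to look up `dict` first.
print("BUILD_MAP" in literal_ops, "BUILD_MAP" in call_ops)  # True False


def make_empty_dicts():
    dict = collections.OrderedDict  # shadows the builtin inside this function
    return {}, dict()


literal, called = make_empty_dicts()
print(type(literal).__name__, type(called).__name__)  # dict OrderedDict
```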
William Wen	79aabaf626	[3.13, dynamo] codegen PUSH_NULL when callable is codegen'd (#129172 ) Significant bytecode generation API change! The new suggested convention for generating bytecode that calls a function is to wrap the instructions that push a callable onto the stack with `add_push_null`, and then call that callable with `create_call_function` with `push_null=False` (see diff for examples). In Python 3.13, NULL is now expected to be pushed after the callable. In <=3.12, the NULL was pushed before the callable. This change abstracts away the exact placement of the NULL, but the developer must be aware that a NULL may be needed when codegen'ing a callable. This abstraction also reduces the need for the `push_null=True` option in `create_call_function`, which removes the need to rotate a NULL to the right place on the stack with a sequence of `SWAP` instructions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/129172 Approved by: https://github.com/jansel	2024-06-22 17:25:23 +00:00
rzou	08b616281f	[custom ops] Switch out references from old landing page to new landing page (#129178 ) Test Plan: - existing tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/129178 Approved by: https://github.com/albanD ghstack dependencies: #129177	2024-06-21 13:31:40 +00:00
Animesh Jain	1aafb9eb90	[dynamo][yolov3] Track UnspecializedNNModuleVariable for mutation (#128269 ) Fixes https://github.com/pytorch/pytorch/issues/101168 Pull Request resolved: https://github.com/pytorch/pytorch/pull/128269 Approved by: https://github.com/jansel ghstack dependencies: #128715	2024-06-14 20:17:03 +00:00
chilli	c486e2ab64	Add coloring to fx graph print out (#128476 ) Note: Won't land immediately, at least I'll need to add a color option to the field. But curious if any tests fail. Old: <img width="1294" alt="image" src="https://github.com/pytorch/pytorch/assets/6355099/c3a750ed-5e54-4621-b2e4-be5481be15b6"> New: <img width="1303" alt="image" src="https://github.com/pytorch/pytorch/assets/6355099/3a1f1adc-6f3a-413e-8b87-ee53da9bf4ed"> Pull Request resolved: https://github.com/pytorch/pytorch/pull/128476 Approved by: https://github.com/ezyang	2024-06-13 23:39:04 +00:00
PyTorch MergeBot	d630e1e838	Revert "[dynamo][yolov3] Track UnspecializedNNModuleVariable for mutation (#128269 )" This reverts commit f2d7f235a684c593f5a1ff2ca0b47b47274bfe85. Reverted https://github.com/pytorch/pytorch/pull/128269 on behalf of https://github.com/anijain2305 due to incorrect ([comment](https://github.com/pytorch/pytorch/pull/128269#issuecomment-2164267320))	2024-06-13 03:04:26 +00:00
Animesh Jain	f2d7f235a6	[dynamo][yolov3] Track UnspecializedNNModuleVariable for mutation (#128269 ) Fixes https://github.com/pytorch/pytorch/issues/101168 Pull Request resolved: https://github.com/pytorch/pytorch/pull/128269 Approved by: https://github.com/jansel ghstack dependencies: #128295, #126578, #128268, #128254	2024-06-11 07:09:04 +00:00
Aaron Orenstein	dcfa7702c3	Flip default value for mypy disallow_untyped_defs [1/11] (#127838 ) See #127836 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127838 Approved by: https://github.com/oulgen	2024-06-08 18:16:33 +00:00
Edward Z. Yang	73d6ec2db6	Increase verbosity of FX graph dumps (#128042 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/128042 Approved by: https://github.com/aorenste	2024-06-08 07:24:58 +00:00
rzou	c9beea13ac	Rewrite existing links to custom ops gdocs with the landing page (#127423 ) NB: these links will be live after the docs build happens, which is once a day. Test Plan: - existing tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/127423 Approved by: https://github.com/jansel, https://github.com/williamwen42 ghstack dependencies: #127291, #127292, #127400	2024-05-30 14:54:29 +00:00
Pian Pawakapan	8a31c2aa84	[export] allow complex guards as runtime asserts (#127129 ) With the current state of export's dynamic shapes, we struggle with guards and constraints that are beyond the current dynamic shapes language, expressed with dims and derived dims. While we can compile and guarantee correctness for guards within the current language (e.g. min/max ranges, linear relationships, integer divisibility) we struggle to dynamically compile guards which extend beyond that. For these "complex" guards, we typically do either of the following: 1) raise a constraint violation error, along the lines of "not all values of <symbol> in the specified range satisfy <guard>", with or without suggested fixes, 2) specialize to the provided static values and suggest removing dynamism, or 3) fail compilation due to some arbitrary unsupported case. Previous [work](https://github.com/pytorch/pytorch/pull/124949) went towards resolving this by disabling forced specializations, instead allowing the user to fail at runtime with incorrect inputs. In this PR, relying on [hybrid backed-unbacked symints](https://github.com/pytorch/pytorch/issues/121749), [deferred runtime asserts](https://github.com/pytorch/pytorch/blob/main/torch/fx/passes/runtime_assert.py), and the function [_is_supported_equivalence()](`d7de4c9d80/torch/fx/experimental/symbolic_shapes.py (L1824)`), we add a flag `_allow_complex_guards_as_runtime_asserts` which allows the user to compile exported programs containing these guards and maintain dynamism, while adding correctness checks as runtime assertions in the graph. Hybrid backed-unbacked symints allow us to easily bypass "implicit" guards emitted from computation - guards that we ~expect to be true. 
Popular examples revolve around reshapes: ``` # reshape def forward(self, x, y): # x: [s0, s1], y: [s2] return x.reshape([-1]) + y # guard s0 * s1 = s2 ``` This leads to the following exported program: ``` class GraphModule(torch.nn.Module): def forward(self, x: "f32[s0, s1]", y: "f32[s2]"): sym_size_int: "Sym(s2)" = torch.ops.aten.sym_size.int(y, 0) mul: "Sym(-s2)" = -1 * sym_size_int; sym_size_int = None sym_size_int_1: "Sym(s0)" = torch.ops.aten.sym_size.int(x, 0) sym_size_int_2: "Sym(s1)" = torch.ops.aten.sym_size.int(x, 1) mul_1: "Sym(s0*s1)" = sym_size_int_1 * sym_size_int_2; sym_size_int_1 = sym_size_int_2 = None add: "Sym(s0*s1 - s2)" = mul + mul_1; mul = mul_1 = None eq: "Sym(Eq(s0*s1 - s2, 0))" = add == 0; add = None _assert_scalar = torch.ops.aten._assert_scalar.default(eq, "Runtime assertion failed for expression Eq(s0*s1 - s2, 0) on node 'eq'"); eq = None view: "f32[s0*s1]" = torch.ops.aten.view.default(x, [-1]); x = None add_1: "f32[s0*s1]" = torch.ops.aten.add.Tensor(view, y); view = y = None return (add_1,) ``` Another case is symbol divisibility: ``` def forward(self, x): # x: [s0, s1] return x.reshape([-1, x.shape[0] - 1]) # Eq(Mod(s0*s1, s0 - 1), 0) ``` Applying deferred runtime asserts also helps dynamic compilation for "explicit" complex guards that typically cause problems for export. For example we can generate runtime asserts for not-equal guards, and complex conditions like the following: ``` class Foo(torch.nn.Module): def forward(self, x, y): # check that negation of first guard also shows up as runtime assertion if x.shape[0] == y.shape[0]: # False return x + y elif x.shape[0] == y.shape[0] ** 3: # False return x + 2, y + 3 elif x.shape[0] ** 2 == y.shape[0] * 3: # True return x * 2.0, y * 3.0 ``` For the above graph we will generate 3 runtime assertions: the negation of the first 2, and the 3rd condition as a guard. 
One additional benefit here over the current state of exported programs is that this adds further correctness guarantees - previously with explicit complex guards, if compilation succeeded, the guards would be ignored at runtime, treated as given. As shown above, the runtime asserts appear as math ops in the graph, generated by the sympy interpreter, resulting in an _assert_scalar call. There is an option to avoid adding these asserts into the graph, by setting `TORCH_DYNAMO_DO_NOT_EMIT_RUNTIME_ASSERTS=1`. This results in the "original" computation graph, with dynamism, and any incorrect inputs will fail on ops during runtime. Further work could go into prettifying the printer, so the majority of the graph isn't guard-related. Ideally this PR would subsume and remove the recently added [_disable_forced_specializations](https://github.com/pytorch/pytorch/pull/124949) flag, but that flag still handles one additional case of specialization: single-variable equalities where the symbol is solvable for a concrete value: see this [PR](https://github.com/pytorch/pytorch/pull/126925) This PR doesn't change any behavior around data-dependent errors/unbacked symints yet, that could be further work. NOTE: will take naming change suggestions for the flag :) Pull Request resolved: https://github.com/pytorch/pytorch/pull/127129 Approved by: https://github.com/avikchaudhuri	2024-05-29 17:15:25 +00:00
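The reshape example's runtime assertion can be replayed on plain integers to see what the inserted math ops compute. This is a sketch only: `reshape_add_runtime_assert` is a hypothetical helper, and it reads the product in the graph as `s0 * s1`, as the guard comment `s0 * s1 = s2` states.

```python
def reshape_add_runtime_assert(s0: int, s1: int, s2: int) -> None:
    """Replay the symbolic ops from the exported graph on plain ints."""
    mul = -1 * s2      # mul:   -s2
    mul_1 = s0 * s1    # mul_1: s0*s1
    add = mul + mul_1  # add:   s0*s1 - s2
    eq = add == 0      # eq:    Eq(s0*s1 - s2, 0)
    if not eq:
        # Plays the role of the _assert_scalar node in the graph.
        raise RuntimeError(
            "Runtime assertion failed for expression Eq(s0*s1 - s2, 0): "
            f"s0={s0}, s1={s1}, s2={s2}"
        )


reshape_add_runtime_assert(2, 3, 6)  # x: [2, 3], y: [6] satisfies the guard

try:
    reshape_add_runtime_assert(2, 3, 7)  # y has the wrong length
    failure = None
except RuntimeError as err:
    failure = str(err)
print(failure is not None)  # True
```

With the assert in the graph, a mismatched input fails at the check itself rather than later inside the `view` or `add` ops.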
Aart Bik	ff82e2e7cf	[traced-graph][sparse] propagate sparsity metadata into traced graph (#117907 ) Propagate sparsity metadata from sparse tensors of torch.sparse into the traced graph representation (which would be useful for a JIT backend that supports a "sparse compiler"). This is a first careful attempt, since the actual "meta" feature still seems incomplete for coo and completely lacking for csr/csc/bsr/bsc. For background see forum postings (with examples): https://discuss.pytorch.org/t/connecting-pytorch-sparse-tensors-with-mlir/195145 https://dev-discuss.pytorch.org/t/connecting-pytorch-sparse-tensors-with-mlir/1803 And feature request: https://github.com/pytorch/pytorch/issues/117188 Pull Request resolved: https://github.com/pytorch/pytorch/pull/117907 Approved by: https://github.com/pearu, https://github.com/ezyang	2024-05-23 22:46:46 +00:00
Edward Z. Yang	0d17aae242	Teach FakeTensor to fill in item_memo when converting scalar CPU tensor (#126245 ) This PR requires a little justification, but let's start with what it does first: 1. When you have a 0d CPU scalar int64/float64 tensor input to a graph, we will preallocate a backed SymInt/SymFloat corresponding to what you would get if you call item() on this tensor. This means you can freely change your input to be a Python int/float or a Tensor with an item() call and end up with exactly the same level of expressivity (specifically, you can guard on the internal SymInt/SymFloat no matter what). By default, the source of the backed SymInt/SymFloat is `L['tensor'].item()`, but if you have promoted a float input into a Tensor, we will cancel out `torch.as_tensor(L['float']).item()` into just `L['float']`. 2. We switch wrap_symfloat to use this, instead of hand crafting the new SymNodeVariable. Everything works out, except that we carefully pass the item() result to tracked fakes (and not the fake Tensor argument). OK, so why do this at all? There is some marginal benefit where now some item() calls on scalar inputs can be guarded on, but IMO this is a pretty marginal benefit, and if it was the only reason, I wouldn't do this. The real reason for this is that I need to be able to propagate fake tensors through the graphs that are produced by Dynamo, and if I am doing the old custom wrap_symfloat logic, there's no way I can do this, because ordinarily an item() call will cause an unbacked SymInt when I reallocate. The other obvious way to solve the problem above is to make a HOP alternative to item() that "bakes in" the backed SymInt it's supposed to return. But this strategy seems more parsimonious, and it does have the marginal benefit I mentioned above. The main downside is that what I have to do next is make it so that when I run tensor computation, I also apply the equivalent operations to the SymInt/SymFloat as well. That's the next PR. 
Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/126245 Approved by: https://github.com/eellison ghstack dependencies: #126637	2024-05-22 15:25:38 +00:00
Edward Z. Yang	db3b38202b	Improve dead code elimination of unnecessary int arguments (#126074 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/126074 Approved by: https://github.com/lezcano ghstack dependencies: #125325, #125915	2024-05-14 17:22:30 +00:00
Edward Z. Yang	2ba102f689	Implement native support for float inputs in Dynamo and ShapeEnv (#125325 ) The big idea is that floats are treated as Tensors on input/output to the FX graph, but on the inside, we immediately call item() on the synthetic Tensor and record regular float operations on it. Canonicalization to Tensor operations will happen in a standalone FX pass. This behavior is enabled by setting the `specialize_float` config variable to False. The generated graph looks like this for the test `test_unspec_float_output`: ``` def forward(self, L_x_: "f32[3]", L_y_: "f32[]"): l_x_ = L_x_ l_y_ = L_y_ # File: /data/users/ezyang/a/pytorch/test/dynamo/test_unspec.py:511 in f, code: return x + 1, y * 2 add: "f32[3]" = l_x_ + 1; l_x_ = None item: "Sym(zf0)" = l_y_.item(); l_y_ = None mul: "Sym(2*zf0)" = item * 2; item = None scalar_tensor: "f32[]" = torch.scalar_tensor(mul); mul = None return (add, scalar_tensor) ``` The ingredients: * torch/_dynamo/variables/builder.py When `specialize_float` is False, we wrap float literals with `wrap_symfloat`. This is an unholy mashup of `wrap_symint` and `wrap_unspecialized_primitive`. The overall strategy is that we first generate a tensor argument (because that's what we want to show up in the FX graph), but then immediately call item() on the tensor argument to get a SymNodeVariable, which we will do the rest of the tracing with. Importantly, this SymNodeVariable is backed with the source of the original float: this means we can guard on the resulting value (something we could NOT do with UnspecializedPythonVariable). This has to be done manually, because if you literally call item() on the tensor, you will end up with an unbacked float. There is a bit of copy paste from wrap_symint and wrap_unspecialized_primitive which we can try to factor out, but this really is its own thing and you should review every line of code in the function. 
* torch/fx/experimental/symbolic_shapes.py We now can generate guards on float inputs, and these guards are handled inside of ShapeEnv. So we need to be able to allocate (backed!) float symbols, and produce guards for them. Fairly straightforward generalization. * torch/_dynamo/codegen.py I also need to maintain the invariant that there are no float outputs to the FX graph. I chose to do this at codegen time. When we detect a SymNodeVariable on the return stack for a float, we on the fly convert it (via `as_tensor`) to a TensorVariable, which is the true output. We then special case the output bytecode to call item() on it again. The tensor conversion is memoized on SymNodeVariable since we typically run the code generation process twice. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/125325 Approved by: https://github.com/lezcano, https://github.com/jansel	2024-05-14 04:10:01 +00:00
Simon Fan	7e0edafe86	[compiled autograd][dynamo] improve lifted autograd.Function.backward handling and fallback to pseudo-eager (#125661 )
- `FakeContext` hides all fields other than ctx.saved_tensors, so Dynamo errors when the autograd.Function.backward uses other attrs on ctx, and it also doesn't allow fallback to eager.
- If we remove it, we still can't fall back to eager: node variables are already freed (ctx.saved_tensors throws).
- However, we can fall back to "pseudo-eager" by using a duck-typed ctx and routing the ctx.saved_tensors to lifted tensors.
- Dynamo tries to inline external_utils.call_backward, and treats BackwardCFunction as an AutogradFunctionContextVariable (only used up until we create the fake context: FakeBackwardCFunction).
- We call_function backward from the forward class AutogradFunctionVariable, and we still pass in the fake context as a UserDefinedObjectVariable (can later use AutogradFunctionContextVariable + HOO graph speculate).
Fixes #125489 #124827 Pull Request resolved: https://github.com/pytorch/pytorch/pull/125661 Approved by: https://github.com/jansel	2024-05-08 21:00:37 +00:00
Yu, Guangye	d17be10df1	make torch.amp.autocast more generic (#125103 )
# Motivation
As discussed in [#124479](https://github.com/pytorch/pytorch/pull/124479), `torch.amp.autocast` can NOT be completely equivalent to `torch.cuda.amp.autocast` and `torch.cpu.amp.autocast`, since `torch.amp.autocast` does NOT have the default `dtype` for CPU (`torch.bfloat16` by default) and CUDA (`torch.float16` by default) respectively. We would like `torch.amp.autocast` to be more generic to help the developer/customer write device-agnostic code, because there is not enough reason to add a device-specific autocast `torch.xxx.amp.autocast` for each device backend.
# Solution
When `None` is passed to `dtype`, we should use `torch.get_autocast_dtype` to get the related dtype for each backend. Meanwhile, `torch.get_autocast_dtype` needs to be supported in the JIT path for BC.
# Additional Context
With this PR, `torch.amp.autocast(device_type='cuda')` is equivalent to `torch.cuda.amp.autocast`. Add two new UTs to cover this change in the eager and JIT paths respectively.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125103 Approved by: https://github.com/albanD, https://github.com/jgong5, https://github.com/gujinghui	2024-05-08 12:13:26 +00:00
ydwu4	461ffaaaf3	[dynamo] support torchbind object input (#124978 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/124978 Approved by: https://github.com/jansel	2024-05-07 03:02:00 +00:00
Edward Z. Yang	b6bcd09173	Get rid of tabular and sizes, beef up verbosity of output graph (#125507 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/125507 Approved by: https://github.com/Chillee, https://github.com/jansel ghstack dependencies: #125505	2024-05-06 13:41:58 +00:00
Aaron Gokaslan	1dd42e42c4	[BE]: Try TCH autofixes on torch/ (#125536 ) Tries TCH autofixes and sees what breaks Pull Request resolved: https://github.com/pytorch/pytorch/pull/125536 Approved by: https://github.com/ezyang	2024-05-05 23:13:59 +00:00
Edward Z. Yang	650a248d3e	Rename is_unspecialized to pass_arg_as_tensor, add comment (#125496 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/125496 Approved by: https://github.com/lezcano ghstack dependencies: #125395, #125419, #125483, #125494	2024-05-05 16:57:50 +00:00
Animesh Jain	071ee40793	[dynamo][nn module] Check for duplicate tensors in register_attr_or_module (#125421 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/125421 Approved by: https://github.com/mlazos ghstack dependencies: #125439	2024-05-03 05:08:09 +00:00
Animesh Jain	e68d65dae2	[dynamo][cpp-guards] Differentiate dict guards wrt to guarding on key order (#124779 ) We guard on key order 1) when a key is a non-constant object, and 2) when we actually need key order, e.g. for .values, .items etc. For dicts/OrderedDicts that do not require key order guarding, we just rely on the usual `GuardManager + DictGetItemGuardAccessor`. This is faster than going through the `list(d.keys())` based design for OrderedDicts. Pull Request resolved: https://github.com/pytorch/pytorch/pull/124779 Approved by: https://github.com/jansel	2024-04-25 08:20:35 +00:00
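The distinction in the commit above, between guarding on individual keys and guarding on key order, can be illustrated in plain Python. This is only a sketch of the idea, not Dynamo's guard implementation: `make_getitem_guard` and `make_key_order_guard` are hypothetical helpers, and the real guards live in C++.

```python
def make_getitem_guard(example, keys):
    """Hypothetical per-key guard: checks only the keys that were accessed."""
    expected = {k: example[k] for k in keys}

    def check(d):
        return all(k in d and d[k] == v for k, v in expected.items())

    return check


def make_key_order_guard(example):
    """Hypothetical key-order guard: compares the full ordered key list."""
    expected_keys = list(example.keys())

    def check(d):
        return list(d.keys()) == expected_keys

    return check


traced = {"a": 1, "b": 2}
reordered = {"b": 2, "a": 1}  # same contents, different insertion order

getitem_guard = make_getitem_guard(traced, ["a", "b"])
order_guard = make_key_order_guard(traced)

# The per-key guard accepts the reordered dict; the key-order guard rejects it.
getitem_ok = getitem_guard(reordered)
order_ok = order_guard(reordered)
```

The per-key guard never materializes the key list, which is the cheaper path the commit describes for dicts whose order is not observed.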
Animesh Jain	59a1f1f308	[dynamo][inline inbuilt nn modules] Do not inline for export (#124814 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/124814 Approved by: https://github.com/jansel	2024-04-25 06:35:31 +00:00
Yu, Guangye	25f321b84f	Refactor autocast C++ APIs to be device-agnostic (#124359 )
# Motivation
This PR aims to refactor autocast C++ APIs to be device-agnostic and deprecate the device-specific autocast C++ APIs.
On the C++ side:
- `is_enabled()` -> `is_enabled(device_type)`
- `set_enabled(new_enabled)` -> `set_enabled(device_type, new_enabled)`
- `get_autocast_dtype()` -> `get_autocast_dtype(device_type)`
- `set_autocast_dtype(dtype)` -> `set_autocast_dtype(device_type, dtype)`
The following C++ APIs are deprecated and should be removed in PyTorch 2.5:
- `is_cpu_enabled`
- `set_cpu_enabled`
- `get_autocast_cpu_dtype`
- `set_autocast_cpu_dtype`
- `is_xpu_enabled`
- `set_xpu_enabled`
- `get_autocast_xpu_dtype`
- `set_autocast_xpu_dtype`
- `is_ipu_enabled`
- `set_ipu_enabled`
- `get_autocast_ipu_dtype`
- `set_autocast_ipu_dtype`
- `is_hpu_enabled`
- `set_hpu_enabled`
- `get_autocast_hpu_dtype`
- `set_autocast_hpu_dtype`
- `is_xla_enabled`
- `set_xla_enabled`
- `get_autocast_xla_dtype`
- `set_autocast_xla_dtype`
- `is_privateuseone_enabled`
- `set_privateuseone_enabled`
- `get_autocast_privateuseone_dtype`
- `set_autocast_privateuseone_dtype`
On the Python side, provide 4 generic autocast APIs:
- `torch.is_autocast_enabled(device_type)`
- `torch.set_autocast_enabled(device_type, new_enabled)`
- `torch.get_autocast_dtype(device_type)`
- `torch.set_autocast_dtype(device_type, dtype)`
# Additional Context
We will submit another PR to refactor autocast Python APIs based on this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124359 Approved by: https://github.com/jgong5, https://github.com/albanD	2024-04-23 10:38:50 +00:00
Boyuan Feng	aa2da0cdd2	[Export] Add runtime assert to non-strict export (#123681 ) This PR moves insert_deferred_runtime_asserts from dynamo to torch.fx.passes and uses it to add runtime assertion for non-strict export. Differential Revision: D55944267 Pull Request resolved: https://github.com/pytorch/pytorch/pull/123681 Approved by: https://github.com/tugsbayasgalan, https://github.com/angelayi	2024-04-18 16:13:27 +00:00
Edward Z. Yang	bebdbb63ce	Introduce set_example_value and use it throughout Dynamo (#124176 ) I'm going to set up some extra behavior when we set example value, so I need a convenient place to interpose. I cannot easily do it on meta itself because it's a generic dict with no interposition point. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/124176 Approved by: https://github.com/oulgen ghstack dependencies: #124105, #124059	2024-04-17 22:57:11 +00:00
Simon Fan	67bd43b510	[compiled autograd][dynamo] use aliases for stack restore when partial graphs steal inputs (#124127 )
Same idea as https://github.com/pytorch/pytorch/pull/123359, but for when we restore stack variables after calling a partial graph. Illustrated by the test case:
before:
```python
def function(inputs):
    graph_out_0 = __compiled_fn_2(inputs)
    getitem_1 = graph_out_0[0]
    add = inputs[1]  # <---- error: inputs is already cleared
    del graph_out_0
    add_1 = add + getitem_1
    add = None
    getitem_1 = None
    cpu = add_1.cpu()
    add_1 = None
    return (cpu,)
```
after:
```python
def function(inputs):
    inputs_ref_0 = inputs[1]
    graph_out_1 = __compiled_fn_2(inputs)
    getitem_1 = graph_out_1[0]
    add = inputs_ref_0
    del graph_out_1
    add_1 = add + getitem_1
    add = None
    getitem_1 = None
    cpu = add_1.cpu()
    add_1 = None
    return (cpu,)
```
Co-authored-by: Jason Ansel <jansel@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/124127 Approved by: https://github.com/jansel	2024-04-16 17:01:34 +00:00
William Wen	9309580d69	[dynamo, 3.12] handle possibility of NULL local variables during graph breaks (#124095 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/124095 Approved by: https://github.com/jansel	2024-04-16 08:44:43 +00:00
Animesh Jain	bb0c768c5b	[dynamo][refactor] Move LazyGraphModule handling (#124113 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/124113 Approved by: https://github.com/jansel ghstack dependencies: #124078	2024-04-16 06:39:45 +00:00
Simon Fan	540b451e91	[compiled autograd][dynamo] Codegen aliases to keep grad mutated tensors alive (#123359 )
The current codegen is problematic if __compiled_fn_0 clears the inputs list, since we need it for assignment afterwards:
```python
def forward(inputs):
    __compiled_fn_0 = ...  # The actual function needs to be provided
    graph_out_0 = __compiled_fn_0(inputs)  # clears inputs
    temp_list = []
    temp_list.append(graph_out_0[0])
    inputs[4].grad = graph_out_0[1]  # inputs is empty, index error
    inputs[7].grad = graph_out_0[2]
    inputs[8].grad = graph_out_0[3]
    inputs[9].grad = graph_out_0[3]
    del graph_out_0
    return temp_list
```
With this fix, we use aliases to keep the tensors alive:
```python
def forward(inputs):
    __compiled_fn_0 = ...  # The actual function needs to be provided
    inputs_ref_1 = inputs[9]
    inputs_ref_2 = inputs[4]
    inputs_ref_3 = inputs[8]
    inputs_ref_4 = inputs[7]
    graph_out_0 = __compiled_fn_0(inputs)
    temp_list = []
    temp_list.append(graph_out_0[0])
    inputs_ref_2.grad = graph_out_0[1]
    inputs_ref_4.grad = graph_out_0[2]
    inputs_ref_3.grad = graph_out_0[3]
    inputs_ref_1.grad = graph_out_0[3]
    del graph_out_0
    return temp_list
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123359 Approved by: https://github.com/jansel ghstack dependencies: #123630, #123674, #122353	2024-04-12 10:29:09 +00:00
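The aliasing fix described in the commit above can be reproduced without PyTorch. The following is a minimal sketch under stated assumptions: `StubTensor` and `compiled_fn` are hypothetical stand-ins (not Dynamo or compiled autograd APIs), and `compiled_fn` simply clears its input list to mimic a graph that steals its inputs.

```python
class StubTensor:
    """Hypothetical stand-in for a tensor: a name plus a .grad slot."""

    def __init__(self, name):
        self.name = name
        self.grad = None


def compiled_fn(inputs):
    """Hypothetical compiled graph that steals (clears) its input list."""
    outs = [f"grad_of_{t.name}" for t in inputs]
    inputs.clear()  # the caller's list is empty after this call
    return outs


def forward_without_aliases(inputs):
    graph_out = compiled_fn(inputs)
    inputs[0].grad = graph_out[0]  # raises IndexError: inputs was cleared


def forward_with_aliases(inputs):
    inputs_ref_0 = inputs[0]  # alias taken before the call keeps the object reachable
    graph_out = compiled_fn(inputs)
    inputs_ref_0.grad = graph_out[0]
    return inputs_ref_0


t = StubTensor("w")

try:
    forward_without_aliases([t])
    failed = False
except IndexError:
    failed = True

kept = forward_with_aliases([t])
```

Taking the alias before the call costs one extra local per mutated input, which is why the generated code in the commit lists `inputs_ref_*` assignments ahead of the graph call.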
Animesh Jain	7283c37c98	[dynamo] Keep guards on global function (#123423 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123423 Approved by: https://github.com/jansel	2024-04-09 04:23:11 +00:00
Oguz Ulgen	287680176b	Use graph.find_nodes in dynamo (#122257 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122257 Approved by: https://github.com/jansel ghstack dependencies: #121565, #122255, #122256	2024-04-07 18:51:18 +00:00
Animesh Jain	8c84fe3c86	[dynamo][guards] Forward fix for #123302 (#123485 ) For some reason, adding a `TYPE_CHECK` in the DATA_PTR_MATCH guard in https://github.com/pytorch/pytorch/issues/123302 increases optimizer guard overhead for `MT5ForConditionalGeneration` by 10x. There is nothing special about MT5. As we are going to move towards the CPP guards soon, there is no reason to investigate this deeper. We can use `ID_MATCH` instead of `DATA_PTR` match. Today neither can be serialized, so there is no reason to prefer one over the other. Pull Request resolved: https://github.com/pytorch/pytorch/pull/123485 Approved by: https://github.com/mlazos	2024-04-06 02:34:06 +00:00
Guilherme Leobas	32f9453c2a	[dynamo] Emit FUNCTORCH_STACK_MATCH guard in vmap(compile(f)) case (#122786 ) Fixes: #122201 Pull Request resolved: https://github.com/pytorch/pytorch/pull/122786 Approved by: https://github.com/zou3519	2024-04-05 15:04:16 +00:00
Michael Lazos	512759a3d7	Fix for tensor attribute missing (#123313 ) Tensors would sometimes be realized after we already registered attrs on the root nn module. This ensures all stack values are realized before registering attrs on the root nn module. Pull Request resolved: https://github.com/pytorch/pytorch/pull/123313 Approved by: https://github.com/anijain2305	2024-04-04 21:11:04 +00:00
rzou	fd60752786	Turn _allow_unsafe_data_ptr_access into a config option (#123291 ) We're not planning on having this flag around for very long (see deprecation in next PR), so it's better as a config option. Test Plan: - existing tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/123291 Approved by: https://github.com/eellison ghstack dependencies: #123261, #123282	2024-04-04 20:35:24 +00:00
Animesh Jain	5b45ec8892	[dynamo][guards] Use DATA_PTR instead of ID_MATCH for tensors (#123302 ) We should use ID_MATCH guards sparingly. When it comes to performance, ID_MATCH is much faster than DATA_PTR for Python guards. However, the difference is very small in C++. So, it's worth just using DATA_PTR_MATCH. Pull Request resolved: https://github.com/pytorch/pytorch/pull/123302 Approved by: https://github.com/mlazos ghstack dependencies: #123285	2024-04-04 03:52:50 +00:00
Michael Lazos	3e2b7e6052	[dynamo][guard overhead] Data ptr guard optimizer state tensors (#122858 ) Stricter (but faster) guarding on optimizer state tensors Pull Request resolved: https://github.com/pytorch/pytorch/pull/122858 Approved by: https://github.com/anijain2305	2024-04-03 21:42:06 +00:00

526 Commits