pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-21 05:34:18 +08:00

Author	SHA1	Message	Date
Michael Lazos	d9ae92cd6e	[Dynamo] Support for proxying frozen dataclasses (#134846 ) Fixes https://github.com/pytorch/pytorch/issues/133858 Details: Previously Dynamo would treat dataclasses as UserDefinedVariables. This was non-desirable if we would like to proxy the value into the graph, which is needed for TensorSubclassMetadata. To rectify this, frozen dataclasses are now able to be proxied similarly to NamedTuples. We require the object to be frozen, because if arbitrary mutation were allowed, we would need to replay those mutations in the graph after construction of the object. For tracing construction of the variable, the generated `__init__` for the dataclass uses `object.__setattr__` because frozen dataclasses throw errors on the usual `__setattr__` invocation. With this treatment, no special handling is needed in dynamo for frozen dataclass construction. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134846 Approved by: https://github.com/bdhirsh, https://github.com/anijain2305	2024-09-04 22:17:00 +00:00
Xu Han	2ec149cd3e	[inductor] fix test_functional_call_sequential_params_and_buffers expectation on Windows (#134394 ) This UT actual code only one empty line wrap difference(`linear` and `add`) between Windows/Linux, and the context is right. Reproduce UTs: ```cmd pytest test\dynamo\test_higher_order_ops.py -v -k test_functional_call_sequential_params_and_buffers ``` We can add `empty_line_normalizer` to fix it. ```cmd ______________________________________________________________________________________________ FuncTorchHigherOrderOpTests.test_functional_call_sequential_params_and_buffers _______________________________________________________________________________________________ Traceback (most recent call last): File "D:\xu_git\dnnl_cb\pytorch\test\dynamo\test_higher_order_ops.py", line 3676, in test_functional_call_sequential_params_and_buffers self.assertExpectedInline( File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\site-packages\torch\testing\_internal\common_utils.py", line 2871, in assertExpectedInline return super().assertExpectedInline(actual if isinstance(actual, str) else str(actual), expect, skip + 1) File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\site-packages\expecttest\__init__.py", line 271, in assertExpectedInline self.assertMultiLineEqualMaybeCppStack(expect, actual, msg=help_text) File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\site-packages\expecttest\__init__.py", line 292, in assertMultiLineEqualMaybeCppStack self.assertMultiLineEqual(expect, actual, args, *kwargs) File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\unittest\case.py", line 1226, in assertMultiLineEqual self.fail(self._formatMessage(msg, standardMsg)) File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\unittest\case.py", line 675, in fail raise self.failureException(msg) AssertionError: 'clas[509 chars]one\n add: "f32[1, 1]" = linear + l_buf[69 chars],)\n' != 'clas[509 chars]one\n\n add: "f32[1, 1]" = linear + l_b[71 chars],)\n' class GraphModule(torch.nn.Module): def forward(self, L_params_l1_weight_: "f32[1, 1]", L_params_l1_bias_: "f32[1]", L_buffers_buffer_: "f32[1]", L_inputs_: "f32[1, 1]"): l_params_l1_weight_ = L_params_l1_weight_ l_params_l1_bias_ = L_params_l1_bias_ l_buffers_buffer_ = L_buffers_buffer_ l_inputs_ = L_inputs_ linear: "f32[1, 1]" = torch._C._nn.linear(l_inputs_, l_params_l1_weight_, l_params_l1_bias_); l_inputs_ = l_params_l1_weight_ = l_params_l1_bias_ = None + <<<< (difference is here ) add: "f32[1, 1]" = linear + l_buffers_buffer_; linear = l_buffers_buffer_ = None return (add,) : To accept the new output, re-run test with envvar EXPECTTEST_ACCEPT=1 (we recommend staging/committing your changes before doing this) To execute this test, run the following from the base repo dir: python test\dynamo\test_higher_order_ops.py FuncTorchHigherOrderOpTests.test_functional_call_sequential_params_and_buffers This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 ========================================================================================================================== short test summary info ========================================================================================================================== FAILED [0.4275s] test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_functional_call_sequential_params_and_buffers - AssertionError: 'clas[509 chars]one\n add: "f32[1, 1]" = linear + l_buf[69 chars],)\n' != 'clas[509 chars]one\n\n add: "f32[1, 1]" = linear + l_b[71 chars],)\n' ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/134394 Approved by: https://github.com/jansel Co-authored-by: Jason Ansel <jansel@jansel.net>	2024-08-26 01:41:20 +00:00
Xuehai Pan	e74ba1b34a	[BE][Easy][15/19] enforce style for empty lines in import segments in `torch/_d*/` (#129767 ) See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter. You can review these PRs via: ```bash git diff --ignore-all-space --ignore-blank-lines HEAD~1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129767 Approved by: https://github.com/anijain2305	2024-07-31 21:18:11 +00:00
Xuehai Pan	973037be6a	[BE][Easy] apply autofix for ruff rules unnecessary-collection-call (C408): `list()` / `tuple()` / `dict()` (#130199 ) This PR changes the empty collection factory call to Python literals: - `list()` -> `[]` - `tuple()` -> `()` - `dict()` -> `{}` The Python literals are more performant and safer. For example, the bytecode for building an empty dictionary: ```bash $ python3 -m dis - <<EOS import collections d1 = {} d2 = dict() dict = collections.OrderedDict d3 = dict() EOS ``` ```text 0 0 RESUME 0 1 2 LOAD_CONST 0 (0) 4 LOAD_CONST 1 (None) 6 IMPORT_NAME 0 (collections) 8 STORE_NAME 0 (collections) 3 10 BUILD_MAP 0 12 STORE_NAME 1 (d1) 4 14 PUSH_NULL 16 LOAD_NAME 2 (dict) 18 CALL 0 26 STORE_NAME 3 (d2) 6 28 LOAD_NAME 0 (collections) 30 LOAD_ATTR 8 (OrderedDict) 50 STORE_NAME 2 (dict) 7 52 PUSH_NULL 54 LOAD_NAME 2 (dict) 56 CALL 0 64 STORE_NAME 5 (d3) 66 RETURN_CONST 1 (None) ``` The dict literal `{}` only has one bytecode `BUILD_MAP`, while the factory call `dict()` has three `PUSH_NULL + LOAD_NAME + CALL`. Also, the factory call is not safe if users override the `dict` name in `locals` or `globals` (see the example of replacing with `OrderedDict` above). Pull Request resolved: https://github.com/pytorch/pytorch/pull/130199 Approved by: https://github.com/malfet	2024-07-11 17:30:28 +00:00
Edward Z. Yang	e836ee1955	Enhancements to recompiles logs (#130043 ) ---- - We now record on CacheEntry what the compile id that populated it was, so now we can say why a specific frame was rejected - Add structured log for recompiles under name artifact "recompile_reasons". As it stands, it's not terribly structured, but this was the easiest thing I could do to start - Slightly reformat multi-reason printing; since we only report one guard failure seems better to have it as a single line Example output: ``` V0703 10:34:13.273000 140345997743104 torch/_dynamo/guards.py:2590] [0/1] [__recompiles] Recompiling function f in /data/users/ezyang/a/pytorch/b.py:3 V0703 10:34:13.273000 140345997743104 torch/_dynamo/guards.py:2590] [0/1] [__recompiles] triggered by the following guard failure(s): V0703 10:34:13.273000 140345997743104 torch/_dynamo/guards.py:2590] [0/1] [__recompiles] - 0/0: tensor 'L['x']' size mismatch at index 0. expected 4, actual 5 ``` Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/130043 Approved by: https://github.com/anijain2305	2024-07-09 03:40:56 +00:00
PyTorch MergeBot	4bbadeee8a	Revert "Set simdlen based on ATEN_CPU_CAPABILITY (#123514 )" This reverts commit b66e3f0957b96b058c9b632ca60833d9717a9d8a. Reverted https://github.com/pytorch/pytorch/pull/123514 on behalf of https://github.com/clee2000 due to broke test/inductor/test_torchinductor.py::CpuTests::test_new_cpp_build_logical_cpu on periodic test on the no gpu tests `b66e3f0957` https://github.com/pytorch/pytorch/actions/runs/9453518547/job/26040077301 ([comment](https://github.com/pytorch/pytorch/pull/123514#issuecomment-2159433432))	2024-06-10 22:46:01 +00:00
CaoE	b66e3f0957	Set simdlen based on ATEN_CPU_CAPABILITY (#123514 ) It is part of https://github.com/pytorch/pytorch/issues/123224. Set simdlen based on the environment ATEN_CPU_CAPABILITY to control CPU vec ISA like eager. Pull Request resolved: https://github.com/pytorch/pytorch/pull/123514 Approved by: https://github.com/jgong5, https://github.com/peterbell10	2024-06-10 09:02:14 +00:00
Aaron Orenstein	dcfa7702c3	Flip default value for mypy disallow_untyped_defs [1/11] (#127838 ) See #127836 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127838 Approved by: https://github.com/oulgen	2024-06-08 18:16:33 +00:00
William Wen	d44ab8ba6d	[dynamo] utility to generate bytecode from template function (#127359 ) This will be helpful in reducing some of the hardcoded and python-version-dependent bytecode generation in various places in dynamo - e.g. resume function generation and object reconstruction. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127359 Approved by: https://github.com/jansel ghstack dependencies: #127329	2024-05-30 06:37:32 +00:00
William Wen	f17572fcf6	add 3.12 inductor CI tests (#126218 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/126218 Approved by: https://github.com/huydhn, https://github.com/desertfire	2024-05-16 22:29:24 +00:00
Edward Z. Yang	e93b57a570	Add propagate_real_tensors mode for unbacked (#125115 ) A common complaint when working with data-dependent code in PyTorch is that it's hard to tell how far you are from the finish line: every time a GuardOnDataDependentSymNode error is hit, you have to somehow fix or workaround it to see the next one. This PR adds a new mode `torch._functorch.config.fake_tensor_propagate_real_tensors` which modifies fake tensors to also propagate real tensors. This means that when we try to guard on a data-dependent SymNode, we can actually produce a real result. We also produce a warning which you should consult to figure out what the crux points are. I ran this on vision_maskrcnn. In the baseline (without this mode), the model has 27 graph breaks, resulting in 40 graphs. With this mode on, the model has only 11 graph breaks, resulting in 15 graphs (the remaining graph breaks are due to missing functionality for item() on float tensor and some other Dynamo missing features.) You get a list of things that would have errored like this: ``` WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Max(1, u1) < 2) -> True WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u1), 1)) -> True WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u1), 1)) -> True WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Ne(Max(1, u1), 1)) -> False WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Max(1, u0) < 2) -> True WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u0), 1)) -> True WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u0), 1)) -> True WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Ne(Max(1, u0), 1)) -> False WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Max(1, u1) < 2) -> True WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u1), 1)) -> True WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u1), 1)) -> True WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Ne(Max(1, u1), 1)) -> False WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Max(1, u0) < 2) -> True WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u0), 1)) -> True WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u0), 1)) -> True WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Ne(Max(1, u0), 1)) -> False WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Max(1, u1) < 2) -> False WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u1), 1)) -> False WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Ne(Max(1, u1), 1)) -> True WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Max(1, u0) < 2) -> False WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u0), 1)) -> False ``` Potential later follow ups: * Improve the warning messages (in particular, should provide user frames) * GC real tensors when they are no longer needed by tracing. Right now, this will use A LOT of memory, equal to as if your GC was broken and every intermediate tensor was kept live Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/125115 Approved by: https://github.com/IvanKobzarev	2024-05-02 15:28:26 +00:00
William Wen	d6c713884a	[dynamo, 3.12] xfail refleaking tests due to buggy getattr_static (#125062 ) For tracking https://github.com/pytorch/pytorch/issues/124302 so that we can re-enable the test once 3.12 updates with the bug fix for https://github.com/python/cpython/issues/118013. Pull Request resolved: https://github.com/pytorch/pytorch/pull/125062 Approved by: https://github.com/anijain2305, https://github.com/jansel	2024-04-30 22:40:47 +00:00
Simon Fan	c12c85e919	Revert "[benchmark][cudagraph] Explicitly call aten.div with CUDA denominator for cudagraphs (#119729 )" (#125246 ) This reverts commit 62b5738a8bf325d79468b839b8412b87cb9951c1. https://github.com/pytorch/pytorch/pull/119729/ regresses cudagraph dashboard. Moving the one-time per iteration loss from CPU to CUDA is somehow causing a lot of copies: current (top) vs with revert (bottom) ![image](https://github.com/pytorch/pytorch/assets/9547562/62dfbf66-7edc-4a3c-ba7f-1ec057fba950) Pull Request resolved: https://github.com/pytorch/pytorch/pull/125246 Approved by: https://github.com/eellison	2024-04-30 22:39:53 +00:00
Simon Fan	62b5738a8b	[benchmark][cudagraph] Explicitly call aten.div with CUDA denominator for cudagraphs (#119729 ) aten.div's output device will be its numerator's device. so it is acceptable to do cuda / cpu type divisions. post grad passes operate only on graphs and can't handle runtime graph inputs. so we change user code to move inputs to cuda for cudagraph. this affects any graph that has cpu tensors as graph inputs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119729 Approved by: https://github.com/eellison	2024-04-26 03:22:26 +00:00
PyTorch MergeBot	48a016157d	Revert "[benchmark][cudagraph] Explicitly call aten.div with CUDA denominator for cudagraphs (#119729 )" This reverts commit c021c9b8e48b8e787b75fd69a3076beffffb8208. Reverted https://github.com/pytorch/pytorch/pull/119729 on behalf of https://github.com/jeanschmidt due to one PR in this stack seems to have broken linux pull cuda12 tests ([comment](https://github.com/pytorch/pytorch/pull/119729#issuecomment-2076750595))	2024-04-25 09:26:25 +00:00
Simon Fan	c021c9b8e4	[benchmark][cudagraph] Explicitly call aten.div with CUDA denominator for cudagraphs (#119729 ) aten.div's output device will be its numerator's device. so it is acceptable to do cuda / cpu type divisions. post grad passes operate only on graphs and can't handle runtime graph inputs. so we change user code to move inputs to cuda for cudagraph. this affects any graph that has cpu tensors as graph inputs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119729 Approved by: https://github.com/eellison	2024-04-25 03:38:09 +00:00
Animesh Jain	a32eac345f	[dynamo] Return gm.forward for eager backend (#124109 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/124109 Approved by: https://github.com/yanboliang, https://github.com/jansel ghstack dependencies: #124445	2024-04-20 14:11:05 +00:00
William Wen	812bae09be	[dynamo] fix 3.11+ refleak (#124238 ) Fixes https://github.com/pytorch/pytorch/issues/119607 for 3.11+. In 3.11+, `_PyFrame_FastToLocalsWithError` could implicity run `COPY_FREE_VARS` on the original frame, leading to double incref's since the dynamo shadow frame can rerun `COPY_FREE_VARS`. So the solution is to skip the first `COPY_FREE_VARS` instruction in the shadow frame if it was already executed in the original frame. Also move the location for clearing the original frame in 3.12 to handle error cases more thoroughly. Pull Request resolved: https://github.com/pytorch/pytorch/pull/124238 Approved by: https://github.com/jansel	2024-04-18 03:02:29 +00:00
Shunting Zhang	df5829d0ba	[inductor] let rand_strided support fp8 (#124120 ) I'm working on https://fb.workplace.com/groups/1075192433118967/posts/1411161629522044/ (this is a meta internal link about a inefficient inner/persistent reduction kernel generated by inductor). I found the generated benchmark code for a kernel ( https://gist.github.com/shunting314/13a0105f72a1c54d9c220370c7fd3845 ) can not be run since rand_strided failed to generate tensors for fp8. Errors are like ``` RuntimeError: "normal_kernel_cpu" not implemented for 'Float8_e4m3fn' ``` for CPU or ``` RuntimeError: "normal_kernel_cuda" not implemented for 'Float8_e4m3fn' ``` for GPU This PR work around that problem. Pull Request resolved: https://github.com/pytorch/pytorch/pull/124120 Approved by: https://github.com/Chillee, https://github.com/jansel	2024-04-16 04:15:56 +00:00
Aaron Gokaslan	1d6c5972c1	[BE]: Optimize min/max/sum comprehensions C419 (#123960 ) Automatic fixes that replaces certain list comprehensions with generator ones where appropriate so that they are immediately consumed. This is preview functionality in ruff for rule C419 and it was automatically applied. Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/123960 Approved by: https://github.com/malfet	2024-04-12 23:54:15 +00:00
Peter Bell	6939279a17	[dynamo] Forward OptimizedModule.__setattr__ to the wrapped module (#122098 ) Fixes #114844 In the linked issue we have ``` compiled_module = torch.compile(module) compiled_module.x = ... compiled_module(...) # Mutates self.x ``` Where since the module mutates `self.x` you would expect `compiled_module.x` to be updated but actually `compiled_module.x = ...` sets an attribute "x" on the `OptimizedModule` object while the forward method of the module mutates `module.x`. This gives the expected behavior by forwarding `compiled_module.__setattr__` down to `module.__setattr__`. There is already a corresponding `__getattr__` so now `compiled_module.x` becomes an alias for `module.x`. Co-authored-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/122098 Approved by: https://github.com/ezyang, https://github.com/lezcano	2024-04-01 14:30:44 +00:00
William Wen	a9b27bbbe9	[dynamo, 3.12] update jump instructions (#122530 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122530 Approved by: https://github.com/jansel ghstack dependencies: #122146, #122335, #122354, #122355, #122356, #122449, #122455, #122456	2024-03-27 20:39:39 +00:00
William Wen	2564f6cf0e	[dynamo, 3.12] Allocate Dynamo shadow frames by mimicking CPython (#122146 ) Python 3.12 changed a few things with how `_PyInterpreterFrame`s are allocated and freed: - Frames are now required to be placed on the Python frame stack. In 3.11, we could allocate frames anywhere in memory. In 3.12, we now need to use `THP_PyThreadState_BumpFramePointerSlow`/`push_chunk`/`allocate_chunk`. This method of allocating/freeing frames is also compatible with 3.11. - The eval frame function is now responsible for clearing the frame (see https://docs.python.org/3/whatsnew/changelog.html#id128, the point about "...which now clear the frame.") Pull Request resolved: https://github.com/pytorch/pytorch/pull/122146 Approved by: https://github.com/jansel	2024-03-27 20:39:39 +00:00
PyTorch MergeBot	f631586084	Revert "[dynamo] Forward OptimizedModule.__setattr__ to the wrapped module (#122098 )" This reverts commit b6982bf2b25d2d3ba5d82488a39721d6013a838f. Reverted https://github.com/pytorch/pytorch/pull/122098 on behalf of https://github.com/atalman due to Failing internally ([comment](https://github.com/pytorch/pytorch/pull/122098#issuecomment-2021233604))	2024-03-26 18:54:17 +00:00
Peter Bell	b6982bf2b2	[dynamo] Forward OptimizedModule.__setattr__ to the wrapped module (#122098 ) Fixes #114844 In the linked issue we have ``` compiled_module = torch.compile(module) compiled_module.x = ... compiled_module(...) # Mutates self.x ``` Where since the module mutates `self.x` you would expect `compiled_module.x` to be updated but actually `compiled_module.x = ...` sets an attribute "x" on the `OptimizedModule` object while the forward method of the module mutates `module.x`. This gives the expected behavior by forwarding `compiled_module.__setattr__` down to `module.__setattr__`. There is already a corresponding `__getattr__` so now `compiled_module.x` becomes an alias for `module.x`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/122098 Approved by: https://github.com/ezyang, https://github.com/lezcano	2024-03-26 00:52:12 +00:00
PyTorch MergeBot	e5e0685f61	Revert "[dynamo] Forward OptimizedModule.__setattr__ to the wrapped module (#122098 )" This reverts commit 88ebdbc97c103271766203df6662240e95a09b42. Reverted https://github.com/pytorch/pytorch/pull/122098 on behalf of https://github.com/huydhn due to Sorry for reverting your change but the distributed failure looks legit as it is also failing in trunk `88ebdbc97c` ([comment](https://github.com/pytorch/pytorch/pull/122098#issuecomment-2008483316))	2024-03-20 01:12:24 +00:00
Peter Bell	88ebdbc97c	[dynamo] Forward OptimizedModule.__setattr__ to the wrapped module (#122098 ) Fixes #114844 In the linked issue we have ``` compiled_module = torch.compile(module) compiled_module.x = ... compiled_module(...) # Mutates self.x ``` Where since the module mutates `self.x` you would expect `compiled_module.x` to be updated but actually `compiled_module.x = ...` sets an attribute "x" on the `OptimizedModule` object while the forward method of the module mutates `module.x`. This gives the expected behavior by forwarding `compiled_module.__setattr__` down to `module.__setattr__`. There is already a corresponding `__getattr__` so now `compiled_module.x` becomes an alias for `module.x`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/122098 Approved by: https://github.com/ezyang, https://github.com/lezcano	2024-03-19 16:51:43 +00:00
William Wen	d14d62b7aa	[dynamo] add more refleak tests (#120657 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/120657 Approved by: https://github.com/jansel	2024-03-07 22:25:43 +00:00
Alexander Grund	99cb807e25	Skip test_wrap_bad if run under pytest (#115070 ) Pytest replaces sys.stdout/stderr by `TextIOWrapper` instances which do not support `fileno()` Hence skip that test in this case Fixes #115069 Pull Request resolved: https://github.com/pytorch/pytorch/pull/115070 Approved by: https://github.com/clee2000	2024-02-15 00:10:05 +00:00
Edward Z. Yang	790858afa9	Make start compiling stack trace omit framework frames (#119251 ) Fixes https://github.com/pytorch/pytorch/issues/119238 Here's what it looks like now: ``` $ TORCH_LOGS=+torch._dynamo.convert_frame python a.py [2024-02-05 18:52:07,248] [0/0] torch._dynamo.convert_frame: [DEBUG] torchdynamo start compiling f /data/users/ezyang/b/pytorch/a.py:3, stack (elided 5 frames): [2024-02-05 18:52:07,248] [0/0] torch._dynamo.convert_frame: [DEBUG] File "/data/users/ezyang/b/pytorch/a.py", line 7, in <module> [2024-02-05 18:52:07,248] [0/0] torch._dynamo.convert_frame: [DEBUG] f(torch.randn(2)) [2024-02-05 18:52:07,248] [0/0] torch._dynamo.convert_frame: [DEBUG] File "/data/users/ezyang/b/pytorch/torch/_dynamo/eval_frame.py", line 453, in _fn [2024-02-05 18:52:07,248] [0/0] torch._dynamo.convert_frame: [DEBUG] return fn(args, kwargs) [2024-02-05 18:52:07,248] [0/0] torch._dynamo.convert_frame: [DEBUG] $ cat a.py import torch @torch.compile def f(x): return x 2 f(torch.randn(2)) ``` The eval_frame frame is intentionally present, since what happens is you run the torch.compile wrapper, and then you actually hit the user frame to be compiled. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/119251 Approved by: https://github.com/yanboliang, https://github.com/mlazos	2024-02-06 17:40:07 +00:00
Xuehai Pan	039fbeb016	[dynamo] fix `functools.reduce()` function with `None` as `initial` (#116398 ) The `initial` argument in `functools.reduce` can be `None`. ```python initial_missing = object() def reduce(function, iterable, initial=initial_missing, /): it = iter(iterable) if initial is initial_missing: value = next(it) else: value = initial for element in it: value = function(value, element) return value ``` Reference: - python/cpython#102759 Pull Request resolved: https://github.com/pytorch/pytorch/pull/116398 Approved by: https://github.com/Skylion007	2023-12-25 21:23:28 +00:00
PyTorch MergeBot	92e3f45f0e	Revert "[dynamo] Refactor test cross importing (#113242 )" This reverts commit 4309d38f5d33530cbd875bded551e3fc08286c5d. Reverted https://github.com/pytorch/pytorch/pull/113242 on behalf of https://github.com/huydhn due to Sorry for reverting your stack, but it is failing to list test internally with buck2 ([comment](https://github.com/pytorch/pytorch/pull/113242#issuecomment-1811674395))	2023-11-15 01:53:07 +00:00
Jason Ansel	4309d38f5d	[dynamo] Refactor test cross importing (#113242 ) Having tests import tests is a bit annoying because fbcode/oss have different paths. This moves that stuff into a helper function. Pull Request resolved: https://github.com/pytorch/pytorch/pull/113242 Approved by: https://github.com/yanboliang	2023-11-11 03:17:35 +00:00
PyTorch MergeBot	59592389fc	Revert "[dynamo] Refactor test cross importing (#113242 )" This reverts commit 8858edad656f505728c9810093f796f96e1285cb. Reverted https://github.com/pytorch/pytorch/pull/113242 on behalf of https://github.com/PaliC due to this diff appears to be causing inductor failures internally ([comment](https://github.com/pytorch/pytorch/pull/113242#issuecomment-1805132719))	2023-11-10 05:43:08 +00:00
Jason Ansel	8858edad65	[dynamo] Refactor test cross importing (#113242 ) Having tests import tests is a bit annoying because fbcode/oss have different paths. This moves that stuff into a helper function. Pull Request resolved: https://github.com/pytorch/pytorch/pull/113242 Approved by: https://github.com/yanboliang	2023-11-09 01:36:27 +00:00
William Wen	aa649f713f	[dynamo, test] remove #ops comparison to fx.symbolic_trace from dynamo standard_test (#112420 ) Fix https://github.com/pytorch/pytorch/issues/112230 by removing the comparison of number of ops in dynamo vs. fx.symbolic_trace. A number of tests fail in `test_functions.py` fail because the number of ops is no longer the same, but this seems to be acceptable behavior by dynamo. Pull Request resolved: https://github.com/pytorch/pytorch/pull/112420 Approved by: https://github.com/jansel, https://github.com/int3	2023-10-31 19:55:47 +00:00
Jez Ng	a26cb0a3f2	[dynamo] Enable typechecking for testing.py (#112129 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/112129 Approved by: https://github.com/Skylion007 ghstack dependencies: #111894, #111992, #112031, #112127, #112128	2023-10-27 18:00:56 +00:00
Shunting Zhang	cc9b7bb85c	[reland] [inductor] fix a max-autotune rng state related bug (#111381 ) reland https://github.com/pytorch/pytorch/pull/109828 Pull Request resolved: https://github.com/pytorch/pytorch/pull/111381 Approved by: https://github.com/lezcano	2023-10-17 19:16:36 +00:00
PyTorch MergeBot	d9627c4264	Revert "[inductor] fix a max-autotune rng state related bug (#109828 )" This reverts commit 3663436db31bd3cebcb76efe05d8355553a05c57. Reverted https://github.com/pytorch/pytorch/pull/109828 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but the rocm failure looks legit. There is also another numpy import error when running dynamo test on CPU ([comment](https://github.com/pytorch/pytorch/pull/109828#issuecomment-1732423883))	2023-09-23 22:35:37 +00:00
Shunting Zhang	3663436db3	[inductor] fix a max-autotune rng state related bug (#109828 ) Fix https://github.com/pytorch/pytorch/issues/109736 . HF pin move causes regression on accuracy check for HF models on the dashboard. Manually reverting the HF PR ( https://github.com/huggingface/transformers/pull/24696/files ) could recover, but this may hide some real issue. I happen to found that using a warm matmul max-autotune cache can work around the issue. Or putting it in another way: - make all calls to check_cache cache miss repro the issue - make all cals to check_cache cache hit works around the issue I did some sort of 'bisect' to force halving the amount of cache miss each time while still make sure we can repro. Luckily reducing to a single cache miss still repro the issue. With more debugging, it turns out that it's the call to `torch.randn` on cuda device causing the problem. The fix is to make sure we restore the rng state when we generate random inputs for max-autotune benchmarking. TBH, I can not fully explain the root cause although I know it's caused by rng state change. AOTAutograd already has some logic to preserve rng state. And I can not repro the issue in unit tests. I have a few guess why the RNG state is not restored in the first place after we generate random inputs for max-autotune: - maybe AOTAutograd misses some corner case to preserve the rng state - maybe for the failed models, there are some eager fallback that's not handled by inductor. And if those fallback calles random number related APIs, we will see the issue. But again I don't find a good way to simulate this. Repro: ``` TORCHINDUCTOR_BENCHMARK_KERNEL=1 TORCHINDUCTOR_MAX_AUTOTUNE_GEMM=1 CUDA_VISIBLE_DEVICES=3 time python benchmarks/dynamo/huggingface.py --backend inductor --amp --accuracy --only PLBartForCausalLM --training --cold-start-latency ``` We always repro the issue without the PR but pass the accuracy check with the PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109828 Approved by: https://github.com/eellison	2023-09-23 00:58:10 +00:00
Catherine Lee	54e73271c7	When patching dynamic shapes test class, don't run the original tests (#108681 ) redo of https://github.com/pytorch/pytorch/pull/103523 Pull Request resolved: https://github.com/pytorch/pytorch/pull/108681 Approved by: https://github.com/ezyang	2023-09-07 02:13:59 +00:00
lezcano	a9dca53438	NumPy support in torch.compile (#106211 ) RFC: https://github.com/pytorch/rfcs/pull/54 First commit is the contents of https://github.com/Quansight-Labs/numpy_pytorch_interop/ We have already been using this in core for the last few months as a external dependency. This PR pulls all these into core. In the next commits, I do a number of things in this order - Fix a few small issues - Make the tests that this PR adds pass - Bend backwards until lintrunner passes - Remove the optional dependency on `torch_np` and simply rely on the upstreamed code - Fix a number dynamo tests that were passing before (they were not tasting anything I think) and are not passing now. Missing from this PR (but not blocking): - Have a flag that deactivates tracing NumPy functions and simply breaks. There used to be one but after the merge stopped working and I removed it. @lezcano to investigate. - https://github.com/pytorch/pytorch/pull/106431#issuecomment-1667079543. @voznesenskym to submit a fix after we merge. All the tests in `tests/torch_np` take about 75s to run. This was a work by @ev-br, @rgommers @honno and I. I did not create this PR via ghstack (which would have been convenient) as this is a collaboration, and ghstack doesn't allow for shared contributions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/106211 Approved by: https://github.com/ezyang	2023-08-11 00:39:32 +00:00
Jouni K. Seppänen	186352a625	[inductor] Make autotune_process.py pass mypy (#105791 ) `TensorMeta.from_irnodes` handles either a single `IRNode` or a tuple or list of them. I tried to express this with overloading, but because this file is in MYPYNOFOLLOW, the `IRNode` subclasses become `Any`, which causes the overloads to be overlapping. This changes the type of the argument to `benchmark_in_sub_process` to the more specific `TritonTemplateCaller`, since that one has the `bmreq` member and existing docstrings indicate that only the triton template benchmark is handled. The `rand_strided` call caused a mypy error because the default value for device was a string. This is fixed by adding type hints to `rand_strided` in `torch/_dynamo/testing.py`. Likewise, the return value of `PyCodeCache.load_by_key_path` can be inferred from the type hint on `PyCodeCache.cache`. Fixes one part of #105230 Pull Request resolved: https://github.com/pytorch/pytorch/pull/105791 Approved by: https://github.com/jansel, https://github.com/Skylion007	2023-07-31 23:58:38 +00:00
kshitij12345	920b446da9	dynamo: support disable_saved_tensors_hooks (#104869 ) Functorch transforms use this context manager which will lead to graph-breaks. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104869 Approved by: https://github.com/zou3519	2023-07-26 07:27:37 +00:00
Edward Z. Yang	2385dad4b3	Enable automatic_dynamic_shapes by default (#103623 ) Some notes: * I now manually turn off `_generate` jobs from running with cudagraphs, as it is unrealistic to expect to cudagraph autoregressive generation up to max sequence length, this would imply compiling the entire unrolled sequence generation. Concretely, cm3leon_generate was timing out post this change, likely due to the compile time slowdown of dynamic shapes ON TOP OF accidentally unrolling all the loops * A few torch._dynamo.reset tactically inserted to force recompiles on tests that expected it * expectedFailureAutomaticDynamic flip into patching automatic_dynamic_shapes=False Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/103623 Approved by: https://github.com/voznesenskym	2023-07-05 00:25:02 +00:00
Animesh Jain	75dab587ef	[dynamo] FSDP + AC + torch.compile (#103953 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/103953 Approved by: https://github.com/wanchaol	2023-06-24 01:40:56 +00:00
Edward Z. Yang	7ce932a92c	Add signpost_event to dynamic_shapes (#103882 ) Added two signpost_event calls to torch.fx.experimental.symbolic_shapes, one for produce_guards (where we can give stats like how many free symbols and how many guards produced) and the other is for evaluate_expr after freeze (so we can look for cases where we're improperly discarding guards in backwards.) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/103882 Approved by: https://github.com/Skylion007	2023-06-21 13:26:21 +00:00
Edward Z. Yang	ed3a61afcc	Add automatic_dynamic_shapes test configuration (#103598 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/103598 Approved by: https://github.com/Skylion007	2023-06-15 19:55:57 +00:00
Edward Z. Yang	bc6ec97e02	Switch dynamic_shapes to True by default (#103597 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/103597 Approved by: https://github.com/voznesenskym	2023-06-15 15:16:20 +00:00
Michael Voznesensky	aece6705d1	Move locals/globals to output graph, make it easier to access them anywhere (#103456 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/103456 Approved by: https://github.com/jansel	2023-06-14 20:04:33 +00:00

1 2

86 Commits