208 Commits

Author SHA1 Message Date
01ec8df6d8 [Compiled Autograd] Introduce BackwardState capture (#120382)
This adds support for backwards hooks that are *both*:
1) Interior to the graph; and
2) Dynamically generated (e.g. lambdas)

We do this by creating a BackwardState object that is used to register the hooks in the forward, then populated by dynamo *after* the forward runs.
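
For illustration only (not code from the PR), a minimal eager-mode sketch of a hook that is both interior to the graph and dynamically generated; the function and variable names are hypothetical:

```python
import torch

def forward(x, scale):
    y = x.sin()
    # The hook is (1) attached to an intermediate ("interior") tensor and
    # (2) created at runtime as a lambda closing over `scale` -- the
    # combination described above.
    y.register_hook(lambda grad: grad * scale)
    return y.cos().sum()

x = torch.randn(4, requires_grad=True)
forward(x, 2.0).backward()
```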

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120382
Approved by: https://github.com/xmfan
2024-02-28 20:36:47 +00:00
55483fc2c9 Min-cut partitioner always saves tensors that are returned as-is in backward (#114970)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114970
Approved by: https://github.com/Chillee
2024-02-13 00:04:41 +00:00
de6a906093 Expose aggressive_recomputation as an inductor config (#118943)
Summary:
As title.

We found aggressive_recomputation shows memory savings (7% on APS COFFEE model) with 2% QPS loss.

It also gives very promising signal on our auto ac experiments: https://docs.google.com/document/d/1S2qgMg1CwAQ4U1Ffuk2epbEOx06ogZhioX2jKCwL7ZQ/edit

Test Plan:
APS COFFEE from silverlakeli
- Zoom of baseline job: https://www.internalfb.com/intern/zoomer/?profiling_run_fbid=927380488801910&tab=overview
- Zoom of job with aggressive_recomputation: https://www.internalfb.com/intern/zoomer/?profiling_run_fbid=1126815608217470&tab=overview

APS 1100x shrunk version:
- baseline: https://www.internalfb.com/mast/job/aps-yuzhenhuang-afe049505a
- test: https://www.internalfb.com/mast/job/aps-yuzhenhuang-709e41bf0d
Memory from 42.98% -> 41.04%.

Reviewed By: yf225, yuxihu, silverlakeli, richqyz

Differential Revision: D53248057

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118943
Approved by: https://github.com/anijain2305, https://github.com/yanboliang
2024-02-03 00:17:03 +00:00
1562dae62c [BE]: Apply RUF025 dict.fromkeys preview rule (#118637)
Simplifies and optimizes dict construction using the `fromkeys` classmethod ctor. This also makes it really obvious when all the keys will have the same static value, which could be a bug if unintentional. It is also significantly faster than using a dict comprehension. The rule is in preview, but I am adding a forward fix for when it becomes stable.
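
A hedged before/after illustration of the pattern RUF025 targets (the names are invented, not from the diff):

```python
keys = ["a", "b", "c"]

# Before: a dict comprehension where every key maps to the same static value.
counts = {k: 0 for k in keys}

# After: dict.fromkeys makes the shared static value explicit and is faster.
counts = dict.fromkeys(keys, 0)
```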

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118637
Approved by: https://github.com/albanD
2024-01-30 20:46:54 +00:00
fe10b1800f LazyGraphModule (#117911)
I feel it's easier to open a new PR rather than iterating on the previous PR (https://github.com/pytorch/pytorch/pull/105257 ) since this is more like a rewrite.

In this PR, instead of changing GraphModule directly, which could easily cause BC issues, I create a LazyGraphModule class as Zachary & Jason suggested in comments on the previous PR.

The difference between LazyGraphModule and GraphModule is mainly in how recompilation of the graph module happens. In GraphModule, recompilation happens 'eagerly': constructing a GraphModule triggers recompilation. In LazyGraphModule, we just mark the module as needing recompilation; the real recompilation only happens when absolutely required (e.g. calling the forward method, accessing the code property, etc.). In a lot of cases in torch.compile, the real recompilation is never triggered at all. This can save a few seconds of compilation time.

By default, GraphModule rather than LazyGraphModule is used. `use_lazy_graph_module(True)` context manager can be used to pick LazyGraphModule instead. This has been applied to the torch.compile stack.
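
A minimal usage sketch based on the description above; the import path below is an assumption (the commit only names the `use_lazy_graph_module(True)` context manager) and may differ between PyTorch versions:

```python
import torch
# Assumed import location of the context manager; adjust for your version.
from torch.fx._lazy_graph_module import _use_lazy_graph_module as use_lazy_graph_module

def fn(x):
    return x.sin() + 1

with use_lazy_graph_module(True):
    # Graph modules produced while this context is active defer recompile()
    # until their forward or .code is actually needed.
    out = torch.compile(fn)(torch.randn(8))
```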

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117911
Approved by: https://github.com/jansel
2024-01-27 04:10:18 +00:00
9bce208dfb Replace follow_imports = silent with normal (#118414)
This is a lot of files changed! Don't panic! Here's how it works:

* Previously, we set `follow_imports = silent` for our mypy.ini configuration. Per https://mypy.readthedocs.io/en/stable/running_mypy.html#follow-imports, what this does is whenever we have an import to a module which is not listed as a file to be typechecked in mypy, we typecheck it as normal but suppress all errors that occurred in that file.
* When mypy is run inside lintrunner, the list of files is precisely the files covered by the glob in lintrunner.toml, but with files in excludes excluded.
* The top-level directive `# mypy: ignore-errors` instructs mypy to typecheck the file as normal, but ignore all errors.
* Therefore, it should be equivalent to set `follow_imports = normal`, if we put `# mypy: ignore-errors` on all files that were previously excluded from the file list.
* Having done this, we can remove the exclude list from .lintrunner.toml, since excluding a file from typechecking is baked into the files themselves.
* torch/_dynamo and torch/_inductor were previously in the exclude list, because they were covered by MYPYINDUCTOR. It is not OK to mark these as `# mypy: ignore-errors` as this will impede typechecking on the alternate configuration. So they are temporarily being checked twice, but I am suppressing the errors in these files as the configurations are not quite the same. I plan to unify the configurations so this is only a temporary state.
* There were some straggler type errors after these changes somehow, so I fixed them as needed. There weren't that many.

In the future, to start type checking a file, just remove the ignore-errors directive from the top of the file.

The codemod was done with this script authored by GPT-4:

```
import glob

# Glob patterns for the files that were previously excluded from mypy.
exclude_patterns = [
    ...
]

for pattern in exclude_patterns:
    for filepath in glob.glob(pattern, recursive=True):
        if filepath.endswith('.py'):
            with open(filepath, 'r+') as f:
                content = f.read()
                # Rewind to the start and prepend the ignore-errors
                # directive, keeping the rest of the file intact.
                f.seek(0, 0)
                f.write('# mypy: ignore-errors\n\n' + content)
```

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118414
Approved by: https://github.com/thiagocrepaldi, https://github.com/albanD
2024-01-27 02:44:11 +00:00
670e7992fd [Easy] Document AGGRESSIVE_RECOMPUTATION flag in min-cut partitioner (#114007)
As titled.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114007
Approved by: https://github.com/wanchaol
2024-01-04 05:05:08 +00:00
bd10fea79a [BE]: Enable F821 and fix bugs (#116579)
Fixes #112371

I tried to fix as many of the bugs as I could; for a few of them I could not figure out the proper fix, so I left them with noqas.
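
For context, F821 is flake8's undefined-name check. A generic illustration of the kind of bug it flags (not taken from this PR):

```python
# Before (flagged by F821): `Sequence` is referenced but never imported.
#
#     def first(xs: Sequence) -> int:
#         return xs[0]
#
# After: the undefined name is resolved by adding the missing import.
from typing import Sequence

def first(xs: Sequence) -> int:
    return xs[0]
```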

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116579
Approved by: https://github.com/ezyang
2024-01-01 08:40:46 +00:00
ee5d981249 [BE]: Enable RUFF PERF402 and apply fixes (#115505)
* Enable PERF402. Makes code more efficient and succinct by replacing manual element-by-element list copies with a list constructor or an `extend` call (see the sketch below). All test cases have noqa added since performance is not as sensitive in that folder.
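
A hedged illustration of the copy pattern PERF402 rewrites (variable names are made up):

```python
items = ["a", "b", "c"]

# Before: a manual copy loop flagged by PERF402.
copied = []
for item in items:
    copied.append(item)

# After: express the copy with the list constructor (or copied.extend(items)).
copied = list(items)
```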

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115505
Approved by: https://github.com/malfet
2023-12-20 18:01:24 +00:00
5c3f03e2dd [inductor] add a config to specify the shape attribute for the generated svg graphs (#114811)
We draw our fx graphs with the "record" shape attribute by default.
Sometimes, when the graph is very complex, we may hit dot errors like below:
  "flat edge between adjacent nodes one of which has a record shape -
   replace records with HTML-like labels"
and thus fail to generate a graph. So, let's give the user an option
to specify the shape attribute for the dot graph. For example, passing
INDUCTOR_DOT_GRAPH_SHAPE_SVG = "none" would let us generate HTML-like labels
to work around the above failure.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114811
Approved by: https://github.com/weifengpy
2023-11-30 06:10:37 +00:00
b7b2178204 [BE]: Remove useless lambdas (#113602)
Applies PLW0108, which removes useless lambda calls in Python (see the example below). The rule is in preview, so it is not ready to be enabled by default just yet. These are the autofixes from the rule.
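
A hedged illustration of the useless-lambda pattern PLW0108 removes (not taken from the diff):

```python
values = ["1", "2", "3"]

# Before: the lambda only forwards its argument to another callable.
as_ints = list(map(lambda v: int(v), values))

# After: pass the callable directly; the lambda wrapper is useless.
as_ints = list(map(int, values))
```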

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113602
Approved by: https://github.com/albanD
2023-11-14 20:06:48 +00:00
e6f0960762 [inductor] Make debug.py pass follow-imports typechecking (#113307)
pydot accepts both a str and a list of str for its `prog` parameter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113307
Approved by: https://github.com/Skylion007
ghstack dependencies: #113304, #113305, #113306
2023-11-09 22:08:17 +00:00
1f3fa13f0a Handle unbacked SymInt sized outputs in AOTAutograd (#113159)
Thanks aakhundov for constructing the test case. This PR was constructed by running the failing test case, and then fixing problems until we got all the way to the end. There are a few distinct fixes:

* AOTAutograd performs equality tests on tensor metadata to determine if a metadata mutation had occurred. If we test i0 vs i1, we should report these are NOT equal, since obviously we have somehow resized the tensor from i0 to i1 (even if, on a particular run, it is possible i0 == i1).
* There's a sketchy fix for `test_aot_autograd_exhaustive_matmul_cpu_float32` where we check if the output shape equals the tangent shape. Unfortunately, the same `definitely_true` treatment does not work here; it still fails on the example. I piled an extra sketchy fix on top of it, where I just try my best to avoid doing the view. Maybe we should have some sort of logging here.
* Partitioner needs to get out a size for unbacked SymInt when partitioning. I just feed it a random heuristic value in this case, similar to how we've been dealing with this in Inductor.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113159
Approved by: https://github.com/aakhundov, https://github.com/bdhirsh
2023-11-08 04:28:38 +00:00
8219bf051b [BE]: Apply RUF015 to torch folder (#113025)
Removes unnecessary allocations of iterators. There is a small chance this may have side effects as the entire iterator is no longer consumed, but this is a way more efficient method for retrieving the first element.
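
A hedged illustration of the RUF015 rewrite (the example data is invented):

```python
data = {"a": 1, "b": 2, "c": 3}

# Before: materializes the entire key view as a list just to read element 0.
first_key = list(data)[0]

# After: consume only the first element of the iterator.
first_key = next(iter(data))
```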

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113025
Approved by: https://github.com/ezyang, https://github.com/malfet
2023-11-07 00:48:15 +00:00
66c32d099a Use pytree.arg_tree_leaves everywhere (#112394)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112394
Approved by: https://github.com/lezcano
ghstack dependencies: #112391, #112392, #112393
2023-10-31 15:57:06 +00:00
bbd5b935e4 Use pytree.tree_leaves everywhere (#112324)
This changes all the instances I could find of `tree_flatten(...)[0]` or
`x, _ = tree_flatten` to use `tree_leaves`.
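
A hedged sketch of the rewrite; `torch.utils._pytree` is a private module whose API may differ across versions:

```python
import torch.utils._pytree as pytree

nested = {"a": (1, 2), "b": [3]}

# Before: flatten and discard the spec just to get the leaves.
leaves, _ = pytree.tree_flatten(nested)

# After: ask for the leaves directly.
leaves = pytree.tree_leaves(nested)
```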

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112324
Approved by: https://github.com/lezcano
ghstack dependencies: #112327, #112323
2023-10-30 03:39:04 +00:00
47ccf04885 Split SymNode into its own file (#112037)
This PR:

- Moves TrueDiv, LShift, RShift, IsNonOverlappingAndDenseIndicator to `_sympy.functions.py`
- Moves SymNode to `fx.experimental.sym_node`.
  - This file does not have any SymPy dependencies at import time
  - It installs the magic methods in Sym{Bool,Int,Float}.
  - N.b. With this split, we may be able to move Sym{Bool,Int,Float} to this file, and remove quite a few of the hacks around these classes
- Imports `sym_node` in `torch/__init__.py` rather than the whole `symbolic_shapes.py`.
  This breaks the import-time dependency between torch and SymPy

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112037
Approved by: https://github.com/peterbell10
ghstack dependencies: #112035, #112036
2023-10-26 23:32:27 +00:00
6d7744ca46 Fix typo under torch/_functorch directory (#111067)
This PR fixes typos in comments and exception messages in files under the `torch/_functorch` directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111067
Approved by: https://github.com/Skylion007
2023-10-11 23:09:36 +00:00
f767a6c57a Made pattern-matcher diagnostics lazily reported + added TORCH_COMPILE_CPROFILE (#110504)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110504
Approved by: https://github.com/mlazos, https://github.com/eellison
ghstack dependencies: #110501
2023-10-05 15:47:30 +00:00
1e4c0641ce Revert "Made pattern-matcher diagnostics lazily reported + added TORCH_COMPILE_CPROFILE (#110504)"
This reverts commit 9648df1a6af8509ba2f5455a8465e0c67d0dd0c2.

Reverted https://github.com/pytorch/pytorch/pull/110504 on behalf of https://github.com/PaliC due to temporarily will revert as it's causing problems with difftrain import ([comment](https://github.com/pytorch/pytorch/pull/110504#issuecomment-1749132253))
2023-10-05 15:28:23 +00:00
9648df1a6a Made pattern-matcher diagnostics lazily reported + added TORCH_COMPILE_CPROFILE (#110504)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110504
Approved by: https://github.com/mlazos, https://github.com/eellison
ghstack dependencies: #110501
2023-10-05 01:34:57 +00:00
e686341f64 Consider that ops can be fused into cat in the min-cut partitioner (#110501)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110501
Approved by: https://github.com/eellison
2023-10-05 01:34:57 +00:00
772e104dfd [inductor] visualize fused ops in svg graph (#107752)
example usage
* `TORCH_COMPILE_DEBUG=1 INDUCTOR_ORIG_FX_SVG=1 INDUCTOR_POST_FUSION_SVG=1 python trig.py`: shows the original fx node name, file, and code. See snapshot 2, where we have origin_0, 1, and 2
* trig.py can be found in P816304818

Implementation
* keep the original fx graph in GraphLowering: ```self.orig_gm: torch.fx.GraphModule = gm.__copy__()```
* draw the original fx graph with origins in ir_post_fusion: ```V.debug.draw_orig_fx_graph(self.orig_gm, self.scheduler.nodes)```. `node.meta["buff_meta"]` tracks the buffer name

![Snapshot: fused-op SVG graph with origin annotations](https://github.com/pytorch/pytorch/assets/134637289/c4e197cb-ab3b-4a09-a584-c1356376accb)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107752
Approved by: https://github.com/mlazos
2023-09-21 08:03:05 +00:00
8b7b824dca [inductor][ac] preserve recompute tags through pattern matching (#107742)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107742
Approved by: https://github.com/eellison
2023-08-25 03:48:26 +00:00
918df10198 [Easy] use dtype.itemsize in partitions (#107749)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107749
Approved by: https://github.com/davidberard98
2023-08-24 16:07:05 +00:00
61fe49b8ed pt2: make aot_eager backend handle basic float8 operations (#107783)
Summary:

Reland of https://github.com/pytorch/pytorch/pull/107642 with a fix for tests on Windows.

Makes aot_eager backend of torch.compile handle basic float8 operations.

This is useful for float8 training UX.

Test Plan:

```
python test/test_quantization.py -k test_pt2_traceable_aot_eager
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107783
Approved by: https://github.com/albanD
2023-08-23 18:10:53 +00:00
5025fb9213 Revert "pt2: make aot_eager backend handle basic float8 operations (#107642)"
This reverts commit 24147a8e1c6855489c1669c612ff5cb1b09a09dd.

Reverted https://github.com/pytorch/pytorch/pull/107642 on behalf of https://github.com/huydhn due to Sorry for reverting this, but it is failing Windows CPU test in trunk. The Windows failures on your PR looks related I think ([comment](https://github.com/pytorch/pytorch/pull/107642#issuecomment-1688999380))
2023-08-22 22:17:36 +00:00
24147a8e1c pt2: make aot_eager backend handle basic float8 operations (#107642)
Summary:

Makes aot_eager backend of torch.compile handle basic float8 operations.

This is useful for float8 training UX.

Test Plan:

```
python test/test_quantization.py -k test_pt2_traceable_aot_eager
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107642
Approved by: https://github.com/albanD
2023-08-22 18:57:14 +00:00
0b11da0ccb [partitioners][ac][dynamic] Fix output signature of fwd with symints (#105771)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105771
Approved by: https://github.com/Chillee
2023-07-22 03:04:11 +00:00
5ab2b27353 Revert "Re-enable low memory dropout (#103330)"
This reverts commit f32593630bceed0eb51656304841d9f5de09ec7c.

Reverted https://github.com/pytorch/pytorch/pull/103330 on behalf of https://github.com/davidberard98 due to large compilation time regression ([comment](https://github.com/pytorch/pytorch/pull/103330#issuecomment-1622304072))
2023-07-05 19:00:40 +00:00
f32593630b Re-enable low memory dropout (#103330)
On attention_is_all_you_need_pytorch:

Perf: 1.526x -> 1.544x
Memory: 1.00 -> 1.05x

Fix for https://github.com/pytorch/pytorch/issues/102319, although I'm not sure all the perf is recovered.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103330
Approved by: https://github.com/jansel
2023-06-29 16:27:02 +00:00
f7fdaf8191 Revert "Re-enable low memory dropout (#103330)"
This reverts commit 2d14395f176b38b8416c2713d285e5ae55695a5f.

Reverted https://github.com/pytorch/pytorch/pull/103330 on behalf of https://github.com/malfet due to Lots of tests failed with 'prims' object has no attribute 'inductor_random' ([comment](https://github.com/pytorch/pytorch/pull/103330#issuecomment-1610691147))
2023-06-28 04:27:37 +00:00
2d14395f17 Re-enable low memory dropout (#103330)
On attention_is_all_you_need_pytorch:

Perf: 1.526x -> 1.544x
Memory: 1.00 -> 1.05x

Fix for https://github.com/pytorch/pytorch/issues/102319, although I'm not sure all the perf is recovered.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103330
Approved by: https://github.com/jansel
2023-06-28 03:13:41 +00:00
75dab587ef [dynamo] FSDP + AC + torch.compile (#103953)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103953
Approved by: https://github.com/wanchaol
2023-06-24 01:40:56 +00:00
f9c64a1156 [debugging] aot_eager backend to use the min-cut partitioner (#103555)
default_partitioner is kind of broken when it comes to memory footprint. Moving aot_eager to use the min-cut partitioner gives a better debugging experience.

One downside is that we will see much lower speedup numbers, because the min-cut partitioner will try to recompute ops.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103555
Approved by: https://github.com/eellison, https://github.com/jansel
2023-06-22 09:31:08 +00:00
b7ae40f4c8 [min-cut partitioner] Disable a heuristic if graph has recomputable ops (#103635)
Removing this heuristic leads to a major memory-compression and speedup bump for activation-checkpointed models. Here is the data:
![image](https://github.com/pytorch/pytorch/assets/13822661/64a491ab-173d-435a-b858-61b847fbb08b)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103635
Approved by: https://github.com/Chillee
2023-06-21 22:27:17 +00:00
df0505743f [activation checkpointing] Tagging based min cut partitioner (#103357)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103357
Approved by: https://github.com/jansel
2023-06-14 20:15:43 +00:00
9c4fd72b53 [aot_autograd][functional_rng] Change calling convention (#102344)
Key change - seed, offset are the last 2 args in both the fwd and bwd graphs
Reason - The cudagraphs implementation in inductor currently relies on very simple ordering guarantees i.e. first n inputs are static for both fwd and bwd graphs. In the current implementation of functionalization of rng ops, this assumption is broken because the first 2 inputs are seed, offset.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102344
Approved by: https://github.com/eellison
2023-05-26 21:27:20 +00:00
c2093de5d9 [partitioner] fix for rng ops (#102123)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102123
Approved by: https://github.com/Chillee
2023-05-25 00:35:07 +00:00
d5169e7141 Use a stable ordering for saved values in functorch.default_partition (#100111)
Previously, due to the use of the Python set data structure, the ordering of saved values (and how they would appear in the graph) was unstable and changed across runs, making it hard to debug downstream applications. Here we use a dict (with insertion-ordering semantics) to deduplicate values in a way that preserves ordering.
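
An illustration of the ordering issue (not the actual partitioner code): a set can reorder elements across runs under hash randomization, while a dict preserves insertion order.

```python
saved_values = ["mul", "add", "mul", "cos"]

# Before: order of the deduplicated values can change across runs.
deduped = list(set(saved_values))

# After: dict keys deduplicate while preserving insertion order.
deduped = list(dict.fromkeys(saved_values))  # ["mul", "add", "cos"]
```
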
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100111
Approved by: https://github.com/Skylion007
2023-05-02 05:14:31 +00:00
fdbc8625a1 Functionalization of torch.rand/rand_like ops (#97377)
This PR introduces the functionalization of RNG ops. Key points are

* Introduces a new `philox_rand` prim operator that accepts seed, offset.
* Adds decompositions for random operators that use these philox_rand prims
* Adds a PhiloxStateTracker to track the offset for each occurrence of rand ops
* Changes calling convention of AOT Autograd and adds <fwd_seed, fwd_base_offset> and <bwd_seed, bwd_base_offset>
* Monkeypatches set_rng_state and get_rng_state during AOT Autograd tracing to record the rng state behavior
* Raises an assertion for CPU because CPU does not use Philox RNG.

Not dealt in this PR
* dropout op - offset calculation is different
* other distributions like normal, poisson etc
* Inductor support
* Cudagraph support
* Dynamic shape support

An example
~~~

class Custom(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        a = torch.rand_like(x) * x
        a = torch.rand_like(x) * a
        return a

    @staticmethod
    def backward(ctx, grad_out):
        x, = ctx.saved_tensors
        return grad_out * torch.rand_like(grad_out) * torch.cos(x)

====== Forward graph 0 ======
def forward(self, fwd_seed_1: i64[], fwd_base_offset_1: i64[], primals_1: f32[16, 16]):
    # No stacktrace found for following nodes
    add: i64[] = torch.ops.aten.add.Tensor(fwd_base_offset_1, 0)
    philox_rand: f32[16, 16] = torch.ops.prims.philox_rand.default([16, 16], fwd_seed_1, add, [16, 1], device(type='cuda', index=0), torch.float32);  add = None
    mul: f32[16, 16] = torch.ops.aten.mul.Tensor(philox_rand, primals_1);  philox_rand = None
    add_1: i64[] = torch.ops.aten.add.Tensor(fwd_base_offset_1, 4);  fwd_base_offset_1 = None
    philox_rand_1: f32[16, 16] = torch.ops.prims.philox_rand.default([16, 16], fwd_seed_1, add_1, [16, 1], device(type='cuda', index=0), torch.float32);  fwd_seed_1 = add_1 = None
    mul_1: f32[16, 16] = torch.ops.aten.mul.Tensor(philox_rand_1, mul);  philox_rand_1 = mul = None
    return [mul_1, primals_1]

====== Backward graph 0 ======
def forward(self, bwd_seed_1: i64[], bwd_base_offset_1: i64[], primals_1: f32[16, 16], tangents_1: f32[16, 16]):
    # No stacktrace found for following nodes
    add_2: i64[] = torch.ops.aten.add.Tensor(bwd_base_offset_1, 0);  bwd_base_offset_1 = None
    philox_rand_2: f32[16, 16] = torch.ops.prims.philox_rand.default([16, 16], bwd_seed_1, add_2, [16, 1], device(type='cuda', index=0), torch.float32);  bwd_seed_1 = add_2 = None
    mul_2: f32[16, 16] = torch.ops.aten.mul.Tensor(tangents_1, philox_rand_2);  tangents_1 = philox_rand_2 = None
    cos: f32[16, 16] = torch.ops.aten.cos.default(primals_1);  primals_1 = None
    mul_3: f32[16, 16] = torch.ops.aten.mul.Tensor(mul_2, cos);  mul_2 = cos = None
    return [mul_3]

~~~

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97377
Approved by: https://github.com/ezyang
2023-04-16 09:55:56 +00:00
039faf0dbf Add invariant that all symbolic shapes must be bound in graph (#99089)
Previously, we had a problem when partitioning forward-backward dynamic graphs, which is that we could end up with a backward graph that mentions a symbol in an input tensor (e.g., `f32[s0 + s1]`), but without this symbol being otherwise bound elsewhere. When this happens, we have no way of actually deriving the values of `s0` and `s1`. Our fix for this in https://github.com/pytorch/pytorch/pull/93059 was to just retrace the graph, so that s0 + s1 got allocated a new symbol s2 and everything was happy. However, this strategy had other problems, namely (1) we lost all information from the previous ShapeEnv, including guards and (2) we end up allocating a LOT of fresh new symbols in backwards.

With this change, we preserve the same ShapeEnv between forward and backwards. How do we do this? We simply require that every symbol which may be present inside tensors, ALSO be a plain SymInt input to the graph. This invariant is enforced by Dynamo. Once we have done this, we can straightforwardly modify the partitioner to preserve these SymInt as saved for backwards, if they are needed in the backwards graph to preserve the invariant as well.

This apparently breaks yolov3, but since everything else is OK I'm merging this as obviously good and investigating later.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99089
Approved by: https://github.com/voznesenskym
2023-04-16 01:48:19 +00:00
597b558c51 [BE]: Update flake8 and plugins and fix bugs (#97795)
Update flake8 and flake8-plugins in lintrunner to a modern version. Enables more checks and makes flake8 checks significantly faster. Added a few additional rule ignores that will need to be fixed in the future.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97795
Approved by: https://github.com/alexsio27444, https://github.com/janeyx99, https://github.com/ezyang
2023-03-28 23:51:55 +00:00
02f6d14b97 Only allow SymInt across partitioner boundaries, and fixes (#96653)
This PR does a few things all at once, as I needed to fix several bugs on the way here. The main goal of the PR is to fix the `'float' object has no attribute '_has_symbolic_sizes_strides'` error. The general idea is to heavily penalize non-SymInt but still SymNode cuts in the graph. This doesn't work for the default partitioner, so essentially, dynamic shapes with the default partitioner are not supported.

While doing this, I had to fix a few other bugs in the partitioner:
* SymNode operations weren't considered recomputable. But they are very cheap, go wild.
* zeros_like wasn't considered recomputable, and this prevented some gradient formulas (e.g., for angle with real inputs) from successfully finding a cut at all
* AOTAutograd tests use the default partitioner. I switched them to use the min-cut partitioner...
* ...but this reveals a bug where, if we have nodes in backward outputs that don't depend on tangents, they never get assigned to the backward graph. I fix this by making the backward outputs mandatory members of the backward graph. I have to be careful to filter out None backward outputs; those never participate in flow analysis!

This causes some wobbling for the min-cut tests, but these seem legitimate: since we're now willing to recompute, the partitioner can reduce the number of SymInts it transmits by just doing some recompute in the backend.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96653
Approved by: https://github.com/ngimel
2023-03-14 18:30:56 +00:00
e9e6b3b6c5 [EASY] Add complex dtypes to partitioner (#96297)
Also, delete some redundant dtype setting.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96297
Approved by: https://github.com/Chillee
2023-03-08 21:08:26 +00:00
67d9790985 [BE] Apply almost all remaining flake8-comprehension checks (#94676)
Applies the remaining flake8-comprehension fixes and checks. This change replaces all remaining unnecessary generator expressions with list/dict/set comprehensions, which are more succinct, performant, and better supported by our torch.jit compiler. It also removes useless generators such as `set(a for a in b)`, resolving them into just the set call.
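
A hedged example of the useless-generator pattern being removed (the data is invented):

```python
words = ["a", "bb", "a"]

# Before: a generator expression wrapped in a set() call.
unique = set(w for w in words)

# After: a set comprehension (or simply set(words) when no transform is applied).
unique = {w for w in words}
```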

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94676
Approved by: https://github.com/ezyang
2023-02-12 01:01:25 +00:00
3d82d8d0ed [BE] Enable more flake8-comprehensions checks (#94601)
I applied some flake8 fixes and enabled checking for them in the linter. I also enabled some checks for my previous comprehensions PR.

This is a follow up to #94323 where I enable the flake8 checkers for the fixes I made and fix a few more of them.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94601
Approved by: https://github.com/ezyang
2023-02-10 23:40:29 +00:00
dc70b00d0b Track and record hint on SymNode and use when possible (#94201)
Historically, we worked out `size_hint` on the fly by doing a substitution on the sympy expression with the `var_to_val` mapping. With this change, we also maintain the hint directly on SymNode (in `expr._hint`) and use it in lieu of Sympy substitution when it is available (mostly for guards on SymInt, etc.; in particular, in idiomatic Inductor code we typically manipulate Sympy expressions directly and so do not have a convenient way to maintain hints).

While it's possible this will give us modest performance improvements, this is not the point of this PR; the goal is to make it easier to carefully handle unbacked SymInts, where hints are expected not to be available. You can now easily test if a SymInt is backed or not by checking `symint.node.hint is None`.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94201
Approved by: https://github.com/voznesenskym
2023-02-09 00:00:44 +00:00
8fce9a09cd [BE]: pyupgrade Python to 3.8 - imports and object inheritance only (#94308)
Apply parts of pyupgrade to torch (starting with the safest changes).
This PR only does two things: removes the need to inherit from object and removes unused future imports.
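
A hedged before/after sketch of the two changes (not actual code from the PR; the class name is invented):

```python
# Before: Python 2 style explicit object inheritance plus an unused
# __future__ import.
#
#     from __future__ import print_function
#
#     class Linear(object):
#         pass
#
# After pyupgrade:
class Linear:
    pass
```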

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94308
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-02-07 21:10:56 +00:00
d6c3468f70 Don't allow recomputing a node that *must* be materialized in the backwards pass (#90896)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90896
Approved by: https://github.com/ezyang
2023-01-20 22:34:41 +00:00