pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-21 13:44:15 +08:00

Author	SHA1	Message	Date
PyTorch MergeBot	945bf78894	Revert "[BE] typing for decorators - fx/_compatibility (#131568 )" This reverts commit 193f62fde91ee20deb5ddcd9ff4593cd78d74c64. Reverted https://github.com/pytorch/pytorch/pull/131568 on behalf of https://github.com/clee2000 due to same as https://github.com/pytorch/pytorch/pull/131572#issuecomment-2254328359 but I clicked the wrong link by accident. This is where it actually starts ([comment](https://github.com/pytorch/pytorch/pull/131568#issuecomment-2254330781))	2024-07-28 03:43:39 +00:00
Aaron Orenstein	193f62fde9	[BE] typing for decorators - fx/_compatibility (#131568 ) See #131429 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131568 Approved by: https://github.com/justinchuby, https://github.com/oulgen, https://github.com/zou3519	2024-07-25 22:24:19 +00:00
Aaron Orenstein	634b62f111	typing proxy_tensor.py (#129182 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/129182 Approved by: https://github.com/Chillee	2024-07-12 23:17:09 +00:00
Simon Fan	d97dfe9313	[compiled autograd] move inputs to cuda with non_blocking=True (#129181 ) non_blocking=True requires first pinning, which shouldn't be a problem given that they are cpu scalars Pull Request resolved: https://github.com/pytorch/pytorch/pull/129181 Approved by: https://github.com/eellison, https://github.com/jansel ghstack dependencies: #127960, #128905, #128982, #128987	2024-06-21 08:16:33 +00:00
Simon Fan	fafa1867d1	[compiled autograd] use in_compiled_autograd_region instead of compiled_autograd_enabled_count (#128982 ) current implementation of compiled_autograd_enabled_count affects the entire region under the context manager. so if the context manager wraps torch.compile calls unrelated to the backward, they are affected too: - no lazy compile for compiled fw - no aot autograd cache for inference graphs we instead maintain a flag when we execute the compiled backward callable, to isolate the special handling to the compiled backward graph Pull Request resolved: https://github.com/pytorch/pytorch/pull/128982 Approved by: https://github.com/jansel ghstack dependencies: #127960, #128905	2024-06-21 08:16:33 +00:00
Will Feng	e3a39d49a0	[Traceable FSDP][Compiled Autograd] Add queue_callback() support (#126366 ) Adds support for `Variable._execution_engine.queue_callback()`, which is used in FSDP2. Important tests: - `pytest -rA test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_callback_graph_break_throws_error` - `pytest -rA test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_callback_adds_callback` - `PYTORCH_TEST_WITH_DYNAMO=1 python test/test_autograd.py -k TestAutograd.test_callback_adds_callback` Pull Request resolved: https://github.com/pytorch/pytorch/pull/126366 Approved by: https://github.com/xmfan	2024-06-18 06:22:14 +00:00
chilli	c486e2ab64	Add coloring to fx graph print out (#128476 ) Note: Won't land immediately, at least I'll need to add a color option to the field. But curious if any tests fail. Old: <img width="1294" alt="image" src="https://github.com/pytorch/pytorch/assets/6355099/c3a750ed-5e54-4621-b2e4-be5481be15b6"> New: <img width="1303" alt="image" src="https://github.com/pytorch/pytorch/assets/6355099/3a1f1adc-6f3a-413e-8b87-ee53da9bf4ed"> Pull Request resolved: https://github.com/pytorch/pytorch/pull/128476 Approved by: https://github.com/ezyang	2024-06-13 23:39:04 +00:00
Aaron Orenstein	dcfa7702c3	Flip default value for mypy disallow_untyped_defs [1/11] (#127838 ) See #127836 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127838 Approved by: https://github.com/oulgen	2024-06-08 18:16:33 +00:00
Simon Fan	00c6ca4459	[compiled autograd][cudagraphs] Inputs runtime wrapper to move cpu scalars to cuda (#125382 ) Most commonly CPU scalars used for philox random seed. Right now, any cpu input will skip cudagraphing the entire graph. We need both the traced graph and the runtime inputs to be cudaified. Pull Request resolved: https://github.com/pytorch/pytorch/pull/125382 Approved by: https://github.com/jansel	2024-06-07 07:12:46 +00:00
Simon Fan	be67985bd7	[compiled autograd] log in cpp using python logger (#126483 ) Internal infra may not preserve python and c++ log ordering e.g. MAST logs: https://fburl.com/mlhub/38576cxn, all the `[python_compiled_autograd.cpp] Creating cache entry [...]` logs of the entire run are at the beginning of the file Pull Request resolved: https://github.com/pytorch/pytorch/pull/126483 Approved by: https://github.com/jansel ghstack dependencies: #126144, #126146, #126148	2024-05-19 23:49:52 +00:00
Simon Fan	93524cf5ff	[compiled autograd] clear compiled_autograd_verbose once test is done (#126148 ) verbose flag leaks into tests ran after Pull Request resolved: https://github.com/pytorch/pytorch/pull/126148 Approved by: https://github.com/jansel ghstack dependencies: #126144, #126146	2024-05-16 22:23:02 +00:00
Simon Fan	4cd4463c1c	[compiled autograd] Fix LoggingTensor flaky test (#126144 ) LoggingTensor fails consistently when root logger level is INFO or lower By default, root logger should be WARNING But, triton driver initialization will overwrite root logger to INFO, which causes flakiness: https://github.com/pytorch/pytorch/issues/126143 Pull Request resolved: https://github.com/pytorch/pytorch/pull/126144 Approved by: https://github.com/jansel	2024-05-16 22:23:02 +00:00
Simon Fan	7e0edafe86	[compiled autograd][dynamo] improve lifted autograd.Function.backward handling and fallback to pseudo-eager (#125661 ) - `FakeContext` hides all fields other than ctx.saved_tensors, this dynamo errors when the autograd.Function.backward uses other attrs on ctx and it also doesn't allow fallback to eager. - If we remove it, we still can't fallback to eager: node variables are already freed (ctx.saved_tensors throws) - However, we can fallback to "pseudo-eager" by using a duck-typed ctx and routing the ctx.saved_tensors to lifted tensors - Dynamo tries to inline external_utils.call_backward, treats BackwardCFunction as a AutogradFunctionContextVariable (only used up until we create the fake context: FakeBackwardCFunction) - we call_function backward from the forward class AutogradFunctionVariable, and we still pass in the fake context as a UserDefinedObjectVariable (can later use AutogradFunctionContextVariable + HOO graph speculate) Fixes #125489 #124827 Pull Request resolved: https://github.com/pytorch/pytorch/pull/125661 Approved by: https://github.com/jansel	2024-05-08 21:00:37 +00:00
PyTorch MergeBot	7ffa5558ee	Revert "[FX] Update type hints in `torch.fx._compatibility.py` (#125469 )" This reverts commit 235b4d6ec22ddac35b2e47b7e871ef10538d4aee. Reverted https://github.com/pytorch/pytorch/pull/125469 on behalf of https://github.com/izaitsevfb due to breaks pyre in dependent projects (internal: see D56986361) ([comment](https://github.com/pytorch/pytorch/pull/125469#issuecomment-2096665396))	2024-05-06 18:36:43 +00:00
Aaron Gokaslan	1dd42e42c4	[BE]: Try TCH autofixes on torch/ (#125536 ) Tries TCH autofixes and see what breaks Pull Request resolved: https://github.com/pytorch/pytorch/pull/125536 Approved by: https://github.com/ezyang	2024-05-05 23:13:59 +00:00
Xuehai Pan	235b4d6ec2	[FX] Update type hints in `torch.fx._compatibility.py` (#125469 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/125469 Approved by: https://github.com/Skylion007 ghstack dependencies: #125468	2024-05-05 19:30:22 +00:00
Simon Fan	43a7ab2a21	[compiled autograd] introduce verbose logs, add autograd node info to graph (#124954 ) - sets it as a fake stack trace as we don't have a generic comment feature - when verbose is disabled, still adds a contextmanager and flag checks. the alternative is to use MACROS, but that wouldn't be usable with TORCH_LOGS Pull Request resolved: https://github.com/pytorch/pytorch/pull/124954 Approved by: https://github.com/jansel	2024-04-27 01:10:37 +00:00
Simon Fan	d274d57037	[compiled autograd][dynamo] Make compiled graph take in boxed inputs (#122353 ) ### Context In today's Dynamo, we lift all tensors encountered during tracing to be individual graph inputs, even when they were in a container. And [Dynamo generates](`fdc281f258/torch/_dynamo/codegen.py (L371)`) the runtime function's signature using the graph's graphargs. This means that the generated function will have each grapharg as an argument, which is problematic if we want to free the inputs in inductor codegen. See [python function arguments are kept alive for the duration of the function call](https://github.com/pytorch/pytorch/pull/83137#issuecomment-1211320670). ```python # original code def forward(inputs): a, b, c, d, e = inputs inputs.clear() out = a out += b del b # frees memory out += c del c # frees memory out += d del d # frees memory out += e del e # frees memory return out # compiled code: def forward(a, b, c, d, e): # b, c, d, e can't be freed before end of function ``` This isn't a concern when compiling forward because a, b, c, d, e are all from user code, and should be kept alive. But when compiling backwards, a, b, c, d, e may be intermediate results i.e. activations, that we DO want to clear ASAP to remain on par with eager peak memory. ### Solution We have encountered similar memory problems in AOTAutograd before, where we adopted the boxed calling convention (wrapping to-be-freed objects in a list), adding list clearing to inductor codegen, and being careful about holding references to elements in the input list. We need to do something similar, but for inputs from the user program (compiled autograd fx graph in this case). This PR support lists as graphargs/placeholder nodes. When tracing a list of tensors, we create a node for it, and pre-emptively initialize variable trackers for its elements before they are used in the user program. Subsequent uses of those variables will find hits in the lookup table `input_source_to_var`. With the inputs as a list in the graph args, our compiled code can free inputs just like in the eager case. ```python def forward(inputs): # a, b, c, d, e can be freed within the function now ``` Currently, AOT/Inductor flattens list input via [flatten_graph_inputs wrapper](`597f479643/torch/_inductor/compile_fx.py (L1454-L1478)`), which is why this PR's CI can be green. Additional changes are needed to its runtime wrapper, done in the next PR. The next step is to ensure that we are careful in forwarding the list to inductor codegen without holding additional references. Pull Request resolved: https://github.com/pytorch/pytorch/pull/122353 Approved by: https://github.com/jansel ghstack dependencies: #123630, #123674	2024-04-12 10:29:09 +00:00
Oguz Ulgen	287680176b	Use graph.find_nodes in dynamo (#122257 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122257 Approved by: https://github.com/jansel ghstack dependencies: #121565, #122255, #122256	2024-04-07 18:51:18 +00:00
Jason Ansel	18d94d7165	Make FX nodes sortable (#122071 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122071 Approved by: https://github.com/oulgen	2024-03-19 01:40:56 +00:00
Jason Ansel	040b925753	[Compiled Autograd] Reorder accumulate grad nodes (#121735 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121735 Approved by: https://github.com/xmfan	2024-03-16 04:29:56 +00:00
Tugsbayasgalan Manlaibaatar	c646030cd2	Support higher order op functionalization in predispatch IR (#115314 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/115314 Approved by: https://github.com/bdhirsh	2024-03-01 09:13:47 +00:00
Jason Ansel	01ec8df6d8	[Compiled Autograd] Introduce BackwardState capture (#120382 ) This adds support for backwards hooks that are both: 1) Interior to the graph; and 2) Dynamically generated (e.g. lambdas) We do this by creating a BackwardState object that is used to register the hooks in the forward, then populated by dynamo after the forwards runs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120382 Approved by: https://github.com/xmfan	2024-02-28 20:36:47 +00:00
Edward Z. Yang	1a1fc1047d	Add structured trace logs (#120289 ) Overall design: https://docs.google.com/document/d/1CX_hJ0PNy9f3R1y8TJrfkSeLkvGjjjLU84BSXgS2AZ8/edit How to read the diff: * Most files are me augmenting pre-existing logging with structured variants. For the most part it's simple (esp FX graphs, which have a canonical string representation); it gets more complicated when I decided to JSON-ify some data structure instead of keeping the ad hoc printing (notably, guards and dynamo output graph sizes) * torch/_functorch/_aot_autograd/collect_metadata_analysis.py is some unrelated fixes I noticed while auditing artifact logs * torch/_logging/_internal.py has the actual trace log implementation. The trace logger is implement as a logger named torch.__trace which is disconnected from the logging hierarchy. It gets its own handler and formatter (TorchLogsFormatter with _is_trace True). `trace_structured` is the main way to emit a trace log. Unusually, there's a separate "metadata" and "payload" field. The metadata field should not be too long (as it is serialized as a single line) and is always JSON (we put contextual things like compile id in it); the payload field can be long and is emitted after the metadata log line and can span multiple lines. * torch/_logging/structured.py contains some helpers for converting Python data structures into JSON form. Notably, we have a string interning implementation here, which helps reduce the cost of serializing filenames into the log. * test/dynamo/test_structured_trace.py the tests are cribbed from test_logging.py, but all rewritten to use expect tests on munged versions of what we'd actually output. Payloads are never tested, since they tend not be very stable. https://github.com/ezyang/tlparse is a POC Rust program that can interpret these logs. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/120289 Approved by: https://github.com/Skylion007 ghstack dependencies: #120712	2024-02-28 01:01:41 +00:00
PyTorch MergeBot	f3dd2a544c	Revert "Add structured trace logs (#120289 )" This reverts commit 9dfaef962cda5f65eec53e5fd6f07b5226ea65cb. Reverted https://github.com/pytorch/pytorch/pull/120289 on behalf of https://github.com/kit1980 due to breaking internal builds, see D54230697 ([comment](https://github.com/pytorch/pytorch/pull/120289#issuecomment-1967477120))	2024-02-27 19:49:05 +00:00
Edward Z. Yang	9dfaef962c	Add structured trace logs (#120289 ) Overall design: https://docs.google.com/document/d/1CX_hJ0PNy9f3R1y8TJrfkSeLkvGjjjLU84BSXgS2AZ8/edit How to read the diff: * Most files are me augmenting pre-existing logging with structured variants. For the most part it's simple (esp FX graphs, which have a canonical string representation); it gets more complicated when I decided to JSON-ify some data structure instead of keeping the ad hoc printing (notably, guards and dynamo output graph sizes) * torch/_functorch/_aot_autograd/collect_metadata_analysis.py is some unrelated fixes I noticed while auditing artifact logs * torch/_logging/_internal.py has the actual trace log implementation. The trace logger is implement as a logger named torch.__trace which is disconnected from the logging hierarchy. It gets its own handler and formatter (TorchLogsFormatter with _is_trace True). There's a teensy bit of FB specific code to automatically enable trace logging if a /logs directory exists. `trace_structured` is the main way to emit a trace log. Unusually, there's a separate "metadata" and "payload" field. The metadata field should not be too long (as it is serialized as a single line) and is always JSON (we put contextual things like compile id in it); the payload field can be long and is emitted after the metadata log line and can span multiple lines. * torch/_logging/structured.py contains some helpers for converting Python data structures into JSON form. Notably, we have a string interning implementation here, which helps reduce the cost of serializing filenames into the log. * test/dynamo/test_structured_trace.py the tests are cribbed from test_logging.py, but all rewritten to use expect tests on munged versions of what we'd actually output. Payloads are never tested, since they tend not be very stable. https://github.com/ezyang/tlparse is a POC Rust program that can interpret these logs. Testing that the fbcode detection works at https://www.internalfb.com/mlhub/pipelines/runs/fblearner/534553450 (Meta-only) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/120289 Approved by: https://github.com/Skylion007	2024-02-27 00:04:23 +00:00
Shunting Zhang	a72190fd51	make nanogpt work with both compiled autograd and _LazyGraphModule (#118981 ) @xmfan and @fegin reported that _LazyGraphModule ( https://github.com/pytorch/pytorch/pull/117911 ) makes nanogpt training fail with compiled autograd. We have a repro: ``` python benchmarks/dynamo/torchbench.py --training --backend=inductor --disable-cudagraphs --accuracy --only nanogpt --repeat 1 --compiled-autograd ``` but it's still mysterious how to trigger the issue with a toy model. The error message for the failure is https://gist.github.com/shunting314/6402a6388b3539956090b6bc098952fb . In compile_fx we will call `detect_fake_mode`. This function will look for an active FakeTensorMode from both TracingContext and example inputs. The error is triggered because we find different FakeTensorMode from these 2 sources. Although I don't know what really causes the discrepancy of FakeTensorMode above, the fix here is to force _LazyGraphModule recompilation if we have compiled autograd enabled. This does not hurt compilation time most of the time because we anyway will call the graph module here in the backward pass when compiled autograd is enabled: `855d5f144e/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py (L705)` Let me know if we can have a better fix. Pull Request resolved: https://github.com/pytorch/pytorch/pull/118981 Approved by: https://github.com/jansel	2024-02-05 10:40:06 +00:00
Edward Z. Yang	d03173e88c	Unify MYPYINDUCTOR and MYPY (#118432 ) The original motivation for MYPYINDUCTOR was a faster type checking configuration that only checked a subset of files. With the removal of `follow_imports = ignore`, we are now able to use dmypy to do fast incremental typechecking, eliminating the need for this. Perhaps erroneously, when I tee'ed up this PR I elected to delete the `follow_imports = skip` designations in the mypy-inductor.ini. This lead to a number of extra type error suppressions that I manually edited. You will need to review. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/118432 Approved by: https://github.com/Skylion007 ghstack dependencies: #118414, #118418	2024-01-27 17:23:20 +00:00
suo	4057d005ff	Initial torchbind support in PT2 (#117697 ) This PR adds the bare minimum functionality to get torchbind working in an e2e testable way on PT2. It implements: * ProxyTensor support * Simple torch.export support (proxytensor-only path, e.g. non-strict). * add some tests exercising the path. Because all this is not fully baked, I hide the functionality behind a feature flag (`enable_torchbind_tracing()`) so it does not affect regular users for now. Still on the agenda: * Dynamo support * Actual FakeMode support * Mutability support Hoping to get this first bit in as a standalone, as it will unblock some more extensive experimentation/testing going on internally. Differential Revision: [D51825372](https://our.internmc.facebook.com/intern/diff/D51825372/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/117697 Approved by: https://github.com/SherlockNoMad	2024-01-19 06:28:20 +00:00
Simon Fan	9eb842cbd6	Compiled autograd: Lift autograd functions' backward and provide default key for custom autograd functions (#115573 ) This PR adds support for torch.autograd.Function subclasses in compiled autograd. We do this by: - Creating a uid for all torch.autograd.Function via its metaclass. This uid is used in the compiled autograd key, which is a subset of the cache key to the compiled graph - "Lifting" the backward/saved_tensors, having them as input arguments in the compiled graph - Creating proxies to track the backward's inputs and outputs. Since the backward's outputs (grads) have to match the forward's inputs, we pass the node's `input_info` (forward's input sizes) to build the proxies tracking the backward's outputs. - Use a `FakeContext` class as a replacement for the autograd node's context object (`BackwardCFunction`) during tracing, only support passing saved_tensors from the forward to the backward - Index each backward, to support multiple torch.autograd.Functions in the same graph - Special case for `CompiledFunctionBackward`, lifting CompiledFunction will fail 4 tests and requires some skipfiles changes that I'd rather do that in a separate PR Example graph: test_custom_fn_saved_multiple_tensors (eager fw + compiled autograd) ```python class MyFn(torch.autograd.Function): @staticmethod def forward(ctx, x, y): ctx.save_for_backward(x, y) return torch.sin(x), torch.sin(y) @staticmethod def backward(ctx, gO_x, gO_y): (x, y) = ctx.saved_tensors return gO_x * torch.cos(x), gO_y * torch.cos(y) ``` The backwards is lifted via `getitem_5` and `call_backward` ```python # Compiled autograd graph ===== Compiled autograd graph ===== <eval_with_key>.0 class CompiledAutograd(torch.nn.Module): def forward(self, inputs, sizes, hooks): # No stacktrace found for following nodes getitem: "f32[]" = inputs[0] getitem_1: "f32[10]" = inputs[1] getitem_2: "f32[10]" = inputs[2] getitem_3: "f32[10]" = inputs[3] getitem_4: "f32[10]" = inputs[4]; inputs = None expand: "f32[10]" = torch.ops.aten.expand.default(getitem, [10]); getitem = None mul: "f32[10]" = torch.ops.aten.mul.Tensor(expand, getitem_2); getitem_2 = None mul_1: "f32[10]" = torch.ops.aten.mul.Tensor(expand, getitem_1); expand = getitem_1 = None getitem_5 = hooks[0]; hooks = None call_backward = torch__dynamo_external_utils_call_backward(getitem_5, (getitem_3, getitem_4), mul_1, mul); getitem_5 = mul_1 = mul = None getitem_6: "f32[10]" = call_backward[0] getitem_7: "f32[10]" = call_backward[1]; call_backward = None accumulate_grad_ = torch.ops.inductor.accumulate_grad_.default(getitem_4, getitem_7); getitem_4 = getitem_7 = None accumulate_grad__1 = torch.ops.inductor.accumulate_grad_.default(getitem_3, getitem_6); getitem_3 = getitem_6 = None return [] ``` then is later inlined by dynamo ```python # Dynamo graph ===== __compiled_fn_0 ===== <eval_with_key>.1 class GraphModule(torch.nn.Module): def forward(self, L_inputs_0_ : torch.Tensor, L_inputs_1_ : torch.Tensor, L_inputs_2_ : torch.Tensor, L_inputs_3_ : torch.Tensor, L_inputs_4_ : torch.Tensor): getitem = L_inputs_0_ getitem_1 = L_inputs_1_ getitem_2 = L_inputs_2_ x = L_inputs_3_ y = L_inputs_4_ # File: <eval_with_key>.0:10, code: expand = torch.ops.aten.expand.default(getitem, [10]); getitem = None expand = torch.ops.aten.expand.default(getitem, [10]); getitem = None # File: <eval_with_key>.0:11, code: mul = torch.ops.aten.mul.Tensor(expand, getitem_2); getitem_2 = None mul = torch.ops.aten.mul.Tensor(expand, getitem_2); getitem_2 = None # File: <eval_with_key>.0:12, code: mul_1 = torch.ops.aten.mul.Tensor(expand, getitem_1); expand = getitem_1 = None mul_1 = torch.ops.aten.mul.Tensor(expand, getitem_1); expand = getitem_1 = None # File: /data/users/xmfan/core/pytorch/test/inductor/test_compiled_autograd.py:412, code: return gO_x * torch.cos(x), gO_y * torch.cos(y) cos = torch.cos(x) getitem_6 = mul_1 * cos; mul_1 = cos = None cos_1 = torch.cos(y) getitem_7 = mul * cos_1; mul = cos_1 = None # File: <eval_with_key>.0:17, code: accumulate_grad_ = torch.ops.inductor.accumulate_grad_.default(getitem_4, getitem_7); getitem_4 = getitem_7 = None accumulate_grad__default = torch.ops.inductor.accumulate_grad_.default(y, getitem_7); y = getitem_7 = None # File: <eval_with_key>.0:18, code: accumulate_grad__1 = torch.ops.inductor.accumulate_grad_.default(getitem_3, getitem_6); getitem_3 = getitem_6 = None accumulate_grad__default_1 = torch.ops.inductor.accumulate_grad_.default(x, getitem_6); x = getitem_6 = None return () ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/115573 Approved by: https://github.com/jansel	2024-01-10 18:01:28 +00:00
voznesenskym	3e4d14702a	On grad access, check if grad has changed and update stored example grad as needed (#112811 ) Fixes https://github.com/pytorch/pytorch/issues/112446 This is a doozy of a PR, there's a few important things to keep in mind here: 1) We MUST lift all tensors accessed via attrs to inputs, getattr is a no go in the graph, it violates the aot_autograd contract. Furthermore, aot_autograd does not know how to apply in-place ops to intermediary tensors that are attributes (aka from getattr) anyway. Views from ops are fine. 2) `.grad` access handling in dynamo peeks at the underlying value, the real tensor, because re-piping FakeTensors already made with this fake_mode through builder anew is a no go. 3) We have no proper mechanism for updating the hint / grapharg.example (the real value in (2) above) midway through trace Therefore, what we need to do is reconcile the difference in grad stashed on grapharg.example. The easiest way to do this is lazily, upon .grad access, by reading the new value off the right fake tensors. We can then make a tensor using that data as a hint to VariableBuilder to make the right VariableTracker. Note that the example value used here (torch.zeros) in the PR, is a dummy value only used as a tracing hint, it does not leak out into real runtime code. Alternatively, we could implement accumulate_grad_ in python... Pull Request resolved: https://github.com/pytorch/pytorch/pull/112811 Approved by: https://github.com/jansel	2023-11-08 05:45:00 +00:00
Jason Ansel	2964682490	[dynamo] Add LazyVariableTracker (#111306 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/111306 Approved by: https://github.com/voznesenskym	2023-11-07 19:55:19 +00:00
voznesenskym	0f4d2904be	[dynamo] compiled_autograd support for post_acc_grad hooks (#112326 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/112326 Approved by: https://github.com/jansel ghstack dependencies: #112325	2023-10-31 22:53:01 +00:00
Jez Ng	20fc2b4186	[dynamo] Enable typechecking for compiled_autograd.py (#112128 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/112128 Approved by: https://github.com/Skylion007 ghstack dependencies: #111894, #111992, #112031, #112127	2023-10-27 06:18:58 +00:00
Michael Voznesensky	02f6a8126e	Support a simple subset of functions as backward hooks on intermediate tensors (#109537 ) The main thrust of the initial effort here was to capture `register_hook` calls on tensors in compile regions. The first part of this was done in https://github.com/pytorch/pytorch/pull/108903 wherein we added support for register_hook input tensors. The distinction between input and intermediary is due to implementation differences. There are 2 kinds of hooks: 1) Hooks on objects with sources (inputs, params) 2) Hooks on objects w/o sources (intermediaries, and outputs). Note: As outputs can be made simple by how dynamo handles residuals, they could actually be handled as if they were inputs, but, for the sake of this PR, we will refer to hooks as either hooks on inputs (sourced), or hooks on intermediaries (not sourced). The plan: For tensors w/ a source: (The PR above) We record registered hooks, store them as a global, and associate them with the tensor in residuals. This means that when dynamo goes to create the frame, where we produce bytecode to stitch together our PT2 modified bytecode with the original eager code, we call register_hook. This registration of hooks in residuals is sound because (a) it happens right after a Pt2 frame region ends and (b) we know that the tensor is alive in f_locals, f_globals, or a module in the users invoking frame. This means we can soundly know it will be around to invoke register_hook on. As long as we guard on the identity of the lifted function, this is sound to do. For tensors w/o a source: (This PR) Ostensibly, the most correct and complete solution would be to smuggle hooks into a runtime wrapper in aot_autograd, where all the items the hooks close over are lifted to inputs as necessary and passed alongside the user provided function. This is necessary so that we can properly trace out and capture all the mutations within the user defined hook at backwards time. This is too complicated - so, we limited the scope of this initial PR to a simple subset of hooks: - Hooks must have a source (be known to us already, not a lambda or intermediary defined function) - We must be tracing under compiled autograd The flow: We use the HOP added in https://github.com/pytorch/pytorch/pull/109690/files, referred to as the HOP below. 1) We intercept register_hook calls and wrap the user defined fn in the HOP 2) We write a `_register_hook_trampoline` to the graph that is a local no-arg function that is invoked as a call_function in the dynamo graph 3) aot_autograd inlines through it during its trace, and sees the HOP 4) the HOP preserves itself in the graph - it does not get traced into 5) During backwards, compiled_autograd installs the HOP under a hook call 6) When compiled_autograd enters compilation over its generated graph, dynamo traces the contents of the hook Pull Request resolved: https://github.com/pytorch/pytorch/pull/109537 Approved by: https://github.com/ezyang	2023-10-11 01:35:37 +00:00
Michael Voznesensky	55a204ebc8	[Easy] log graphs in compiled_autograd if TORCH_LOGS=compiled_autograd (#108991 ) [Easy] log graphs in compiled_autograd if TORCH_LOGS=compiled_autograd Pull Request resolved: https://github.com/pytorch/pytorch/pull/108991 Approved by: https://github.com/ezyang ghstack dependencies: #108846	2023-09-12 00:15:02 +00:00
Jason Ansel	a01e795a6d	[Compiled Autograd] Fix bug with multithreading check (#106621 ) Fixes #106555 There was bug where the multithreading check would fire because of the `compiled_autograd.disable()` calls in AotAutograd, even though compiled autograd was already disabled, so that call was doing nothing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/106621 Approved by: https://github.com/yanboliang	2023-08-04 20:49:21 +00:00
Jason Ansel	2e02dfae9a	[Compiled Autograd] Fix handling of undefined gradients in hooks (#105813 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105813 Approved by: https://github.com/albanD	2023-07-28 15:59:35 +00:00
Jason Ansel	c902b84e0b	Compiled autograd (#103822 ) This branch: 1) converts the autograd tape into an FX graph 2) caches that conversion using a "shadow" graph 3) compiles and runs the generated FX graph instead of the normal autograd What works currently: 1) Caching, capture, and initial integration 2) Backwards hooks 3) Inlining AotAutograd generated subgraphs 4) torch.compiling the generated FX graph 5) Auto-detecting dynamic shapes based on changes Future work 1) Larger scale testing 1) Boxed calling convention, so memory can be freed incrementally 1) Support hooks on SavedTensor 1) Additional testing by running eager autograd tests under compiled_autograd.enable() Pull Request resolved: https://github.com/pytorch/pytorch/pull/103822 Approved by: https://github.com/ezyang, https://github.com/albanD	2023-07-24 21:12:05 +00:00

1 2 3

139 Commits