pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 21:14:14 +08:00

Author	SHA1	Message	Date
Simon Fan	992857e286	Fix pre-dispatch AC HOP calling convention (#165145 ) For AC HOP, dynamo traces it without kwargs. (kwargs are only inputs to the HOP, not to the body) `55f01a48af/torch/_dynamo/variables/higher_order_ops.py (L2594-L2609)` When we add non-strict support, we should match this calling convention too. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165145 Approved by: https://github.com/tugsbayasgalan ghstack dependencies: #164296, #164321, #164419, #164420, #164340, #163602, #164431, #164433, #164437	2025-10-12 02:28:21 +00:00
Tugsbayasgalan Manlaibaatar	91c211fb8c	AC should work with pre-dispatch IR (#164505 ) Previously we had to rely on turning off export verifier because the AC body was torch IR instead of aten IR. This PR makes it so that we create an IR that is export compatible. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164505 Approved by: https://github.com/ydwu4, https://github.com/xmfan	2025-10-06 11:05:22 +00:00
angelayi	0b59492853	[export] Fix wrap_with_set_grad_enabled retracing (#163295 ) Fixes https://github.com/pytorch/pytorch/issues/163294 The code `with torch.set_grad_enabled(enable_grad)` calls `torch._C._set_grad_enabled` three times -- (1) when [initializing set_grad_enabled](`bb7c9a2d41/torch/autograd/grad_mode.py (L187C9-L187C35)`), (2) when [entering the context](`bb7c9a2d41/torch/autograd/grad_mode.py (L194)`), and (3) when [exiting the context](`bb7c9a2d41/torch/autograd/grad_mode.py (L197)`). As a result, the retraced export module contains a duplicate `torch._C._set_grad_enabled` call, like: ``` def forward(self, arg0_1): add = torch.ops.aten.add.Tensor(arg0_1, 1); arg0_1 = None _set_grad_enabled = torch._C._set_grad_enabled(False); _set_grad_enabled = None _set_grad_enabled = torch._C._set_grad_enabled(False); _set_grad_enabled = None add_1 = torch.ops.aten.add.Tensor(add, 2); add = None _set_grad_enabled_1 = torch._C._set_grad_enabled(True); _set_grad_enabled_1 = None add_2 = torch.ops.aten.add.Tensor(add_1, 3); add_1 = None return (add_2,) ``` When export runs the `replace_set_grad_with_hop_pass`, it looks through the graph for `torch._C._set_grad_enabled` and creates subgraphs. The duplicate `torch._C._set_grad_enabled` results in an empty submod in the graph, which caused the error in [this post](https://fb.workplace.com/groups/1028545332188949/posts/1844720036398281/?comment_id=1862175381319413). Pull Request resolved: https://github.com/pytorch/pytorch/pull/163295 Approved by: https://github.com/yushangdi	2025-09-21 22:54:40 +00:00
Simon Fan	be1612201d	[export] Support AC HOP in pre-dispatch (#161479 ) Adds the pre-dispatch handling for the AC hop. This lets the HOP pre-dispatch export without actually pre-dispatch tracing into it. However, this is not sufficient to support AC in export: - the HOP body will still be in torch IR, so it will fail export verifiers - the exported module also can't be run in eager because the AC HOP relies on the partitioner to embed RNG state saving/restoring So it must be lowered by AOT Autograd into post-dispatch first before being executed. It suffices for my purposes though. If users had checkpoint API use in their exported model, the behavior goes from silently incorrect to a validation error. Pull Request resolved: https://github.com/pytorch/pytorch/pull/161479 Approved by: https://github.com/ydwu4 ghstack dependencies: #161353	2025-08-28 01:46:25 +00:00
soulitzer	f2af30fee5	Add a HOP to bypass tracing of a wrapper function while tracing the wrapped function (#153487 ) Usage: ```python from torch._higher_order_ops.wrap import dynamo_bypassing_wrapper # Your ordinary function wrapper def my_hop_fn_impl(fn, *args, k=1, **kwargs): def wrapper(*args, **kwargs): out = fn(*args, **kwargs) if isinstance(out, tuple): return (out[0] + k,) return out + k return wrapper # Calling `my_hop_fn` instead of the impl directly captures a HOP into the dynamo graph def my_hop_fn(fn, *args, k=1, **kwargs): return dynamo_bypassing_wrapper( functools.partial(my_hop_fn_impl, k=k), fn, *args, **kwargs ) ``` Notes: - The dynamo captured graph now stashes arbitrary callable objects (the wrapper_fn) - this is equivalent to what SAC does today with policy_fn. - The `wrapper_fn` passed to `dynamo_bypassing_wrapper` should have signature `Callable -> Callable` Pull Request resolved: https://github.com/pytorch/pytorch/pull/153487 Approved by: https://github.com/ydwu4	2025-05-22 04:24:38 +00:00
rzou	1e57154af3	Require that all HOPs be imported at `import torch` time (#145939 ) E.g. torch.ops.higher_order.cond does not exist until it is imported, which is bad if it shows up in an FX graph or is used in some code somewhere. This PR also makes some more HOPs get imported at `import torch` time. Test Plan: - new tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/145939 Approved by: https://github.com/ydwu4 ghstack dependencies: #145938	2025-01-29 22:27:52 +00:00
Oguz Ulgen	9abdc62065	Allow fx graph caching higher order operators (opt-in) (#135877 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/135877 Approved by: https://github.com/zou3519	2024-09-24 17:23:09 +00:00
PyTorch MergeBot	e9bfbf78d5	Revert "Allow fx graph caching higher order operators (opt-in) (#135877 )" This reverts commit 66d5eb64e0be91680a8531ccb24f098554610d46. Reverted https://github.com/pytorch/pytorch/pull/135877 on behalf of https://github.com/jeanschmidt due to seems to have introduced regressions on rocm signals ([comment](https://github.com/pytorch/pytorch/pull/135877#issuecomment-2367616653))	2024-09-23 09:04:24 +00:00
Oguz Ulgen	66d5eb64e0	Allow fx graph caching higher order operators (opt-in) (#135877 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/135877 Approved by: https://github.com/zou3519	2024-09-23 04:33:27 +00:00
Shangdi Yu	4a2cf50edf	[export][reland] Convert autocast to HOO (#132677 ) Summary: Reland of D60206382. Suggested in https://github.com/pytorch/pytorch/issues/128394. If there's an autocast context manager, the predispatch (strict) graph can look something like: ``` class <lambda>(torch.nn.Module): def forward(self, x: "f32[1]"): ... _enter_autocast = torch.amp.autocast_mode._enter_autocast('cuda', torch.bfloat16, True, None) mm: "f32[8, 8]" = torch.ops.aten.mm.default(rand, rand_1); rand = rand_1 = None _exit_autocast = torch.amp.autocast_mode._exit_autocast(_enter_autocast); _enter_autocast = None return (mm_1,) ``` But the operator `torch.amp.autocast_mode._enter_autocast` is not a valid ATen op. We remove these nodes by turning autocast into a higher order operator and make a submodule for the blocks between `_enter_autocast` and `_exit_autocast`. Some potential followup improvement: 1) Merge some of the duplicated logic with `replace_set_grad_with_hop_pass.py` 2) Check the current autocast status (any enabled? dtype?) and not create a submodule if the autocast args matches current autocast status. Test Plan: CI ``` buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:test_export -- -r "test_predispatch_autocast" buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:test_export -- -r "test_predispatch_set_grad" ``` Verified that now we can export the llama model in gh issue 128394 and the gemma model in gh issue 131829 without error. Differential Revision: D60770038 Pull Request resolved: https://github.com/pytorch/pytorch/pull/132677 Approved by: https://github.com/angelayi	2024-08-05 22:34:52 +00:00
PyTorch MergeBot	a3ea96b762	Revert "[export] Convert autocast to HOO (#131914 )" This reverts commit aec948adfc224e49213c4bc49586d4e4ba65fbbb. Reverted https://github.com/pytorch/pytorch/pull/131914 on behalf of https://github.com/davidberard98 due to PR shouldn't have been relanded by the bot, phabricator diff did not have any recent changes and is still internally reverted ([comment](https://github.com/pytorch/pytorch/pull/131914#issuecomment-2269797388))	2024-08-05 19:52:09 +00:00
Shangdi Yu	aec948adfc	[export] Convert autocast to HOO (#131914 ) Summary: Suggested in https://github.com/pytorch/pytorch/issues/128394. If there's an autocast context manager, the predispatch (strict) graph can look something like: ``` class <lambda>(torch.nn.Module): def forward(self, x: "f32[1]"): ... _enter_autocast = torch.amp.autocast_mode._enter_autocast('cuda', torch.bfloat16, True, None) mm: "f32[8, 8]" = torch.ops.aten.mm.default(rand, rand_1); rand = rand_1 = None _exit_autocast = torch.amp.autocast_mode._exit_autocast(_enter_autocast); _enter_autocast = None return (mm_1,) ``` But the operator `torch.amp.autocast_mode._enter_autocast` is not a valid ATen op. We remove these nodes by turning autocast into a higher order operator and make a submodule for the blocks between `_enter_autocast` and `_exit_autocast`. Some potential followup improvement: 1) Merge some of the duplicated logic with `replace_set_grad_with_hop_pass.py` 2) Check the current autocast status (any enabled? dtype?) and not create a submodule if the autocast args matches current autocast status. Test Plan: CI ``` parsh --build-flags fbcode//mode/dev-nosan fbcode//caffe2/test:test_export run_tests("test_predispatch_autocast") ``` Reviewed By: angelayi Differential Revision: D60206382 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131914 Approved by: https://github.com/angelayi	2024-08-05 18:52:12 +00:00
PyTorch MergeBot	d984105748	Revert "[export] Convert autocast to HOO (#131914 )" This reverts commit b28c01d90d6575522d2240ce485d7dd87a7242aa. Reverted https://github.com/pytorch/pytorch/pull/131914 on behalf of https://github.com/ezyang due to Failing lint, but was covered up by master failure on lint ([comment](https://github.com/pytorch/pytorch/pull/131914#issuecomment-2267248773))	2024-08-04 02:10:35 +00:00
Shangdi Yu	b28c01d90d	[export] Convert autocast to HOO (#131914 ) Summary: Suggested in https://github.com/pytorch/pytorch/issues/128394. If there's an autocast context manager, the predispatch (strict) graph can look something like: ``` class <lambda>(torch.nn.Module): def forward(self, x: "f32[1]"): ... _enter_autocast = torch.amp.autocast_mode._enter_autocast('cuda', torch.bfloat16, True, None) mm: "f32[8, 8]" = torch.ops.aten.mm.default(rand, rand_1); rand = rand_1 = None _exit_autocast = torch.amp.autocast_mode._exit_autocast(_enter_autocast); _enter_autocast = None return (mm_1,) ``` But the operator `torch.amp.autocast_mode._enter_autocast` is not a valid ATen op. We remove these nodes by turning autocast into a higher order operator and make a submodule for the blocks between `_enter_autocast` and `_exit_autocast`. Some potential followup improvement: 1) Merge some of the duplicated logic with `replace_set_grad_with_hop_pass.py` 2) Check the current autocast status (any enabled? dtype?) and not create a submodule if the autocast args matches current autocast status. Test Plan: CI ``` parsh --build-flags fbcode//mode/dev-nosan fbcode//caffe2/test:test_export run_tests("test_predispatch_autocast") ``` Reviewed By: angelayi Differential Revision: D60206382 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131914 Approved by: https://github.com/angelayi	2024-08-03 05:48:57 +00:00
Oguz Ulgen	72d2dba992	Add None return type to init (#132335 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132335 Approved by: https://github.com/albanD	2024-08-01 15:26:45 +00:00
Xuehai Pan	e7eeee473c	[BE][Easy][14/19] enforce style for empty lines in import segments in `torch/_[a-c]/` and `torch/_[e-h]/` and `torch/_[j-z]*/` (#129765 ) See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter. You can review these PRs via: ```bash git diff --ignore-all-space --ignore-blank-lines HEAD~1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129765 Approved by: https://github.com/ezyang	2024-07-31 10:42:50 +00:00
Will Feng	ead97ee486	[Compile+SAC] Only warn for in-place ops once (#129397 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/129397 Approved by: https://github.com/tianyu-l	2024-06-26 07:25:02 +00:00
Will Feng	575bc1e3af	[Reopen #114036 ] Allow "must recompute" in torch.compile + selective checkpointing (SAC) (#129295 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/129295 Approved by: https://github.com/Chillee	2024-06-25 23:47:08 +00:00
Xuehai Pan	93a33bf3ac	[BE] update type annotations for basic utilities in `torch/__init__.py` (#129001 ) Changes: 1. Make some arguments positional-only as we only support Python 3.8+ 2. Clean up `torch.typename(obj)` implementation. 3. Update type annotations, especially `is_tensor()` and `is_masked_tensor()` using `TypeGuard`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/129001 Approved by: https://github.com/malfet	2024-06-24 18:04:38 +00:00
PyTorch MergeBot	cb4919344a	Revert "[BE] update type annotations for basic utilities in `torch/__init__.py` (#129001 )" This reverts commit e53d9590287cbf97521f96d055910394f6e9a849. Reverted https://github.com/pytorch/pytorch/pull/129001 on behalf of https://github.com/XuehaiPan due to lint failure ([comment](https://github.com/pytorch/pytorch/pull/129001#issuecomment-2186944549))	2024-06-24 16:18:43 +00:00
Xuehai Pan	e53d959028	[BE] update type annotations for basic utilities in `torch/__init__.py` (#129001 ) Changes: 1. Make some arguments positional-only as we only support Python 3.8+ 2. Clean up `torch.typename(obj)` implementation. 3. Update type annotations, especially `is_tensor()` and `is_masked_tensor()` using `TypeGuard`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/129001 Approved by: https://github.com/malfet	2024-06-24 14:35:41 +00:00
soulitzer	1877b7896c	[checkpoint] Clean up selective activation checkpoint and make public (#125795 ) ### bc-breaking for existing users of the private API: - Existing policy functions must now change their return value to be [CheckpointPolicy](`c0b40ab42e/torch/utils/checkpoint.py (L1204-L1230)`) Enum instead of bool. - To restore previous behavior, return `PREFER_RECOMPUTE` instead of `False` and `{PREFER,MUST}_SAVE` instead of `True`, depending on whether you want the compiler to be able to override your policy. - Policy function now accepts a `ctx` object instead of `mode` for its first argument. - To restore previous behavior, `mode = "recompute" if ctx.is_recompute else "forward"`. - Existing calls to `_pt2_selective_checkpoint_context_fn_gen` must be renamed to `create_selective_checkpoint_contexts`. The way you use the API remains the same. It would've been nice to do something different (not make the user have to use functools.partial?), but this was the easiest to compile (idk if this should actually be a constraint). Related doc: https://docs.google.com/document/d/1BKyizkZPdri9mHqdDOLAUpkI7SbbKfLHRFVVpK9ZWqo/edit Memory considerations: - As with the existing SAC, cached values are cleared upon first use. - We error if the user wishes to backward a second time on a region forwarded with SAC enabled. In-place: - We use version counting to check that no cached tensor has been mutated. In-place operations not mutating cached tensors are allowed. - `allow_cache_entry_mutation=True` can be passed to disable this check (useful in the case of auto AC where the user cleverly also saves the output of the in-place) Randomness, views - Currently in this PR, we don't do anything special for randomness or views, the author of the policy function is expected to handle them properly. (Would it be beneficial to error?
- we either want to save all or recompute all random tensors) Tensor object preservation - ~We guarantee that if a tensor does not require grad, and it is saved, then what you get out is the same tensor object.~ UPDATE: We guarantee that if a tensor is of non-differentiable dtype AND it is not a view, and it is saved, then what you get out is the same tensor object. This is a nice guarantee for nested tensors which care about the object identity of the offsets tensor. Policy function - Enum values are `{MUST,PREFER}_{SAVE,RECOMPUTE}` (bikeshed welcome). Alternatively there was `{SAVE,RECOMPUTE}_{NON_,}OVERRIDABLE`. The former was preferred bc it seemed clearer that two `MUST` clashing should error, versus it is ambiguous whether two `NON_OVERRIDABLE` being stacked should silently ignore or error. - The usage of Enum today. There actually is NO API to stack SAC policies today. The only thing the Enum should matter for in the near term is the compiler. The stacking SAC policy would be useful if someone wants to implement something like simple FSDP, but it is not perfect because with a policy of `PREFER_SAVE` you are actually saving more than autograd would save normally (would be fixed with AC v3). - The number of times we call the policy_fn is something that should be documented as part of the public API. We call the policy function for all ops except ~~detach~~ (UPDATE: metadata ops listed in `torch.utils.checkpoint.SAC_IGNORED_OPS`) because these ops may be called a different number of times by AC itself between forward and recompute. - The policy function can be a stateful object (we do NOT make separate copies of this object for forward/recompute, the user is expected to handle that via is_recompute see below). Tensors guaranteed to be the same tensor as-is - Policy function signature takes ctx object as its first argument. The ctx is an object encapsulating info that may be useful to the user, it currently only holds "is_recompute".
Adding this indirection gives us flexibility to add more attrs later if necessary. Pull Request resolved: https://github.com/pytorch/pytorch/pull/125795 Approved by: https://github.com/Chillee, https://github.com/fmassa	2024-06-18 18:18:50 +00:00
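The migration steps listed in the commit above (bool return value to Enum, `mode` to `ctx`) can be sketched as a small adapter. This is a stdlib-only illustration under stated assumptions: the `CheckpointPolicy` class below is a stand-in that only mirrors the value names given in the message (real code would import it from `torch.utils.checkpoint`), the legacy signature `(mode, op, *args, **kwargs)` is assumed, and `adapt_legacy_policy`, `legacy_policy`, and the string op names are hypothetical.

```python
from enum import Enum, auto
from types import SimpleNamespace


# Stand-in mirroring the names in the commit message; not the real torch enum.
class CheckpointPolicy(Enum):
    MUST_SAVE = auto()
    PREFER_SAVE = auto()
    MUST_RECOMPUTE = auto()
    PREFER_RECOMPUTE = auto()


def adapt_legacy_policy(legacy_fn, save_policy=CheckpointPolicy.PREFER_SAVE):
    """Wrap an old-style (mode, op, ...) -> bool policy as a ctx-based policy."""

    def policy_fn(ctx, op, *args, **kwargs):
        # Rebuild the old `mode` string from the new ctx object.
        mode = "recompute" if ctx.is_recompute else "forward"
        should_save = legacy_fn(mode, op, *args, **kwargs)
        # True -> {PREFER,MUST}_SAVE (caller's choice); False -> PREFER_RECOMPUTE.
        return save_policy if should_save else CheckpointPolicy.PREFER_RECOMPUTE

    return policy_fn


# Hypothetical legacy policy: save matmul outputs, recompute everything else.
def legacy_policy(mode, op, *args, **kwargs):
    return op == "mm"


policy = adapt_legacy_policy(legacy_policy)
ctx = SimpleNamespace(is_recompute=False)  # stand-in for the real ctx object
print(policy(ctx, "mm"))    # CheckpointPolicy.PREFER_SAVE
print(policy(ctx, "relu"))  # CheckpointPolicy.PREFER_RECOMPUTE
```

Passing `save_policy=CheckpointPolicy.MUST_SAVE` instead would correspond to the case where the compiler should not override the decision to save.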
PyTorch MergeBot	6895a5804c	Revert "[checkpoint] Clean up selective activation checkpoint and make public (#125795 )" This reverts commit c472cec5656b9ffb668af97a02d711bdbdf5ebec. Reverted https://github.com/pytorch/pytorch/pull/125795 on behalf of https://github.com/soulitzer due to breaking torchtitan CI ([comment](https://github.com/pytorch/pytorch/pull/125795#issuecomment-2167036157))	2024-06-14 01:14:59 +00:00
soulitzer	c472cec565	[checkpoint] Clean up selective activation checkpoint and make public (#125795 ) Related doc: https://docs.google.com/document/d/1BKyizkZPdri9mHqdDOLAUpkI7SbbKfLHRFVVpK9ZWqo/edit Memory considerations: - As with the existing SAC, cached values are cleared upon first use. - We error if the user wishes to backward a second time on a region forwarded with SAC enabled. In-place: - We use version counting to check that no cached tensor has been mutated. In-place operations not mutating cached tensors are allowed. - `allow_cache_entry_mutation=True` can be passed to disable this check (useful in the case of auto AC where the user cleverly also saves the output of the in-place) Randomness, views - Currently in this PR, we don't do anything special for randomness or views, the author of the policy function is expected to handle them properly. (Would it be beneficial to error? - we either want to save all or recompute all random tensors) Tensor object preservation - We guarantee that if a tensor does not require grad, and it is saved, then what you get out is the same tensor object. If the tensor does require grad, we must detach to avoid creating a reference cycle. This is a nice guarantee for nested tensors which care about the object identity of the offsets tensor. Policy function - Enum values are `{MUST,PREFER}_{SAVE,RECOMPUTE}` (bikeshed welcome). Alternatively there was `{SAVE,RECOMPUTE}_{NON_,}OVERRIDABLE`. The former was preferred bc it seemed clearer that two `MUST` clashing should error, versus it is ambiguous whether two `NON_OVERRIDABLE` being stacked should silently ignore or error. - The usage of Enum today. There actually is NO API to stack SAC policies today. The only thing the Enum should matter for in the near term is the compiler.
The stacking SAC policy would be useful if someone wants to implement something like simple FSDP, but it is not perfect because with a policy of `PREFER_SAVE` you are actually saving more than autograd would save normally (would be fixed with AC v3). - The number of times we call the policy_fn is something documented as part of the public API. We call the policy function for all ops except detach because detach is itself called a different number of times by AC between forward and recompute. - The policy function can be a stateful object (we do NOT make separate copies of this object for forward/recompute, the user is expected to handle that via is_recompute see below). Tensors guaranteed to be the same tensor as-is - Policy function signature takes ctx object as its first argument. The ctx is an object encapsulating info that may be useful to the user, it currently only holds "is_recompute". Adding this indirection gives us flexibility to add more attrs later if necessary. "bc-breaking" for existing users of the private API: - Existing policy functions must now change their return value to use the Enum. - Existing calls to `_pt2_selective_checkpoint_context_fn_gen` must be renamed to `gen_selective_checkpoint_context_fn`. The way you use the API remains the same. It would've been nice to do something different (not make the user have to use functools.partial?), but this was the easiest to compile (idk if this should actually be a constraint). Pull Request resolved: https://github.com/pytorch/pytorch/pull/125795 Approved by: https://github.com/Chillee, https://github.com/fmassa	2024-06-12 23:57:33 +00:00
Aaron Orenstein	ea614fb2b1	Flip default value for mypy disallow_untyped_defs [2/11] (#127839 ) See #127836 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127839 Approved by: https://github.com/oulgen	2024-06-08 18:23:08 +00:00
ydwu4	812f05d731	[export] add replace_set_grad_with_hop_pass (#119810 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/119810 Approved by: https://github.com/tugsbayasgalan ghstack dependencies: #119732, #119736	2024-02-17 02:18:19 +00:00
Will Feng	495054545c	Allow `preserve_rng_state=True` when torch.compile + selective checkpointing + CUDA (#113718 ) Fixes https://github.com/pytorch/pytorch/issues/113717. When `preserve_rng_state=True`, we let AOTAutograd trace through `torch.random.fork_rng` op, and the tracing doesn't work under CUDA, hence the original error reported in the issue. But since we are already doing RNG functionalization at Inductor level, we don't actually need to trace this `fork_rng` op. So we should just rewrite `preserve_rng_state` to False when we are using torch.compile (and let Inductor do its RNG functionalization which it's already been doing). Pull Request resolved: https://github.com/pytorch/pytorch/pull/113718 Approved by: https://github.com/wanchaol	2023-12-09 01:47:25 +00:00
Will Feng	b612e27221	[Easy] Fix typo in TagActivationCheckpoint comment (#113818 ) As titled. Pull Request resolved: https://github.com/pytorch/pytorch/pull/113818 Approved by: https://github.com/Chillee, https://github.com/bdhirsh	2023-11-16 06:06:09 +00:00
Will Feng	d52b9ba6a8	[torch.compile + selective checkpoint] Attach `context_fn` to the checkpointed graph module, fixing flaky tests (#112672 ) torch.compile + SAC unit test is causing adjacent unit tests to be flaky due to its modification of shared singleton object. This PR attaches the checkpoint context fn to the checkpointed GraphModule, and look it up during execution, avoiding the need to make the higher-order op stateful. Specifically, we attach the `context_fn` to the checkpointed GraphModule. These two will be gc'ed at the same time, so it satisfies the lifetime requirement. Pull Request resolved: https://github.com/pytorch/pytorch/pull/112672 Approved by: https://github.com/wanchaol	2023-11-16 01:34:52 +00:00
Will Feng	3f3e353885	torch.compile + selective activation checkpointing (#105489 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105489 NOTE: this PR is tagged "not user facing", because it's not ready to be announced externally yet. This PR implements torch.compile + selective activation checkpoint (SAC) integration, by using `TagActivationCheckpoint` (same backend as torch.compile + full activation checkpoint integration). TorchDispatchMode based implementation cannot support including inplace ops in the checkpointed region at the moment (the reason for this needs investigation), and there is also no way to ban them (because TorchDispatchMode now only sees "after-functionalization" ops, so can't detect if an op is in-place). Hence we hide torch.compile + SAC behind a flag (`torch._dynamo.config._experimental_support_context_fn_in_torch_utils_checkpoint`) and will only use it internally for cases that are known to not have in-place ops. This state won't last too long, because in-place op will at least be able to be detected after Brian's mode reordering and related functionalization changes. So next steps after this PR: 1. Wait for Brian's mode reordering and related functionalization changes to land, and then try to enable the "inplace ops" unit test for torch.compile + selective activation checkpoint (if it doesn't work, investigate why). 2. Unify selective- and full-checkpoint under TorchDispatchMode based implementation. Differential Revision: D47497145 Pull Request resolved: https://github.com/pytorch/pytorch/pull/105489 Approved by: https://github.com/anijain2305	2023-09-21 16:24:11 +00:00
Animesh Jain	735e6ae801	[dynamo] Maintainable code - Move decorators in a separate file (#105070 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105070 Approved by: https://github.com/ezyang	2023-07-13 07:41:19 +00:00
kshitij12345	90eaa98d13	dynamo : kwarg support for wrap (higher order op) (#104180 ) Ref: https://github.com/pytorch/pytorch/issues/100278 Pull Request resolved: https://github.com/pytorch/pytorch/pull/104180 Approved by: https://github.com/zou3519	2023-07-11 06:08:18 +00:00
Richard Zou	280df5dc2e	[HigherOrderOp] Remove `_deprecated_global_ns` from some ops (#104105 ) The remaining ops after this PR are: - cond - map - anything that is out of tree. These are a bit more difficult to remove. Test Plan: - existing tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/104105 Approved by: https://github.com/ydwu4	2023-06-28 00:03:29 +00:00
Richard Zou	618cc82e77	Stop Dynamo from peeking into wrap's body (#104076 ) When Dynamo sees `wrap(f, x)`, and it decides that `f` is unsafe, Dynamo should fall back to eager mode and stop introspection all the way throughout the call of `f`. The motivation is: - it's easier to test `wrap` this way (it is clearer how many graph breaks should occur) - Other HigherOrderOperator do this because their execution of the body involves code that is not necessarily Dynamo-able. e.g. functorch transforms. Since `wrap` is a test for the HigherOrderOp mechanism, it should reflect what other HigherOrderOps do. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104076 Approved by: https://github.com/ydwu4	2023-06-26 17:16:51 +00:00
Animesh Jain	75dab587ef	[dynamo] FSDP + AC + torch.compile (#103953 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/103953 Approved by: https://github.com/wanchaol	2023-06-24 01:40:56 +00:00
rzou	036cda415f	Change HigherOrderOperator default namespace from global to 'higher_order' (#103870 ) This PR changes the default namespace for higher order operators from the global namespace (e.g. torch.ops.cond) to `higher_order` (e.g. torch.ops.higher_order.cond). We don't actually change the namespace for existing HigherOrderOperators. The motivation is to stem the bleeding; exposing operators into the global namespace is a bad idea due to name collision with other user-defined namespaces. We will go in and fix the `_deprecated_global_ns` as necessary after this diff. Differential Revision: [D46809738](https://our.internmc.facebook.com/intern/diff/D46809738/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/103870 Approved by: https://github.com/ydwu4	2023-06-20 19:10:55 +00:00
Animesh Jain	bd0ed940b7	[activation checkpoint][dynamo] Wrap AC into Tag based higher order op (#102935 ) These are the numbers with this PR ![image](https://github.com/pytorch/pytorch/assets/13822661/63e991d5-80e2-4e94-8e4b-243621c3990e) There are 3 main followups * A naive partitioner gives better memory footprint than min-cut partitioner here. Currently, we are using min-cut partitioner. Waiting for @Chillee to discuss this further to either modify min-cut or add a naive partitioner. * aot_eager is < 1x memory footprint. This is true even for non AC models. This could hide some inefficiency somewhere. * inductor is giving very different memory numbers between AOT-traced-AC (duplicate early) vs this implementation. This leads to some inefficiency in inductor that we need to resolve. Pull Request resolved: https://github.com/pytorch/pytorch/pull/102935 Approved by: https://github.com/jansel	2023-06-14 20:15:43 +00:00
Animesh Jain	2fa1b563da	[dynamo] Activation checkpoint higher order ops - Reland 101028 (#101790 ) https://github.com/pytorch/pytorch/pull/101028 was reverted due to internal breakage. Relanding. Pull Request resolved: https://github.com/pytorch/pytorch/pull/101790 Approved by: https://github.com/zou3519	2023-05-18 19:09:14 +00:00
PyTorch MergeBot	d0db7d624d	Revert "[dynamo] Activation checkpointing as higher order op (#101028 )" This reverts commit de15e740a1f1cf0f267bb77ef851522ce2ab4674. Reverted https://github.com/pytorch/pytorch/pull/101028 on behalf of https://github.com/jeanschmidt due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/101028#issuecomment-1548280970))	2023-05-15 17:47:08 +00:00
Animesh Jain	de15e740a1	[dynamo] Activation checkpointing as higher order op (#101028 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/101028 Approved by: https://github.com/voznesenskym, https://github.com/zou3519	2023-05-12 03:17:41 +00:00
Richard Zou	3d10e748e7	[Reland] Initial version of Dynamo capture for HigherOrderOperator (#100544 ) Original PR #99988 The problem was that we added `wrap` to torch._ops which actually puts it on `torch.ops.wrap` which is a namespace that can be open-registered to. The fix is that we now shove `wrap` into a new file Pull Request resolved: https://github.com/pytorch/pytorch/pull/100544 Approved by: https://github.com/voznesenskym	2023-05-03 20:49:05 +00:00

41 Commits