This adds zero-bubble / DualPipeV support for (selective) activation checkpointing, i.e. (S)AC.
Before:
- AC will always retrigger recompute upon every distinct backward.
After:
- Any checkpointed region encountered by backward under the same instance of this context manager will trigger recompute at most once, even if there are multiple calls to backward.
- Backward calls under the same instance of this context manager must execute over non-overlapping regions of the backward graph even if retain_graph=True.
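As a minimal sketch (not from the PR) of the "Before" behavior, the snippet below uses `torch.utils.checkpoint` with a zero-bubble-style split backward, where input grads and weight grads are produced by separate backward calls; the counter shows the checkpointed region being recomputed once per backward. The new context manager is referred to only by its semantics here, since its name is not part of this description.
```
import torch
from torch.utils.checkpoint import checkpoint

recompute_count = 0

def block(x, w):
    # Runs once in forward and once per recompute triggered by a backward call.
    global recompute_count
    recompute_count += 1
    return torch.matmul(x, w).relu()

x = torch.randn(4, 4, requires_grad=True)
w = torch.randn(4, 4, requires_grad=True)
out = checkpoint(block, x, w, use_reentrant=False).sum()

# Zero-bubble / DualPipeV schedules split the backward: input grads and
# weight grads are computed by two distinct backward/grad calls.
torch.autograd.grad(out, inputs=x, retain_graph=True)
torch.autograd.grad(out, inputs=w)

# Before this PR: each distinct backward retriggers recompute, so block()
# has run 3 times (1 forward + 2 recomputes). Under one instance of the new
# context manager, recompute would trigger at most once across both calls.
print(recompute_count)
```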
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166536
Approved by: https://github.com/albanD
Cruise uses [clang static analyzer](https://clang-analyzer.llvm.org/) internally.
In the v2.0.0 release of PyTorch, it found the following problem:
```
In file included from external/pytorch/aten/src/ATen/ATen.h:7:
In file included from external/pytorch/aten/src/ATen/Context.h:3:
In file included from external/pytorch/aten/src/ATen/CPUGeneratorImpl.h:3:
In file included from external/pytorch/aten/src/ATen/core/Generator.h:22:
In file included from external/pytorch/c10/core/GeneratorImpl.h:8:
In file included from external/pytorch/c10/core/TensorImpl.h:6:
external/pytorch/c10/core/InferenceMode.h:58:5: warning: Passed-by-value struct argument contains uninitialized data (e.g., field: 'view_replay_enabled_')
AutogradState::set_tls_state(AutogradState(
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.
```
In other words, the value of `view_replay_enabled_` could be left uninitialized, which may lead to subtle bugs later on.
This PR addresses the warning by explicitly initializing it to `false`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100822
Approved by: https://github.com/Skylion007
tl;dr: this should fix some minor perf regressions that were caused by adding more `as_strided()` calls in AOT Autograd.
This PR adds a new context manager, `torch.autograd._set_view_replay_enabled()`.
Context: AOT Autograd has special handling for "outputs that alias graph intermediates". E.g. given this function:
```
def f(x):
    y = torch.mul(x, 2)
    out = y.view(-1)
    return out
```
AOT Autograd will do the following:
```
def fn_to_compile(x):
    y = torch.mul(x, 2)
    out = y.view(-1)
    # return the graph intermediate
    return y, out

compiled_fn = compile(fn_to_compile)

def wrapper(x):
    y, out = compiled_fn(x)
    # regenerate the alias of the graph intermediate
    return out._view_func(y)
```
What's annoying is that `out._view_func()` will result in an `.as_strided()` call, because `out` is an ordinary runtime tensor. This (likely?) caused a perf regression, because when running the backward, our `as_strided_backward()` is slower than our `view_backward()`.
In this PR, I added some TLS for instructing autograd to do view replay instead of as_strided, even when given a normal tensor. I'm definitely interested in thoughts from autograd folks (cc @albanD @soulitzer). A few points that I want to bring up:
(1) One reason this API seems generally useful to me is the case where you `torch.compile()` a function, pass in two inputs that alias each other, and mutate one of the inputs. Autograd is forced to add a bunch of `as_strided()` calls into the graph when this happens, but this would give users an escape hatch for better compiled perf in that situation.
(2) To be fair, AOT Autograd probably won't need this TLS in the long term. There's a better (more complicated) solution, where AOT Autograd manually precomputes the view chain off of graph intermediates during tracing and re-applies it at runtime. That is kind of complicated though, and feels like a lower priority to implement right now.
(3) Given all of that, I made the API private, but lmk what you all think.
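Here is a hedged usage sketch of the context manager named above (`torch.autograd._set_view_replay_enabled`); the exact signature and the point at which the flag must be active are assumptions, so the whole view-creation / regeneration region is wrapped:
```
import torch

x = torch.randn(4, requires_grad=True)

# Assumed call form: the PR only says this is a context manager.
with torch.autograd._set_view_replay_enabled(True):
    y = x * 2            # graph intermediate (the view's base)
    out = y.view(2, 2)   # differentiable view of the base
    y.add_(1)            # in-place on the base; out's grad_fn must be regenerated
    # With view replay enabled, the regenerated grad_fn replays .view(2, 2)
    # instead of falling back to an as_strided call on the runtime tensor.
    print(out.grad_fn)
```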
This is a followup of https://github.com/pytorch/pytorch/pull/92255.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92588
Approved by: https://github.com/ezyang, https://github.com/albanD
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63114
This PR collapses the GradMode and InferenceMode thread local booleans into a single thread local uint8.
This helps reduce the number of thread-local variable accesses done when we propagate ThreadLocalState.
Note that this is even more beneficial because we will add a forward-mode AD TLS (similar to GradMode) higher in this stack, and this new structure should reduce the perf impact of adding that new TLS.
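For illustration only, here is a rough Python analogue of the packing idea (the actual change lives in the C++ `AutogradState` TLS): several per-thread booleans are folded into one thread-local word, so snapshotting and propagating the state touches a single slot instead of one per flag.
```
import threading

_GRAD_MODE_BIT = 1 << 0
_INFERENCE_MODE_BIT = 1 << 1

_tls = threading.local()

def _get_flags() -> int:
    # Grad mode defaults to on, inference mode to off.
    return getattr(_tls, "flags", _GRAD_MODE_BIT)

def set_grad_mode(enabled: bool) -> None:
    flags = _get_flags()
    _tls.flags = flags | _GRAD_MODE_BIT if enabled else flags & ~_GRAD_MODE_BIT

def is_grad_mode_enabled() -> bool:
    return bool(_get_flags() & _GRAD_MODE_BIT)

# Propagating the autograd TLS to another thread is now one read here and
# one write there, rather than one per flag.
snapshot = _get_flags()
```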
Here is the full benchmark result between master and the top of this stack: https://gist.github.com/albanD/e421101e9ed344e94999bef3a54bf0f3
tl;dr: it gives a benefit in most cases and is never detrimental.
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D30388099
Pulled By: albanD
fbshipit-source-id: 8e03f940150ff063c2edd792733663413ae2f486