Commit Graph

11 Commits

Author SHA1 Message Date
4957ae5838 Add API to annotate disjoint backward and handle in AC (#166536)
This adds zero-bubble / DualPipeV support for (S)AC

Before:
- AC will always retrigger recompute upon every distinct backward.

After:
- Any checkpointed regions encountered by backward under the same instance of this context manager will trigger recompute at most once, even if there are multiple calls to backward (see the sketch below).
- Backward calls under the same instance of this context manager must execute over non-overlapping regions of the backward graph even if retain_graph=True.
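
A minimal sketch of the intended usage pattern; `disjoint_backward` below is a hypothetical name standing in for the context manager this PR adds:

```
import torch
from torch.utils.checkpoint import checkpoint

w = torch.randn(8, 8, requires_grad=True)
x = torch.randn(2, 8, requires_grad=True)

def block(inp):
    return torch.nn.functional.gelu(inp @ w)

loss = checkpoint(block, x, use_reentrant=False).sum()

# `disjoint_backward` is a hypothetical stand-in for the new API. Under one
# instance of it, the checkpointed `block` is recomputed at most once even
# though backward runs twice, as long as the two calls cover non-overlapping
# regions of the backward graph (here: an input-grad pass followed by a
# weight-grad pass, as in zero-bubble / DualPipeV schedules).
with disjoint_backward():
    loss.backward(inputs=[x], retain_graph=True)  # dInput pass
    loss.backward(inputs=[w])                     # dWeight pass
```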

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166536
Approved by: https://github.com/albanD
2025-11-08 00:21:25 +00:00
cyy
2f082e1e56 [13/N] Fix extra warnings brought by clang-tidy-17 (#140897)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140897
Approved by: https://github.com/ezyang
2024-11-27 00:35:19 +00:00
116e04be29 Initialize view_replay_enabled_ in the AutogradState ctor (#100822)
Cruise uses [clang static analyzer](https://clang-analyzer.llvm.org/) internally.
In the v2.0.0 release of PyTorch it found this problem

```
In file included from external/pytorch/aten/src/ATen/ATen.h:7:
In file included from external/pytorch/aten/src/ATen/Context.h:3:
In file included from external/pytorch/aten/src/ATen/CPUGeneratorImpl.h:3:
In file included from external/pytorch/aten/src/ATen/core/Generator.h:22:
In file included from external/pytorch/c10/core/GeneratorImpl.h:8:
In file included from external/pytorch/c10/core/TensorImpl.h:6:
external/pytorch/c10/core/InferenceMode.h:58:5: warning: Passed-by-value struct argument contains uninitialized data (e.g., field: 'view_replay_enabled_')
    AutogradState::set_tls_state(AutogradState(
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.
```

In other words, the value of `view_replay_enabled_` could be left uninitialized, which may lead to subtle bugs later on.

This PR addresses the warning by explicitly initializing it to `false`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100822
Approved by: https://github.com/Skylion007
2023-05-08 18:57:14 +00:00
01885cea43 [Typo] mulithreading_enabled => multithreading_enabled (#97054)
Summary: Fix typo

Test Plan: Continuous integration - Expected NoOp since it is just a variable renaming

Differential Revision: D44118850

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97054
Approved by: https://github.com/Skylion007
2023-03-21 20:11:59 +00:00
83275d8cdf add torch.autograd._set_view_replay_enabled, use in aot autograd (#92588)
tl;dr: this should fix some minor perf regressions that were caused by adding more as_strided() calls in aot autograd.

This PR adds a new context manager, `torch.autograd._set_view_replay_enabled()`.

Context: AOT Autograd has special handling for "outputs that alias graph intermediates". E.g. given this function:

```
def f(x):
    y = torch.mul(x, 2)
    out = y.view(-1)
    return out
```

AOT Autograd will do the following:

```
def fn_to_compile(x):
    y = torch.mul(x, 2)
    out = y.view(-1)
    # return the graph intermediate
    return y, out

# `compile` here stands in for the AOT Autograd compiler
compiled_fn = compile(fn_to_compile)

def wrapper(x):
    y, out = compiled_fn(x)
    # regenerate the alias of the graph intermediate
    return out._view_func(y)
```

What's annoying is that `out._view_func()` will result in an `.as_strided` call, because `out` is an ordinary runtime tensor. This (likely?) caused a perf regression, because when running the backward, our `as_strided_backward()` is slower than our `view_backward()`.
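
Based on the names used in this message (the API may have been renamed since), a minimal sketch of the new behavior:

```
import torch

base = torch.randn(4, requires_grad=True)
y = torch.mul(base, 2)
out = y.view(-1)

# With the TLS enabled, regenerating the alias via _view_func replays the
# original view op (view(-1)) instead of emitting an as_strided call, so
# the backward goes through view_backward rather than as_strided_backward.
with torch.autograd._set_view_replay_enabled(True):
    regenerated = out._view_func(y)
```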

In this PR, I added some TLS for instructing autograd to do view replay instead of as_strided, even when given a normal tensor. I'm definitely interested in thoughts from autograd folks (cc @albanD @soulitzer). A few points that I want to bring up:

(1) One reason this API seems generally useful to me is the case where you `torch.compile()` a function, pass in two inputs that alias each other, and mutate one of the inputs. Autograd is forced to add a bunch of as_strided() calls into the graph when this happens, but this would give users an escape hatch for better compiled perf in that situation.

(2) To be fair, AOT Autograd probably won't need this TLS in the long term. There's a better (more complicated) solution, where AOT Autograd manually precomputes the view chain off of graph intermediates during tracing and re-applies it at runtime. That is kind of complicated, though, and feels lower priority to implement immediately.

(3) Given all of that I made the API private, but lmk what you all think.

This is a followup of https://github.com/pytorch/pytorch/pull/92255.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92588
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-02-08 01:48:32 +00:00
cyy
fa65ae8f56 cleanup unused include (#93359)
Used the `include-what-you-use` tool to find and remove some unused includes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93359
Approved by: https://github.com/malfet
2023-02-04 02:15:50 +00:00
d04889323e Add Context Manager for Disabling Multithreading in Backwards, use in aot autograd (#86245)
We were running into a few issues with multithreaded backwards in aot_autograd, such as https://github.com/pytorch/pytorch/issues/86136, and `FakeTensorMode` getting into a weird state as a result of functions not executing completely sequentially. The multithreading is lost in translation when we trace out the backwards anyway, and it adds a lot of additional complexity.
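
A minimal sketch, assuming the context manager is exposed as `torch.autograd.set_multithreading_enabled` (the name used in current PyTorch):

```
import torch

x = torch.randn(4, requires_grad=True)
loss = (x * x).sum()

# With multithreading disabled, autograd executes backward nodes strictly
# sequentially on the calling thread, which keeps tracing-time state such
# as FakeTensorMode consistent.
with torch.autograd.set_multithreading_enabled(False):
    loss.backward()
```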

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86245
Approved by: https://github.com/albanD, https://github.com/yf225
2022-10-06 03:27:42 +00:00
04108592a3 New TLS to disable forward mode AD (#63117)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63117

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30388097

Pulled By: albanD

fbshipit-source-id: f1bc777064645db1ff848bdd64af95bffb530984
2021-08-27 11:59:24 -07:00
41ffec07ce Add a common autograd TLS state (#63860)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63860

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D30513253

Pulled By: albanD

fbshipit-source-id: 97d76ed54dfbdf4ba3fc7051ce3b9bb636cefb4b
2021-08-24 15:34:06 -07:00
688f06cac3 Revert D30388099: Add a common autograd TLS state
Test Plan: revert-hammer

Differential Revision:
D30388099 (83d9bad44a)

Original commit changeset: 8e03f940150f

fbshipit-source-id: f6d60fec66e8292f5268335bb8a3e7e1a662f23b
2021-08-24 07:22:39 -07:00
83d9bad44a Add a common autograd TLS state (#63114)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63114

This PR collapses the GradMode and InferenceMode thread local booleans into a single thread local uint8.
This helps reduce the number of thread local variable accesses done when we propagate ThreadLocalStates.

Note that this is even more beneficial because we will add a forward mode AD TLS (similar to GradMode) higher in this stack, and this new structure should reduce the perf impact of adding that TLS.
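
To illustrate the packing idea only (the real implementation is a C++ AutogradState, not this Python sketch):

```
# Two per-thread booleans packed into one small integer, so saving and
# restoring the thread-local autograd state touches a single variable.
GRAD_MODE = 1 << 0
INFERENCE_MODE = 1 << 1

def pack(grad_mode: bool, inference_mode: bool) -> int:
    state = 0
    if grad_mode:
        state |= GRAD_MODE
    if inference_mode:
        state |= INFERENCE_MODE
    return state

def grad_mode_enabled(state: int) -> bool:
    return bool(state & GRAD_MODE)
```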

Here is the full benchmark result between master and the top of this stack: https://gist.github.com/albanD/e421101e9ed344e94999bef3a54bf0f3
tl;dr: it gives a benefit in most cases and is never detrimental.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30388099

Pulled By: albanD

fbshipit-source-id: 8e03f940150ff063c2edd792733663413ae2f486
2021-08-24 06:54:02 -07:00