Currently OutputGraphGuardsState is separated out as a serializable interface for OutputGraph, but some of the typing around it in dynamo's guards.py and output_graph.py is incorrect: the code uses more fields than OutputGraphGuardsState declares, and this only works because either the full OutputGraph is passed in, or the code paths that use those fields are dead when an OutputGraphGuardsState is passed in.
In this PR we further separate out the fields of OutputGraph that should be retained by any full graph capture mechanism, not just dynamo (as it is currently) but also, in the future, something like make_fx. Since these fields do not need to be serialized, the result is an intermediate "common" data structure that sits between OutputGraphGuardsState and OutputGraph in the inheritance hierarchy.
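A minimal sketch of the resulting layering, assuming illustrative names (the intermediate class name and the example fields below are hypothetical, not necessarily what the PR uses):
```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class OutputGraphGuardsState:
    # Serializable, guard-related state only (example fields).
    local_scope: dict = field(default_factory=dict)
    global_scope: dict = field(default_factory=dict)


@dataclass
class OutputGraphCommon(OutputGraphGuardsState):
    # Non-serializable fields needed by any full graph capture mechanism
    # (dynamo today, potentially make_fx later); hypothetical example field.
    shape_env: Optional[Any] = None


class OutputGraph(OutputGraphCommon):
    # Full dynamo-specific state: tracing, side effects, bytecode codegen, etc.
    ...
```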
Differential Revision: D81718791
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162211
Approved by: https://github.com/zhxchen17
Initial prototype for dynamic int inputs; allows users to run `torch.compile(f)(DynamicInt(4))`, compiling dynamically and using the underlying hint at runtime.
Current behavior:
- Also works in eager (mostly by subclassing int), as a scalar input to torch functions, or with numpy/math/etc. For example, `x = DynamicInt(3); torch.randn(x); torch.add(y, z, alpha=x); np.arange(x)` all act as if x = 3.
- Arithmetic ops return new DynamicInts rather than static ints; `DynamicInt(3) * 2` gives `DynamicInt(6)`. This goes through SymNode magic methods, but coverage might not be 100% - for example, I had to explicitly override floordiv to avoid int casting. This does not necessarily hold for non-magic-method ops (e.g. `math.cos(x)`). The alternative is to int-cast on all operations, but I opted for this behavior so dynamism propagates in non-compiled regions (see the sketch after the example below).
- Doesn't ban fullgraph=False; DynamicInt objects might be leaked back to the user, but I guess this is fine, because they can be cast to ints when needed?
- Dynamo only allocates one symbol per DynamicInt; specifying the same DynamicInt for multiple inputs leads to input deduplication, and a guard is installed.
- We don't raise on int specialization (in allowlist/maybe_mark_dynamic style) - but this is an easy change if needed.
- DynamicInts as nn.Module attributes are handled.
- We don't guard on the DynamicInt id, e.g. users can do the following without recompiling (maybe we should guard?)
```python
x = DynamicInt(4)
f(x)
f(1)
f(DynamicInt(3)) # same as f(3)
```
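As a toy illustration of the arithmetic behavior above (this is not the actual DynamicInt implementation, which routes through SymNode magic methods; the class name is made up):
```python
class ToyDynamicInt(int):
    """Toy stand-in: arithmetic returns the dynamic type instead of a plain int."""

    def __mul__(self, other):
        return ToyDynamicInt(int(self) * int(other))

    def __rmul__(self, other):
        return ToyDynamicInt(int(other) * int(self))

    # floordiv is overridden explicitly; otherwise int.__floordiv__ would
    # silently return a plain int (the "int casting" mentioned above).
    def __floordiv__(self, other):
        return ToyDynamicInt(int(self) // int(other))


x = ToyDynamicInt(3)
assert type(x * 2) is ToyDynamicInt and x * 2 == 6  # analogous to DynamicInt(6)
assert type(x // 2) is ToyDynamicInt                # stays dynamic
```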
Follow-up work:
- Specifying shape constraints, either at the int-level, e.g.
```python
DynamicInt(64, name="s0", constraints=["s0 % 32 == 0", "s0 <= 1024"])
```
or at the compilation level, e.g. something like
```python
s0 = DynamicInt(64, name="s0")
s1 = DynamicInt(128, name="s1")
with some_compiler_config.dynamic_int_constraints(["s1 == 2*s0", "s0 % 32 == 0"]):
f(s0, s1)
```
This should subsume the need for specifying derived SymInts?
- SymFloat support - currently it seems backed floats are specialized by the tensorify float pass, and there's no handling in inductor.
- Propagating dynamism in tensor constructors, e.g. `x = DynamicInt(4); torch.randn(x)` could annotate `_dynamo_dynamic_indices`.
Differential Revision: D81698719
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162194
Approved by: https://github.com/bobrenjc93
Summary:
Add some metadata to CompileArtifacts so that it contains source code information about the original code that was traced.
For now, we do not provide a verification method to the end user; instead we just record which files are inlined. It's up to the user to verify that the contents of these files have not changed (validating source code changes is optional for many users of aot precompile anyway).
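A hedged sketch of how a user could do that verification themselves, assuming they have the list of inlined file paths from the artifact metadata (the helper names here are illustrative, not an existing API):
```python
import hashlib


def snapshot_sources(paths):
    """Record a content hash for each inlined source file at compile time."""
    return {p: hashlib.sha256(open(p, "rb").read()).hexdigest() for p in paths}


def sources_unchanged(paths, saved_hashes):
    """True if every inlined file still matches the hash recorded at compile time."""
    return snapshot_sources(paths) == saved_hashes
```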
Test Plan:
buck run @mode/opt test/dynamo:test_dynamo -- -k test_file_change
buck run @mode/opt test/dynamo:test_dynamo -- -k test_aot_compile_source_info
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162983
Approved by: https://github.com/yushangdi
This PR is quite large in that it covers most of the rough edges in the new strict export flow:
1. Handle nn_module_stack correctly now that we are tracing the wrapper module.
2. module_call_spec needs to be queried from the source directly because we are not running the bytecode anymore.
3. Correct input and output handling.
@diff-train-skip-merge
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162183
Approved by: https://github.com/zhxchen17
ghstack dependencies: #162167
Summary:
[reland]
Since `allow_complex_guards_as_runtime_asserts` is now sync'd with `prefer_deferred_runtime_asserts_over_guards`, we can kill the former (especially since it was an export-only concept).
Test Plan:
updated tests
Differential Revision: D81334984
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161794
Approved by: https://github.com/zhxchen17
Fixes #160437
Summary:
This PR avoids compiling empty FX graphs generated during graph breaks. If there are no calls in the graph, we can just return an empty list of instructions.
More precisely, in compile_and_call_fx_graph, if the FX graph contains no calls (count_calls(self.graph) == 0) and the return value list is empty, we now return an empty instruction list immediately.
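A minimal sketch of that early-exit condition, assuming the names used above (not the verbatim implementation):
```python
from torch._dynamo.utils import count_calls  # counts call_* nodes in an fx.Graph


def should_skip_compile(graph, return_values):
    """Sketch of the new early-exit check in compile_and_call_fx_graph."""
    return count_calls(graph) == 0 and len(return_values) == 0


# In compile_and_call_fx_graph (conceptually):
#     if should_skip_compile(self.graph, rv):
#         return []   # no backend compile, no bytecode emitted for the empty graph
```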
Impact:
module: dynamo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160536
Approved by: https://github.com/Lucaskabela
Summary: Since `allow_complex_guards_as_runtime_asserts` is now sync'd with `prefer_deferred_runtime_asserts_over_guards`, we can kill the former (especially since it was an export-only concept).
Test Plan:
updated tests
Differential Revision: D79903317
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160198
Approved by: https://github.com/ezyang
Fix comments to reflect that we no longer codegen cells to be sent to the resume function as inputs - they are instead codegen'd after the unsupported instruction in order to build resume functions that are closures.
Also simplify some codegen.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160138
Approved by: https://github.com/anijain2305
ghstack dependencies: #159329, #159678, #159817
Summary:
convert_frame.compile_frame used to take a callback transform function that captured the frame object, so the frame information was not passed directly into the compile_frame function.
This PR changes the signature of compile_frame so that the frame information is passed into the function directly, without taking a callback. This makes it easier to build a fullgraph capture API on top of compile_frame.
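Conceptually (the parameter names and shapes here are assumptions for illustration; the real compile_frame has a different, richer signature):
```python
from types import SimpleNamespace


# Before: compile_frame only received a callback that had already closed over
# the frame object, so the frame never appeared in its signature.
def compile_frame_before(transform):
    return transform()


# After: frame information is an explicit argument, so callers such as a
# fullgraph capture API can invoke compile_frame directly, no closure needed.
def compile_frame_after(frame, compile_one_frame):
    return compile_one_frame(frame)


frame = SimpleNamespace(f_code="<code>", f_locals={})  # stand-in frame object
compile_frame_before(lambda: f"compiled {frame.f_code}")
compile_frame_after(frame, lambda f: f"compiled {f.f_code}")
```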
Test Plan:
CI
Differential Revision: D81041296
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161514
Approved by: https://github.com/tugsbayasgalan
```python
import torch
torch._dynamo.config.capture_scalar_outputs = True
class M(torch.nn.Module):
    def forward(self, idx, x):
        u0 = idx.item()
        x0 = x.select(0, u0)

        def fn():
            return x0.sin()

        return torch.cond(x0.sum() > 0, fn, fn)


m = M()
out = torch.compile(m, fullgraph=True)(
    torch.tensor(0, dtype=torch.int64, device="cuda"),
    torch.randn(3, 3, device="cuda"),
)
print(out)
```
Before this PR, we didn't track the storage_offset symbol of a tensor. After https://github.com/pytorch/pytorch/pull/157605, we create an unbacked symint for the storage_offset of the result of select. So when we try to lift the free basic symbols of x0 while speculating fn, we find a free symbol that's not bound to a proxy.
This PR tracks storage_offset symbols and associates them with a proxy using torch.ops.aten.storage_offset.
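A hedged sketch of the idea (the function and variable names are illustrative, not the actual dynamo code): when a traced tensor has a symbolic storage_offset, emit an aten.storage_offset call on that tensor's proxy so the symbol is bound to a proxy that inner speculations (such as the cond branches) can look up.
```python
import torch


def bind_storage_offset_symbol(tracer, tensor_proxy, example_tensor, bound_symbols):
    """Associate a symbolic storage_offset with a proxy node (sketch only)."""
    offset = example_tensor.storage_offset()
    if isinstance(offset, torch.SymInt):
        expr = offset.node.expr
        if expr not in bound_symbols:
            # The emitted node is what symbol lifting later resolves to.
            bound_symbols[expr] = tracer.create_proxy(
                "call_function", torch.ops.aten.storage_offset, (tensor_proxy,), {}
            )
```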
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161199
Approved by: https://github.com/zou3519
ghstack dependencies: #161198
Before the change in this PR, we have an error for the following code
```python
import torch
torch._dynamo.config.capture_scalar_outputs = True
class M(torch.nn.Module):
    def forward(self, idx, x):
        u0 = idx.item()
        x0 = x.select(0, u0)

        def fn():
            return x0.sin()

        return torch.cond(x0.sum() > 0, fn, fn)


m = M()
out = torch.compile(m, fullgraph=True)(torch.tensor(0, dtype=torch.int64), torch.randn(3, 3))
```
The error occurs when speculating fn: we try to lift the symbol of x0.storage_offset() but find that the symbol doesn't have a source associated with it.
What really happens is that, when the input tensor is a scalar tensor of int type residing on CPU, we have a shortcut that creates a normal (backed) symint when .item() is called; see https://github.com/pytorch/pytorch/pull/126245.
However, previously we only tracked the unbacked symint output of an operation, because we believed every backed symint must have a source associated with it and must already have been lifted as an input at the top level. Now this invariant no longer holds, so we end up with an error saying the symbol doesn't have a source (because only inputs and symbols derived from inputs have sources, and the result of .item() doesn't have one).
In this PR, we start to also track normal symints with the proxy that created them (i.e., in this case the .item() proxy).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161198
Approved by: https://github.com/zou3519
This adds a new function `bypass_package` and `CompilePackage.bypass_current_entry()`. This allows us to safely bypass models with unserializable or incompatible parts. When we encounter something incompatible, we raise a bypass and ignore that particular code in DynamoCodeEntry.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160902
Approved by: https://github.com/zhxchen17
We are refactoring the dynamo convert_frame code so that we can have modularized pieces shareable between different compiler frontends (e.g. torch.compile, precompile and torch.export).
This PR follows the last one, separating out the part that runs the instruction translator on a given frame and returns a DynamoTracerOutput.
The end result is a free function that runs the instruction translator independently. A follow-up diff will wrap the low-level function.
Differential Revision: [D80388694](https://our.internmc.facebook.com/intern/diff/D80388694/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160815
Approved by: https://github.com/anijain2305
ghstack dependencies: #160814
This PR replaces "guard_serialization_mode" with `save_guards`. All cases where we care about whether or not we're *loading* guards can be inferred automatically from the existing inputs.
The only case that's special here is whether or not to check guards. We don't want to check guards on guard load in CheckFnManager, because these guards have already been checked on save. Therefore, we put the setting in OutputGraphGuardsState, so that when we save, we bypass the guards check.
Because of this change, it is *technically* possible to do a load and a save in the *same* CheckFunctionManager.__init__() by passing all the necessary parts, and also passing `save_guards=True`. This should just work out of the box, but so far no callsites need it, so not super important.
Next up, we'll work on removing save_guards from GuardBuilder, and putting it into its own phase.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160531
Approved by: https://github.com/zhxchen17
Purely a refactor: improve typing and get rid of some type errors. Mark certain fields as non-null, since in general they are not empty.
The goal of this stack of PRs is to move the save/load logic of guard serialization into separate, flat phases, instead of being embedded in guard creation. This way, we can put a try/catch around it and fail safely if certain guards are not serializable.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160530
Approved by: https://github.com/Lucaskabela, https://github.com/Skylion007
As part of better engineering week, we would like to improve our type support to improve the dev experience in dynamo.
This PR adds strict typing support to a critical set of files for dynamo: `source.py` and the base `_guards.py`.
Running
```
mypy torch/_dynamo/source.py torch/_guards.py --linecount-report /tmp/coverage_log
```
|          | Lines Annotated | Lines Total | % lines covered | Funcs Annotated | Funcs Total | % funcs covered |
| -------- | --------------- | ----------- | --------------- | --------------- | ----------- | --------------- |
| Main     | 1227            | 2208        | 55.57%          | 207             | 362         | 57.18%          |
| This PR  | 2217            | 2217        | 100.00%         | 362             | 362         | 100.00%         |
| Delta    | +990            | +9          | +44.43%         | +155            | 0           | +42.82%         |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158397
Approved by: https://github.com/anijain2305