This PR is quite large in that it covers most of the rough edges in the new strict export flow:
1. Handle nn_module_stack correctly now that we are tracing the wrapper module.
2. Query module_call_spec directly from the source, because we are no longer running the bytecode.
3. Correct input and output handling.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162183
Approved by: https://github.com/zhxchen17
ghstack dependencies: #162167
Summary: This PR introduces shape guards to export. Previously only value ranges, equalities, and specializations would be tracked for symbolic expressions, and we had a forward hook to check them. Instead now we create a function to check shape guards and call it in the exported program.
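A rough sketch of the user-visible effect (the module and dims below are placeholders, not from this PR):
```python
import torch
from torch.export import Dim, export

class M(torch.nn.Module):
    def forward(self, x):
        return x + 1

batch = Dim("batch", min=2, max=16)
ep = export(M(), (torch.randn(4, 8),), dynamic_shapes={"x": {0: batch}})

ep.module()(torch.randn(8, 8))   # ok: batch stays within [2, 16]
ep.module()(torch.randn(32, 8))  # shape guard violated: raises at call time
```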
Test Plan:
updated several tests
Differential Revision: D80713603
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161178
Approved by: https://github.com/tugsbayasgalan
Summary:
[reland]
Since `allow_complex_guards_as_runtime_asserts` is now synced with `prefer_deferred_runtime_asserts_over_guards`, we can kill the former (especially since it was an export-only concept).
Test Plan:
updated tests
Differential Revision: D81334984
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161794
Approved by: https://github.com/zhxchen17
Summary: Since `allow_complex_guards_as_runtime_asserts` is now synced with `prefer_deferred_runtime_asserts_over_guards`, we can kill the former (especially since it was an export-only concept).
Test Plan:
updated tests
Differential Revision: D79903317
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160198
Approved by: https://github.com/ezyang
Summary:
In preparation for checking shape guards in export, this PR effectively switches `prefer_deferred_runtime_asserts_over_guards` to `False`, matching Dynamo.
Actually, that's a lie: we switch it to `allow_complex_guards_as_runtime_asserts`, which is `False` by default but can be set to `True` via an internal API. This makes the two flags synchronized, so we should be able to kill `allow_complex_guards_as_runtime_asserts` at this point.
Test Plan:
updated tests
Differential Revision: D79734206
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160111
Approved by: https://github.com/tugsbayasgalan
Originally, if the PT2 errored when loading, we would fall back to the old loader to handle BC issues. However, this hides the real error message when an up-to-date PT2 fails to load for some other reason.
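A sketch of the behavior change (function names here are illustrative, not the actual loader internals):
```python
# Illustrative names only; not the actual loader functions.
def load_with_bc_fallback(f, load_new, load_old, is_legacy_archive):
    try:
        return load_new(f)
    except Exception:
        # Old behavior: always fall back to the legacy loader, which masked
        # real errors in up-to-date archives. New behavior: only fall back
        # when the archive is genuinely old-format; otherwise re-raise.
        if is_legacy_archive(f):
            return load_old(f)
        raise
```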
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159881
Approved by: https://github.com/yushangdi
Collects some scattershot improvements made while attempting to enable training for AOTInductor. Non-typing changes are:
1. Swapping a few custom searches for the output node in an FX graph for calls to `graph.output_node()` (see the sketch after this list).
2. Removing two unused parameters from `torch.export._unlift._unlift`.
3. Switching handles to constants in `cpp_wrapper_cpu` to use C++ references for memory efficiency.
4. Cleaning out unused, unexported imports from `torch/export/__init__.py`, and adding one missing export to `__all__`.
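For item 1, a before/after sketch (assuming a recent torch.fx where `Graph.output_node()` is available):
```python
import torch
from torch.fx import symbolic_trace

gm = symbolic_trace(torch.nn.Linear(4, 2))  # any GraphModule works here

# Before: hand-rolled scan for the output node.
output_node = next(n for n in reversed(gm.graph.nodes) if n.op == "output")

# After: the built-in accessor on torch.fx.Graph.
output_node = gm.graph.output_node()
```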
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158075
Approved by: https://github.com/Skylion007
Summary: export_for_training exists because we couldn't migrate internal usages of export to the final IR. Now that we have completed the migration, we should deprecate and delete this API.
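The migration for callsites is mechanical; a sketch (the module and inputs are placeholders):
```python
import torch

class M(torch.nn.Module):
    def forward(self, x):
        return x + 1

args = (torch.randn(2, 4),)

# Before (deprecated):
ep = torch.export.export_for_training(M(), args)

# After: plain export produces the same training IR, and
# run_decompositions() recovers the inference IR when needed.
ep = torch.export.export(M(), args)
```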
Test Plan:
CI
Differential Revision: D78240836
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158203
Approved by: https://github.com/JacobSzwejbka
Refactor the pt2 archive saving to consolidate the format of torch.export.save and torch._inductor.package.package_aoti.
This PR adds the following functions, which torch.export.save and AOTI packaging call into:
```python
def package_pt2(
    f: FileLike,
    *,
    exported_programs: Optional[Union[ExportedProgram, dict[str, ExportedProgram]]] = None,
    aoti_files: Optional[Union[list[str], dict[str, list[str]]]] = None,
    extra_files: Optional[dict[str, Any]] = None,
) -> FileLike: ...

@dataclass
class PT2ArchiveContents:
    exported_programs: dict[str, ExportedProgram]
    aoti_runners: dict[str, AOTICompiledModel]
    extra_files: dict[str, Any]

def load_pt2(f: FileLike) -> PT2ArchiveContents: ...
```
Power users directly call into these APIs if they want to bundle multiple exported programs, aoti files, or extra metadata.
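A usage sketch (the import path is an assumption about where these APIs live; the module and inputs are placeholders):
```python
import torch
from torch.export import export

# Assumed import path for the APIs above; the exact module may differ.
from torch.export.pt2_archive._package import package_pt2, load_pt2

class M(torch.nn.Module):
    def forward(self, x):
        return x + 1

ep1 = export(M(), (torch.randn(2, 4),))
ep2 = export(M(), (torch.randn(3, 4),))

# Bundle multiple exported programs into a single .pt2 archive:
package_pt2("bundle.pt2", exported_programs={"model1": ep1, "model2": ep2})

# Load everything back:
contents = load_pt2("bundle.pt2")
ep1_roundtrip = contents.exported_programs["model1"]
```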
This is what the pt2 archive looks like ([spec](https://docs.google.com/document/d/1RQ4cmywilnFUT1VE-4oTGxwXdc8vowCSZsrRgo3wFA8/edit?tab=t.0)):
```
├── archive_format
├── version
├── .data
├── data
│   ├── aotinductor
│   │   └── model1
│   │       ├── model1.cpp
│   │       ├── model1.so  # currently AOTI automatically moves weights in here, TODO to move it out
│   │       ├── cg7domx3woam3nnliwud7yvtcencqctxkvvcafuriladwxw4nfiv.cubin
│   │       └── cubaaxppb6xmuqdm4bej55h2pftbce3bjyyvljxbtdfuolmv45ex.cubin
│   ├── weights
│   │   ├── model1.pt  # TODO to dedup weights between model1/model2
│   │   └── model2.pt
│   ├── constants
│   │   ├── model1.pt  # TODO to dedup weights between model1/model2
│   │   └── model2.pt
│   └── sample_inputs
│       ├── model1.pt  # TODO to dedup weights between model1/model2
│       └── model2.pt
├── extra
│   └── user_metadata.txt
└── models
    ├── model1.json
    └── model2.json
```
Future todos:
- unbundle the weights -- instead of .pt, we can use bin files, which will also allow us to dedup weights if we store multiple models
- update aoti_compile_and_package to also save the exported program
- integrate TNR with this packaging flow
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152495
Approved by: https://github.com/yushangdi
Given an exception in torch.export, I want to try/catch it to add the message "hey try out draft-export!". Currently I only add this message for errors that draft-export is known to fix, like DataDependentErrors, ConstraintViolationErrors, and missing fake impls.
Originally the error message looks like:
```
File "/data/users/angelayi/pytorch/torch/_library/custom_ops.py", line 626, in fake_impl
raise RuntimeError(
RuntimeError: There was no fake impl registered for <CustomOpDef(mylib::foo2)>. This is necessary for torch.compile/export/fx tracing to work. Please use `foo2_impl.register_fake` to add an fake impl.
```
Now the error message looks something like:
```
File "/data/users/angelayi/pytorch/torch/_library/custom_ops.py", line 626, in fake_impl
raise RuntimeError(
RuntimeError: There was no fake impl registered for <CustomOpDef(mylib::foo2)>. This is necessary for torch.compile/export/fx tracing to work. Please use `foo2_impl.register_fake` to add an fake impl.
The error above occurred when calling torch.export.export. If you would like to view some more information about this error, and get a list of all other errors that may occur in your export call, you can rerun your program with the `DRAFT_EXPORT=1` envvar, or replace your `export()` call with `draft_export()`.
```
In Python versions >= 3.11, we can use `exception.add_note` to append to the error message. For earlier versions, I use a hack that modifies `e.args`.
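A sketch of that mechanism (the helper name is illustrative, not the actual function in this PR):
```python
import sys

def _append_export_hint(e: BaseException, msg: str) -> None:
    if sys.version_info >= (3, 11):
        e.add_note(msg)  # the note is rendered below the traceback
    else:
        # Pre-3.11 hack: splice the hint into the exception's args.
        prefix = str(e.args[0]) if e.args else ""
        e.args = (f"{prefix}\n{msg}",) + e.args[1:]
```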
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151065
Approved by: https://github.com/pianpwk
ghstack dependencies: #151051
Summary:
- Flip default value of `strict` argument from True to False on torch.export.export_for_training API
- All callsites have been updated to provide this argument explicitly to avoid behavior change.
- If you see any breakages, that means you may have a new callsite that was missed; please set `strict=True` explicitly at the callsite to mitigate.
Test Plan: CI
Differential Revision: D72724975
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150941
Approved by: https://github.com/ydwu4
Summary:
Instead of explicitly specifying dynamic shapes, it is possible to infer them from additional example inputs. Together with the example inputs provided to export, we can basically make any varying dim dynamic and keep any fixed dim static. This should be useful for prod scenarios that have access to tests and/or profiling data, yet are somewhat removed from the model authoring process.
However, this alone is not satisfactory: the exported program by design has only one graph, representing one path through the model, and we cannot necessarily guarantee that this graph works for the additional example inputs, because different guards might have been created had we exported with them instead (corresponding to different traced paths). That said, checking that the additional example inputs satisfy the guards created by the original export should be sufficient for generalization.
Now, while we don't preserve all guards in the exported program, we do check a subset of them as part of input matching. So we add a verification step at the end of export when such additional example inputs are provided. This should be enough for now.
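A conceptual sketch of the inference step (the helper below is illustrative, not this PR's implementation):
```python
import torch
from torch.export import Dim, export

class M(torch.nn.Module):
    def forward(self, x):
        return x.sum(dim=-1)

def infer_dynamic_shapes(example_args, additional_args):
    # Illustrative only: mark any dim that varies across the two sets of
    # example inputs as dynamic; dims that agree stay static.
    return tuple(
        {i: Dim.AUTO for i, (s, t) in enumerate(zip(a.shape, b.shape)) if s != t}
        for a, b in zip(example_args, additional_args)
    )

x1, x2 = torch.randn(4, 8), torch.randn(7, 8)  # batch varies, feature dim is fixed
ep = export(M(), (x1,), dynamic_shapes=infer_dynamic_shapes((x1,), (x2,)))
```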
Test Plan: added test (positive and negative cases)
Differential Revision: D72001771
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150144
Approved by: https://github.com/bobrenjc93
Summary: Remove torch.export.export_for_inference; it is redundant and can always be replaced with torch.export.export_for_training() + run_decompositions().
Test Plan: unit tests
Differential Revision: D71069057
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149078
Approved by: https://github.com/tugsbayasgalan
Summary:
- Flip the default value of the strict argument in torch.export.export from True to False (see the sketch after this list)
- Update test infra to cope with the change; some tests assumed strict mode was the default
- Disabled some tests that fail in non-strict mode
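A sketch of what the flip means for callers (the module and inputs are placeholders):
```python
import torch

class M(torch.nn.Module):
    def forward(self, x):
        return x * 2

args = (torch.randn(3),)

# After this change the default flipped, so these two are equivalent:
ep = torch.export.export(M(), args)                # non-strict is now the default
ep = torch.export.export(M(), args, strict=False)

# Callsites that relied on the old default must opt in explicitly:
ep = torch.export.export(M(), args, strict=True)   # Dynamo-based strict tracing
```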
Test Plan: Sandcastle
Differential Revision: D70228628
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148790
Approved by: https://github.com/angelayi
Fixes #144976
Using approach ① `IO[bytes]`, but could also try with a protocol.
## Notes:
- moved `torch.serialization.FILE_LIKE` to `torch.types.FileLike` (see the sketch after these notes)
- Use `FileLike` annotation where it makes sense
- made sure those functions also support `os.PathLike`
- Replaced `isinstance(x, io.BytesIO)` with `isinstance(x, (io.IOBase, IO))` where appropriate.
- Replaced `BinaryIO` with `IO[bytes]` (the two ABCs are almost identical, the only difference is that `BinaryIO` allows `bytearray` input to `write`, whereas `IO[bytes]` only `bytes`)
- needed to make `torch.serialization._opener` generic to avoid LSP violations.
- skipped `torch/onnx/verification` for now (functions use `BytesIO.getvalue`, which is not part of the `IO[bytes]` ABC, but it kind of seems that this is redundant, as e.g. `onnx.load` supports `str | PathLike[str] | IO[bytes]` directly...)
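A sketch of the resulting alias and how it gets used (the real definition lives in torch/types.py; `save_bytes` is a made-up example):
```python
import os
from typing import IO, Union

# Sketch of torch.types.FileLike as described above.
FileLike = Union[str, os.PathLike, IO[bytes]]

def save_bytes(data: bytes, f: FileLike) -> None:
    # Accepts paths, PathLike objects, and binary file objects alike.
    if isinstance(f, (str, os.PathLike)):
        with open(f, "wb") as fh:
            fh.write(data)
    else:
        f.write(data)
```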
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144994
Approved by: https://github.com/ezyang, https://github.com/Skylion007
In this PR, we implement a lazy dictionary for export decomp behaviour, for the following reason:
1. Custom op loading can happen after import time; as a result, the decomp table might not pick up the decomp. Therefore we try to delay materialization as late as possible.
I intentionally separated out the core_aten_decomp to not have any custom CIA ops in this PR, to mitigate the risk of getting reverted, but in the future, core_aten_decomp under torch/_decomp will exist as an alias to the official export table (torch.export.default_decompositions).
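A minimal sketch of the lazy-materialization idea (the class and builder callable are illustrative):
```python
class LazyDecompDict(dict):
    """Illustrative only: defers building the decomp table until first use,
    so custom ops registered after import time are still picked up."""

    def __init__(self, build_table):
        super().__init__()
        self._build_table = build_table  # callable returning {op: decomp_fn}
        self._materialized = False

    def _ensure_materialized(self):
        if not self._materialized:
            self._materialized = True
            self.update(self._build_table())

    def __getitem__(self, key):
        self._ensure_materialized()
        return super().__getitem__(key)

    def __contains__(self, key):
        self._ensure_materialized()
        return super().__contains__(key)
```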
Differential Revision: [D64140807](https://our.internmc.facebook.com/intern/diff/D64140807)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137650
Approved by: https://github.com/justinchuby, https://github.com/bdhirsh
Recently we decided to split the export IR into two different IRs (training vs. inference). In the inference IR, one major change we decided to introduce is keeping the composite ops that the user specified in the IR. This PR does that by overriding the CompositeImplicitAutograd decomp in the export inference path.
Differential Revision: [D58701607](https://our.internmc.facebook.com/intern/diff/D58701607)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128077
Approved by: https://github.com/bdhirsh
This PR switches export IR from aot-dispatch to pre-dispatch IR.
**What is pre-dispatch IR and why should you care?**
Currently the default IR returned by torch.export can contain only functional ATen operators after ALL pytorch dispatcher decompositions (for example, CompositeImplicitAutograd) run.
In contrast, pre-dispatch IR refers to an IR that can contain all functional ATen operators (i.e., not just from the core subset), before any decomposition happens, as well as operators that manipulate autograd state. Pre-dispatch IR closely resembles eager PyTorch computation, but is still functional and serializable by torch.export. As a result:
- You can train the pre-dispatch IR in eager mode as the IR contains necessary information for the autograd engine to automatically generate a backward graph.
- You can write sound graph transformations more easily as the IR is functional.
- Since it is an ATen IR, it is still normalized. For example, torch.add has multiple overloads, but aten.add.Tensor is unique in this IR.
If you want to get the core aten IR out of `torch.export`, you will need to:
```python
ep = torch.export.export(M(), inputs)
ep_for_core_aten = ep.run_decompositions()
```
Differential Revision: [D57172986](https://our.internmc.facebook.com/intern/diff/D57172986)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125860
Approved by: https://github.com/zhxchen17
This PR introduces a new way of building `dynamic_shapes` for export. The idea is to build up a mapping from input tensors to the dynamic shapes that should be assigned to their corresponding fake tensors.
This mapping is automatically converted to the current form of `dynamic_shapes`, which must exactly match the structure of inputs. We do this by using pytree utils.
With the current `dynamic_shapes`, we had to be careful about user-defined classes that are registered with pytree, since such classes are not necessarily polymorphic containers; they may be fine containing tensors, but not dynamic shapes. Thus we had decided to allow input instances of such classes to be associated with dynamic shapes in flattened form. This decision needs to be mirrored in this PR as well. To make it easier to keep these code paths in sync, we refactor the current recursive procedure for associating inputs with dynamic shapes to use the same pytree utils. This needs minor fixes to a few tests where `dynamic_shapes` did not exactly match the structure of inputs.
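A sketch of the mapping-based style (assuming this surface is exposed as the `ShapesCollection`-style API; the module is a placeholder):
```python
import torch
from torch.export import Dim, ShapesCollection, export

class M(torch.nn.Module):
    def forward(self, x):
        return x + 1

x = torch.randn(4, 8)
batch = Dim("batch")

# Associate dynamic shapes with the input tensor itself, rather than
# mirroring the full (possibly nested) input structure:
shapes = ShapesCollection()
shapes[x] = (batch, None)

# The mapping is converted to the structure-matching form via pytree utils.
ep = export(M(), (x,), dynamic_shapes=shapes)
```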
Differential Revision: D56551992
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124898
Approved by: https://github.com/zhxchen17
This PR switches export IR from aot-dispatch to pre-dispatch IR.
**What is pre-dispatch IR and why should you care?**
Currently the default IR returned by torch.export can contain only functional ATen operators after ALL pytorch dispatcher decompositions (for example, CompositeImplicitAutograd) run.
In contrast, pre-dispatch IR refers to an IR that can contain all functional ATen operators (i.e., not just from the core subset), before any decomposition happens, as well as operators that manipulate autograd state. Pre-dispatch IR closely resembles eager PyTorch computation, but is still functional and serializable by torch.export. As a result:
- You can train the pre-dispatch IR in eager mode as the IR contains necessary information for the autograd engine to automatically generate a backward graph.
- You can write sound graph transformations more easily as the IR is functional.
- Since it is an ATen IR, it is still normalized. For example, torch.add has multiple overloads, but aten.add.Tensor is unique in this IR.
If you want to get the core aten IR out of `torch.export`, you will need to:
```python
ep = torch.export.export(M(), inputs)
ep_for_core_aten = ep.run_decompositions()
```
Differential Revision: [D56273267](https://our.internmc.facebook.com/intern/diff/D56273267)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123573
Approved by: https://github.com/gmagogsfm