748 Commits

Author SHA1 Message Date
5d4da26ed0 Revert "[export] preserve_node_meta by default (#165524)"
This reverts commit fdd560afd1d413a9f814cbf7cc2a72e0d39b0117.

Reverted https://github.com/pytorch/pytorch/pull/165524 on behalf of https://github.com/lw due to test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_symint_closure [GH job link](https://github.com/pytorch/pytorch/actions/runs/18586312291/job/52991654051) [HUD commit link](fdd560afd1) ([comment](https://github.com/pytorch/pytorch/pull/165524#issuecomment-3415352522))
2025-10-17 12:27:17 +00:00
fdd560afd1 [export] preserve_node_meta by default (#165524)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165524
Approved by: https://github.com/malaybag
2025-10-17 07:55:28 +00:00
d795fb225a [RFC] Add pyrefly to lintrunner (#165179)
This will add pyrefly to lint runner as a warning only - and allow us to collect feedback about the tool before switching to pyrefly as the main type checker.

References the steps outlined here: https://github.com/pytorch/pytorch/issues/163283

test plan:
`lintrunner init`
`lintrunner`
confirm when pyrefly errors are present results look like: https://gist.github.com/maggiemoss/e6cb2d015dd1ded560ae1329098cf33f

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165179
Approved by: https://github.com/ezyang
2025-10-16 20:07:09 +00:00
74acf92648 Forward fix inductor failure (#165363) (#165443)
Summary:

Title

Test Plan: CI

Differential Revision: D84615478

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165443
Approved by: https://github.com/angelayi
2025-10-14 19:31:58 +00:00
fbe0d20a17 [2/N] More ruff SIM fixes (#165031)
This is a follow-up to #164695 that applies ruff SIM rules to more files. Most changes simplify dict.get calls, because None is already the default value.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165031
Approved by: https://github.com/mlazos
2025-10-14 14:22:54 +00:00
de8d81275a Do not decompose in functionalization/proxy tensor if autograd wouldn't have decomposed (#164939)
This fixes AOTAutograd rms_norm not being bitwise equivalent to
eager, because it avoids a decomposition.  You can force the
decomposition by having the decomposition in the dispatch table,
but if eager mode wouldn't have decomposed (because it went to the fused
one), we now preserve the fused call by default.

This largely reverts https://github.com/pytorch/pytorch/pull/103275/ for view ops. This means that in inference mode we could hit the wrong C++ kernel; if this occurs we should just SymInt'ify the C++ kernel.

Another neat side effect of this change is that Inductor's generated kernels for rms_norm now have rms_norm in their name.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164939
Approved by: https://github.com/bdhirsh
2025-10-11 01:03:55 +00:00
5c3fe9fb30 Revert "Do not decompose in functionalization/proxy tensor if autograd wouldn't have decomposed (#164939)"
This reverts commit a6fa4f9c283971c0fb6f60a89674a1f35370ac79.

Reverted https://github.com/pytorch/pytorch/pull/164939 on behalf of https://github.com/izaitsevfb due to introduces numeric issues internally, see [D84326613](https://www.internalfb.com/diff/D84326613) ([comment](https://github.com/pytorch/pytorch/pull/164939#issuecomment-3392203314))
2025-10-10 20:21:12 +00:00
4cd06dc82c [PT2 Archive] Use tensor dtype while deduping/grouping weights (state_dict/constants) (#165090)
Summary: While saving state_dict tensors, deduping is done to reduce the amount of tensor data, using the storage pointer as the key. When a tensor is empty, however, its storage pointer is 0, even though the dtypes of such tensors may differ. The existing logic treats all such tensors as the same, which makes the model fail later when a different dtype is expected. This change also includes the dtype when deduping. For non-empty tensors this should have no effect, as the storage pointer will be unique.
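
The grouping problem described above can be sketched in plain Python. This is only an illustration under stated assumptions: `TensorInfo` and `group_for_dedup` are hypothetical names, not the code in this PR, and tensors are reduced to the two fields that matter for the dedup key.

```python
from collections import defaultdict
from typing import NamedTuple


class TensorInfo(NamedTuple):
    # Hypothetical stand-in for a real tensor: only the fields used as dedup key.
    name: str
    storage_ptr: int  # 0 for empty tensors, as described in the summary
    dtype: str


def group_for_dedup(tensors, use_dtype=True):
    """Group tensor names by dedup key; each group would be saved once."""
    groups = defaultdict(list)
    for t in tensors:
        # Old behavior keys on the storage pointer only; new behavior adds dtype.
        key = (t.storage_ptr, t.dtype) if use_dtype else (t.storage_ptr,)
        groups[key].append(t.name)
    return list(groups.values())


tensors = [
    TensorInfo("empty_f32", 0, "float32"),
    TensorInfo("empty_i64", 0, "int64"),
    TensorInfo("w", 4096, "float32"),
    TensorInfo("w_alias", 4096, "float32"),
]

# Pointer-only key: the two empty tensors collapse into one group.
old_groups = group_for_dedup(tensors, use_dtype=False)
# Pointer + dtype key: empty tensors of different dtype stay separate.
new_groups = group_for_dedup(tensors, use_dtype=True)
```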

Test Plan: TBD

Differential Revision: D84243094

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165090
Approved by: https://github.com/yiming0416
2025-10-10 17:51:43 +00:00
8b2137e74a Don't use C++ CIA decomps if there's a Python one (#164970)
Some more context at https://github.com/pytorch/pytorch/pull/164939

The basic point here is that Python decomps are guaranteed to be functional, whereas C++ ones are not. If we have a Python decomp, we should prefer it over the C++ one. This currently doesn't matter too much as CIA decomps will get functionalized, but it matters after the quoted PR because we now run these decompositions very late (to make it easy for things like aot_eager to get the fused versions of operators in proxy tensor).

Signed-off-by: Edward Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164970
Approved by: https://github.com/bdhirsh
2025-10-10 16:46:09 +00:00
fb64da0791 [2/N] Use "is" in python type comparison (#165142)
This is a follow-up to #165037. It is generally recommended to use `is/is not` to compare types. This series of changes therefore applies that suggestion across the code base, with the aim of finally enabling the related linter checks.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165142
Approved by: https://github.com/albanD
2025-10-10 15:36:44 +00:00
b8be796a57 Revert "[2/N] More ruff SIM fixes (#165031)"
This reverts commit 38095fbd1323ee4a9541fbcbb9b28bd20f2cd956.

Reverted https://github.com/pytorch/pytorch/pull/165031 on behalf of https://github.com/albanD due to one of the changed lines starting to fail on trunk ([comment](https://github.com/pytorch/pytorch/pull/165031#issuecomment-3390190870))
2025-10-10 13:42:14 +00:00
70925bdf82 [1/N] Use "is" in python type comparison (#165037)
It is generally recommended to use `is/is not` to compare types. This series of changes therefore applies that suggestion across the code base, with the aim of finally enabling the related linter checks.
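
A minimal sketch of why identity comparison is the safer way to compare types. The classes below are made up for illustration and do not come from the PyTorch code base; the metaclass is a deliberately contrived case showing that `==` on types can be overridden while `is` cannot.

```python
class Base:
    pass


class Child(Base):
    pass


obj = Child()

# Exact type checks use identity, which cannot be customized.
exact_match = type(obj) is Child
is_not_base = type(obj) is not Base

# isinstance answers a different question: it also accepts subclasses.
subclass_ok = isinstance(obj, Base)


class AlwaysEqualMeta(type):
    # Contrived metaclass whose classes claim equality with anything.
    def __eq__(cls, other):
        return True

    # Defining __eq__ clears __hash__, so restore the default.
    __hash__ = type.__hash__


class Weird(metaclass=AlwaysEqualMeta):
    pass


eq_says_same = Weird == int  # goes through the overridden __eq__
is_says_same = Weird is int  # plain identity check
```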

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165037
Approved by: https://github.com/mlazos
2025-10-10 12:36:50 +00:00
38095fbd13 [2/N] More ruff SIM fixes (#165031)
This is a follow-up to #164695 that applies ruff SIM rules to more files. Most changes simplify dict.get calls, because None is already the default value.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165031
Approved by: https://github.com/mlazos
2025-10-10 05:37:46 +00:00
a6fa4f9c28 Do not decompose in functionalization/proxy tensor if autograd wouldn't have decomposed (#164939)
This fixes AOTAutograd rms_norm not being bitwise equivalent to
eager, because it avoids a decomposition.  You can force the
decomposition by having the decomposition in the dispatch table,
but if eager mode wouldn't have decomposed (because it went to the fused
one), we now preserve the fused call by default.

This largely reverts https://github.com/pytorch/pytorch/pull/103275/ for view ops. This means that in inference mode we could hit the wrong C++ kernel; if this occurs we should just SymInt'ify the C++ kernel.

Another neat side effect of this change is that Inductor's generated kernels for rms_norm now have rms_norm in their name.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164939
Approved by: https://github.com/bdhirsh
2025-10-10 00:15:00 +00:00
a57a14868d Better handling of restore_state_dict (#164401)
After lean export, we might want to be able to restore the original fqn. This PR refactors one util function in export that sort of does this. Note that strict_export has some complicated logic of updating the graph signature as well which we don't want. I think we can gradually make this util more refined by handling constants, non persistent buffers etc and change how strict_export does it today.

Differential Revision: [D83687844](https://www.internalfb.com/diff/D83687844)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164401
Approved by: https://github.com/avikchaudhuri
2025-10-09 22:39:11 +00:00
06d86e58d0 Revert "Do not decompose in functionalization/proxy tensor if autograd wouldn't have decomposed (#164939)"
This reverts commit d40a9bfb8da0dc1ac1e6e56b33a25979112874de.

Reverted https://github.com/pytorch/pytorch/pull/164939 on behalf of https://github.com/pytorch-auto-revert due to Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable ([comment](https://github.com/pytorch/pytorch/pull/164939#issuecomment-3385056722))
2025-10-09 09:50:59 +00:00
d40a9bfb8d Do not decompose in functionalization/proxy tensor if autograd wouldn't have decomposed (#164939)
This fixes AOTAutograd rms_norm not being bitwise equivalent to
eager, because it avoids a decomposition.  You can force the
decomposition by having the decomposition in the dispatch table,
but if eager mode wouldn't have decomposed (because it went to the fused
one), we now preserve the fused call by default.

This largely reverts https://github.com/pytorch/pytorch/pull/103275/ for view ops. This means that in inference mode we could hit the wrong C++ kernel; if this occurs we should just SymInt'ify the C++ kernel.

Another neat side effect of this change is that Inductor's generated kernels for rms_norm now have rms_norm in their name.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164939
Approved by: https://github.com/bdhirsh
ghstack dependencies: #164573
2025-10-09 04:49:44 +00:00
a029675f6f More ruff SIM fixes (#164695)
This PR applies ruff `SIM` rules to more files. Most changes are about simplifying `dict.get` because `None` is already the default value.
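
As a small illustration of the `dict.get` simplification mentioned above (the dictionary is made up, and this is not a line taken from the PR):

```python
# Hypothetical options dict, used only for illustration.
options = {"backend": "inductor"}

# Before: the explicit None default is redundant.
before = options.get("mode", None)

# After: dict.get already returns None when the key is missing.
after = options.get("mode")

# Present keys behave identically under both spellings as well.
present_before = options.get("backend", None)
present_after = options.get("backend")
```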

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164695
Approved by: https://github.com/ezyang
2025-10-09 03:24:50 +00:00
086dec3235 Pyrefly suppressions 6/n (#164877)
Adds suppressions so that pyrefly will typecheck clean: https://github.com/pytorch/pytorch/issues/163283

Almost there!

Test plan:
dmypy restart && python3 scripts/lintrunner.py -a
pyrefly check

step 1: delete lines in the pyrefly.toml file from the project-excludes field
step 2: run pyrefly check
step 3: add suppressions, clean up unused suppressions
before: https://gist.github.com/maggiemoss/4b3bf2037014e116bc00706a16aef199

after:

INFO 0 errors (5,064 ignored)

Only four directories left to enable

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164877
Approved by: https://github.com/oulgen
2025-10-08 02:30:57 +00:00
7b15534434 [export] Fix weight sharing when there is no complete tensor (#164857)
Summary: As titled.

Test Plan: CI

Differential Revision: D84079625

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164857
Approved by: https://github.com/yushangdi
2025-10-07 23:40:13 +00:00
b13cd141b3 Add pyrefly suppressions (#164748)
Adds suppressions so that pyrefly will typecheck clean: https://github.com/pytorch/pytorch/issues/163283

Test plan:
dmypy restart && python3 scripts/lintrunner.py -a
pyrefly check

step 1: delete lines in the pyrefly.toml file from the `project-excludes` field
step 2: run pyrefly check
step 3: add suppressions, clean up unused suppressions
before: https://gist.github.com/maggiemoss/4b3bf2037014e116bc00706a16aef199

after:

0 errors (4,263 ignored)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164748
Approved by: https://github.com/oulgen
2025-10-07 17:31:18 +00:00
4725871a81 Return fake mode from export graph capture API (#164730)
This PR is to temporarily unblock various experiments that re-use the fake mode created by dynamo. Note that this is still not what we want as the end state. The end state should look something like:
```
out = fullgraph_capture(mod, inputs)
fake_mode = out.backend_inputs.fake_mode
gm  = out.module()
```
This doesn't work today because export needs to wrap the original module to set up a flat module to trace, for easier handling of pytree. As a result, we would need to carry an export-specific flag in fullgraph_capture, which seems not ideal.
Regardless, the end state is that we need to give downstream users a graph module and a fake mode in some form, so I think it is reasonable for _dynamo_graph_capture_for_export to return a fake mode within the graph module itself via gm.meta.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164730
Approved by: https://github.com/avikchaudhuri
2025-10-07 03:42:46 +00:00
2164b66121 [export] Better state_dict and constant dedup in torch.export.save (#164196)
Summary:

Previously, weight deduplication was done by simply grouping tensors with their untyped storage and saving the first tensor in the group.

A more rigorous approach would be to find a complete tensor that covers the storage and store that tensor. This is particularly important for GPU weights because when saving to raw bytes, we move the weight to CPU first, and if the weight being saved is not a complete one, it will lose the storage information during the copy to CPU.

In this diff, we reuse code in `_package_weights.py` for better weights and constants deduplication in `torch.export.save`.

Test Plan: buck2 run mode/dev-nosan caffe2/test:test_export -- -r test_weight_sharing_gpu

Differential Revision: D83523690

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164196
Approved by: https://github.com/angelayi
2025-10-06 17:03:15 +00:00
5d7360bb03 Revert "Enable all SIM rules except disabled ones (#164645)"
This reverts commit 321e6026925f6b6e8a36e3a8b7c0295cd7541911.

Reverted https://github.com/pytorch/pytorch/pull/164645 on behalf of https://github.com/izaitsevfb due to causes lint failures ([comment](https://github.com/pytorch/pytorch/pull/164645#issuecomment-3369274351))
2025-10-05 19:32:21 +00:00
321e602692 Enable all SIM rules except disabled ones (#164645)
`SIM` rules are useful for simplifying boolean expressions and enhancing code readability.
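
A short sketch of the kind of boolean simplification these rules encourage. The functions are hypothetical examples, not code from this PR:

```python
def is_even_verbose(n: int) -> bool:
    # Pattern that simplification rules typically flag:
    # branching only to return a boolean literal.
    if n % 2 == 0:
        return True
    else:
        return False


def is_even(n: int) -> bool:
    # Simplified form: return the condition directly.
    return n % 2 == 0


# Both versions agree on every input in this small range.
agree = all(is_even(n) == is_even_verbose(n) for n in range(-5, 6))
```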

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164645
Approved by: https://github.com/ezyang
2025-10-05 07:38:25 +00:00
35c4130fd1 [2/N] Fix ruff warnings (#164460)
Apply ruff `SIM` rules.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164460
Approved by: https://github.com/ezyang
2025-10-04 03:40:32 +00:00
b6b7a44dec Fix common typos and misspellings (#164413)
Summary:
This commit fixes numerous typos and misspellings found throughout the codebase. The fixes improve code readability and documentation consistency across C++, Python, CUDA, and documentation files.

## Typos Fixed

| Before | After | Occurrences |
|--------|-------|-------------|
| occured | occurred | 14 |
| accross | across | 9 |
| lenght/lenghts | length/lengths | 8 |
| unneccessary | unnecessary | 5 |
| Peform | Perform | 4 |
| furture | future | 3 |
| paritioned | partitioned | 2 |
| desireable | desirable | 2 |
| registerations | registrations | 2 |
| seperated | separated | 2 |
| intialized | initialized | 2 |
| capatibility | compatibility | 2 |
| peformed | performed | 2 |
| Exmple | Example | 2 |
| comma_seperated | comma_separated | 2 |
| cumsuming | consuming | 2 |
| neccessary | necessary | 1 |
| ParamterMetadataTable | ParameterMetadataTable | 1 |
| matached | matched | 1 |
| conaitner | container | 1 |
| reivew | review | 1 |
| prioriry | priority | 1 |
| Alocated | Allocated | 1 |
| opportunixtically | opportunistically | 1 |
| peformance | performance | 1 |
| equavalent | equivalent | 1 |
| asssumed | assumed | 1 |
| valdiation | validation | 1 |
| apprear | appear | 1 |
| consectuve | consecutive | 1 |
| dependending | depending | 1 |
| copnversion | conversion | 1 |
| weigted | weighted | 1 |
| repreesenting | representing | 1 |
| finialize | finalize | 1 |
| unintialized | uninitialized | 1 |
| conbined | combined | 1 |
| tesnor | tensor | 1 |
| desugared | discarded | 1 |
| behaviour | behavior | 1 |
| paramerizaitons | parametrizations | 1 |
| compute_output_lenghths_kernel | compute_output_lengths_kernel | 1 |

Test Plan: N/A - mostly comments - waiting on CI

Differential Revision: D83695665

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164413
Approved by: https://github.com/eqy, https://github.com/larryliu0820
2025-10-03 23:19:41 +00:00
f414aa8e0d Add pyrefly suppressions (3/n) (#164588)
Adds suppressions so that pyrefly will typecheck clean: https://github.com/pytorch/pytorch/issues/163283

Test plan:
dmypy restart && python3 scripts/lintrunner.py -a
pyrefly check

step 1: uncomment lines in the pyrefly.toml file
step 2: run pyrefly check
step 3: add suppressions, clean up unused suppressions
before: https://gist.github.com/maggiemoss/bb31574ac8a59893c9cf52189e67bb2d

after:

 0 errors (1,970 ignored)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164588
Approved by: https://github.com/oulgen
2025-10-03 22:03:03 +00:00
4661200125 [RELAND v2] Close some sources of fake tensors (#164372)
Changelog:

1. When we run into an operation we didn't proxy, we end up emitting fake constants. We error under a config, and we disable the config for some internal users. The reason we want to error is that this signals a coverage problem we need to address, but at the same time we don't want to be disruptive to already working flows.

2. The previous attribute mutation detection logic in non-strict didn't account for nested module structure. This fixes a silent incorrectness issue when exporting esm and qwen in non-strict, as well as some torchbench models like levit_128 and demucs.

3. The previous logic didn't work in cases where we mutate a container attribute, because it pytree'd over the old and new attributes, resulting in a length mismatch. We handle this gracefully now.

Differential Revision: [D83673054](https://our.internmc.facebook.com/intern/diff/D83673054)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164372
Approved by: https://github.com/avikchaudhuri
2025-10-02 18:58:52 +00:00
a43c4c3972 [5/N] Apply ruff UP035 rule (#164423)
Continued code migration to enable ruff `UP035`. Most changes are about moving the `Callable` import from `typing` to `collections.abc`.
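
The import change looks roughly like this. `apply_twice` is a made-up function for illustration, and the sketch assumes Python 3.9 or newer, where `collections.abc.Callable` can be subscripted:

```python
# Before (the spelling this migration moves away from):
#     from typing import Callable
# After:
from collections.abc import Callable


def apply_twice(fn: Callable[[int], int], x: int) -> int:
    # The annotation works the same way as the typing version did.
    return fn(fn(x))


result = apply_twice(lambda v: v + 1, 0)
```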

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164423
Approved by: https://github.com/ezyang
2025-10-02 07:31:11 +00:00
3787a5a60e [export] Explicitly passing requires_grad to nn.Parameter() in deserialization (#164290)
Summary: `nn.Parameter()` by default has `requires_grad=True` and would cause issues when there are non-float parameters.

Test Plan: buck2 run mode/dev-nosan caffe2/test:test_export -- -r test_non_float_weight

Differential Revision: D83598796

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164290
Approved by: https://github.com/angelayi
2025-10-01 02:55:20 +00:00
28c1d2f81b [aoti] AOTI mingw cross compilation (#163188)
To run this, you need to install `mingw64-gcc-c++` and download the Windows CUDA library toolkit.

See design doc and demo instructions in https://docs.google.com/document/d/1iDaChqA5nNKkBFTzsdkmoomvQlXHbnlb1Z4yEp7xaJA/edit?tab=t.0

If cross_platform_target is windows, we do the following:

- do not link to `sleef`. This can be improved in the future if we need it. Currently I avoid it because that requires extra setup on the linux side
- Use `mingw64-gcc-c++` to compile
- Use `WINDOWS_CUDA_HOME` instead of `CUDA_HOME` when linking to cuda

```
python test/inductor/test_aot_inductor_windows.py -k so
```

Other changes:
- decouples the compile_standalone config from the dynamic link flag
- creates a new aot_inductor_mode config module, which is used to control configs in aot_inductor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163188
Approved by: https://github.com/desertfire
2025-10-01 02:22:06 +00:00
adc11a7634 [export] avoid checks during tracing of export verification (#164219)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164219
Approved by: https://github.com/Lucaskabela
2025-09-30 21:46:59 +00:00
9e792f583a Revert "[export] Skip the check instead of disable (#164084)"
This reverts commit c2768d0f5af840a94c342ed9eac3e26c819aa3f0.

Reverted https://github.com/pytorch/pytorch/pull/164084 on behalf of https://github.com/yangw-dev due to broke internal tests ([comment](https://github.com/pytorch/pytorch/pull/164084#issuecomment-3348862668))
2025-09-29 20:09:13 +00:00
c2768d0f5a [export] Skip the check instead of disable (#164084)
It's unclear why we had disable in the first place. With
install_free_tensors, we are tracing into this hook. A better way would
be to place the tracer without any hook. For now, disable the checking
while dynamo is tracing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164084
Approved by: https://github.com/tugsbayasgalan
2025-09-29 01:44:44 +00:00
008051b13c [Dynamic Shape][BE] trim _DimHint serialization (#163891)
Summary:
current serialization is a bit hard to read
```
Exporting with the dynamic shape spec: {getitem_123: (_DimHint(type=<_DimHintType.DYNAMIC: 3>, min=1, max=64, _factory=False)), getitem_118: (_DimHint(type=<_DimHintType.DYNAMIC: 3>,
min=489, max=31232, _factory=False)), getitem_117: (_DimHint(type=<_DimHintType.DYNAMIC: 3>, min=489, max=31232, _factory=False)), getitem_116: (_DimHint(type=<_DimHintType.DYNAMIC: 3>, min=489, max=31232, _factory=False)), getitem_115: (
_DimHint(type=<_DimHintType.STATIC: 2>, min=None, max=None, _factory=True), _DimHint(type=<_DimHintType.DYNAMIC: 3>, min=1, max=64, _factory=False)), getitem_46: (_DimHint(type=<_DimHintType.DYNAMIC: 3>, min=29, max=1792, _factory=False),
 _DimHint(type=<_DimHintType.STATIC: 2>, min=None, max=None, _factory=True)), _predict_module__base_model_model_ro_sparse_arch_ebc__output_dists_0__dist: (_DimHint(type=<_DimHintType.DYNAMIC: 3>, min=1, max=64, _factory=False), _DimHint(t
ype=<_DimHintType.STATIC: 2>, min=None, max=None, _factory=True)), _predict_module__base_model_model_nro_sparse_arch_ebc__output_dists_0__dist: (_DimHint(type=<_DimHintType.DYNAMIC: 3>, min=29, max=1792, _factory=False)...
```

Test Plan: UT

Differential Revision: D83175131

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163891
Approved by: https://github.com/pianpwk
2025-09-27 00:08:01 +00:00
2f85de0b42 Fix preserve annotation with decomp (#163896)
If we use `fx_traceback.preserve_node_meta()`, we will have a few extra node.meta fields on nodes, such as "seq_nr", added from `fx/proxy.py`. As a result, there might be non-empty node.meta on graph nodes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163896
Approved by: https://github.com/SherlockNoMad, https://github.com/ydwu4
2025-09-26 20:28:47 +00:00
11b4c0eb9e [aoti] Save compute information (#163792)
Metadata looks like:
```
{
  'AOTI_DEVICE_KEY': 'cpu',
  'AOTI_PLATFORM': 'linux',
  'AOTI_MACHINE': 'x86_64',
  'AOTI_CPU_ISA': 'AVX512',
  'AOTI_COMPUTE_CAPABILITY': '90'
}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163792
Approved by: https://github.com/yushangdi, https://github.com/desertfire
ghstack dependencies: #163779
2025-09-26 05:40:44 +00:00
649ceda8a5 [export] handling NamedTuple inputs (#162959)
Fixes #160547
### Summary:
bug
```
    def test_namedtuple(self):
        from collections import namedtuple
        Point = namedtuple('Point', 'x y')

        class M(torch.nn.Module):
            def forward(self, x, y):
                return x + y

        inp = Point(torch.ones(3), torch.ones(3))
        print(M()(*inp))

        # errors
        ep = torch.export.export(M(), inp, strict=False)
        print(ep)

        # succeeds
        ep = torch.export.export(M(), inp, strict=True)
        print(ep)

        # workaround could be to convert namedtuple to a kwarg
        inp_kwargs =  {field: getattr(inp, field) for field in inp._fields}
        ep = torch.export.export(M(), (), inp_kwargs)
        print(ep)
```
Fix:
A namedtuple is a subclass of tuple, but a namedtuple was not expected here. This change handles the namedtuple case.

I have added a 🧪 test case for this as well.
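
The subclass relationship behind this bug can be shown without torch. `is_namedtuple` below is a common heuristic written for this sketch and is an assumption, not the check that export actually uses:

```python
from collections import namedtuple

Point = namedtuple("Point", "x y")
p = Point(1, 2)


def is_namedtuple(obj) -> bool:
    # Hypothetical helper: namedtuple classes are tuple subclasses with _fields.
    return isinstance(obj, tuple) and hasattr(type(obj), "_fields")


# isinstance treats a namedtuple as a tuple...
passes_isinstance = isinstance(p, tuple)
# ...but an exact type check does not, so code must pick one on purpose.
exact_tuple = type(p) is tuple

# The kwargs workaround from the repro, using the field names.
as_kwargs = {field: getattr(p, field) for field in p._fields}
```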
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162959
Approved by: https://github.com/angelayi

Co-authored-by: Angela Yi <angelayi@meta.com>
2025-09-23 17:43:50 +00:00
720a7b2887 [export] Remove .contiguous() when saving weights to raw bytes (#163587)
Summary: `.contiguous()` will discard the original storage size of the tensor, and could lead to issues during loading.

Test Plan:
buck2 run mode/dev-nosan caffe2/test:test_export -- -r test_1D_tensor_slicing
buck2 run mode/dev-nosan caffe2/test:test_export -- -r test_2D_tensor_slicing

Differential Revision: D83016250

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163587
Approved by: https://github.com/angelayi
2025-09-23 15:44:56 +00:00
b756b580fb Improve fake tensor leakage detection in export by not relying on gc too much (#163516)
Previously we relied on gc to snapshot the fake tensors before and after export, in order to get the list of fake tensors created during export. This caused some flakiness in our test suite (https://github.com/pytorch/pytorch/issues/162232). It seems very hard to make gc deterministic, so we now instrument fake tensor creation instead, which seems a lot better. In addition, it is also considerably faster than the previous approach because we no longer trigger the garbage collector manually.
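
The idea of instrumenting creation instead of diffing gc snapshots can be sketched as follows. Everything here is hypothetical: `FakeThing` stands in for a fake tensor and `track_creations` is not the PyTorch implementation, which would more plausibly hold weak references than the strong list used in this sketch.

```python
import contextlib


class FakeThing:
    """Hypothetical stand-in for a fake tensor."""

    _observers = []  # callbacks notified on every construction

    def __init__(self, name):
        self.name = name
        for record in FakeThing._observers:
            record(self)


@contextlib.contextmanager
def track_creations():
    """Record every FakeThing constructed while the context is active."""
    created = []
    hook = created.append
    FakeThing._observers.append(hook)
    try:
        yield created
    finally:
        # Always unregister, even if the body raised.
        FakeThing._observers.remove(hook)


pre_existing = FakeThing("pre_existing")  # created before tracking starts

with track_creations() as created:
    a = FakeThing("a")
    b = FakeThing("b")

# Only objects constructed inside the context are reported, in creation order,
# with no dependence on when the garbage collector happens to run.
names = [t.name for t in created]
```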

Differential Revision: [D82966648](https://our.internmc.facebook.com/intern/diff/D82966648)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163516
Approved by: https://github.com/ezyang
2025-09-22 22:04:24 +00:00
0e9f9c3a61 Fix inconsistent test and add new tracer as config (#162558)
It is better to have the new tracer as a global config that can be manipulated easily. I also believe dynamo-like config infra is more useful than relying on a custom way of patching things.

Differential Revision: [D82478649](https://our.internmc.facebook.com/intern/diff/D82478649)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162558
Approved by: https://github.com/zhxchen17
ghstack dependencies: #162557
2025-09-17 17:01:48 +00:00
489860f3c2 Prefer_deferred_runtime_asserts should be propagated to new tracer (#162556)
Differential Revision: [D82478650](https://our.internmc.facebook.com/intern/diff/D82478650)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162556
Approved by: https://github.com/avikchaudhuri
ghstack dependencies: #162487
2025-09-16 21:25:00 +00:00
a4e74f416b Fix error message (#162487)
A more proper fix would be to directly replace shape_env with the correct sources, but that is a bit involved because we would have to construct dynamo sources by hand (handling list/dict etc.). It is quite easy if we operate on a string, so I do this as a post-processing step for now.

Differential Revision: [D82478647](https://our.internmc.facebook.com/intern/diff/D82478647)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162487
Approved by: https://github.com/zhxchen17
2025-09-16 19:06:30 +00:00
543d50db2b Fix torch export with dict input nested in args (#162618)
Investigated together with @pyemma and @taotaohuang001

## Problem
When calling the exported module with a dict nested in the args tuple, it raises the following complaint:
```
Traceback (most recent call last):
  File "/home/chzhu/infinitrain/test_torch_export.py", line 32, in <module>
    print(exported_model({"a2": torch.randn(10), "a1": torch.randn(10)}))
  File "/home/chzhu/infinitrain/build/infinitrain/environments/development-venv/lib/python3.10/site-packages/torch/fx/graph_module.py", line 848, in call_wrapped
    return self._wrapped_call(self, *args, **kwargs)
  File "/home/chzhu/infinitrain/build/infinitrain/environments/development-venv/lib/python3.10/site-packages/torch/fx/graph_module.py", line 424, in __call__
    raise e
  File "/home/chzhu/infinitrain/build/infinitrain/environments/development-venv/lib/python3.10/site-packages/torch/fx/graph_module.py", line 411, in __call__
    return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
  File "/home/chzhu/infinitrain/build/infinitrain/environments/development-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/chzhu/infinitrain/build/infinitrain/environments/development-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1879, in _call_impl
    return inner()
  File "/home/chzhu/infinitrain/build/infinitrain/environments/development-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1806, in inner
    args_kwargs_result = hook(self, args, kwargs)  # type: ignore[misc]
  File "/home/chzhu/infinitrain/build/infinitrain/environments/development-venv/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 929, in _fn
    return fn(*args, **kwargs)
  File "/home/chzhu/infinitrain/build/infinitrain/environments/development-venv/lib/python3.10/site-packages/torch/export/_unlift.py", line 81, in _check_input_constraints_pre_hook
    flat_args_with_path = _check_inputs_match(args, kwargs, self._in_spec)
  File "/home/chzhu/infinitrain/build/infinitrain/environments/development-venv/lib/python3.10/site-packages/torch/export/_unlift.py", line 64, in _check_inputs_match
    raise ValueError(  # noqa: B904
ValueError: Trying to flatten user inputs with exported input tree spec:
TreeSpec(tuple, None, [TreeSpec(tuple, None, [TreeSpec(dict, ['a1', 'a2'], [*,
      *])]),
  TreeSpec(dict, [], [])])
but actually got inputs with tree spec of:
TreeSpec(tuple, None, [TreeSpec(tuple, None, [TreeSpec(dict, ['a2', 'a1'], [*,
      *])]),
  TreeSpec(dict, [], [])]).
Please check that the inputs have the same number and type of args and kwargs as the ones you used when tracing.

```

## How to reproduce the issue
```python
import torch

# create a nn.Module with data_batch as input and output as output
class MyModel(torch.nn.Module):
   def __init__(self):
       super(MyModel, self).__init__()
       self.linear = torch.nn.Linear(10, 1)

   def forward(self, data_batch):
       h1 = self.linear(data_batch["a1"])
       h2 = self.linear(data_batch["a2"])
       return h1 + h2

# torch export this module
model = MyModel()
example_args_forward = (
   {
       "a1": torch.randn(10),
       "a2": torch.randn(10),
   },
)
exported_model = torch.export.export(model, example_args_forward, strict=True)

# save the exported model
torch.export.save(exported_model, "exported_model.pt2")

# load the exported model
exported_model = torch.export.load("exported_model.pt2").module()

# run the exported model
print(exported_model({"a2": torch.randn(10), "a1": torch.randn(10)}))

```

## Root Cause
The input spec is encoded as a [TreeSpec](582d278983/torch/utils/_pytree.py (L1059)) in torch export, with (args, kwargs) at the top level. When we call the exported model, a pre-execution [hook](582d278983/torch/export/_unlift.py (L66)) checks that the input TreeSpec matches the received TreeSpec. In a TreeSpec, the dict key order is preserved, something like

TreeSpec(dict, ['a2', 'a1'], [*,*])

To work around this, the input check reorders [kwargs](582d278983/torch/export/_unlift.py (L67)), which is why kwargs can be out of order. But a dict nested in the args is not reordered, so any reordering of its keys will throw an error.

## Solution
Update eq_spec to handle the dict case, where we only require the key sets to match, with no ordering constraint.
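
A torch-free sketch of the relaxed comparison described in the solution. `structure` and `eq_structure` are hypothetical helpers written for this illustration and are a simplified stand-in for TreeSpec and eq_spec, not their actual implementation:

```python
def structure(obj):
    """Describe container structure as (kind, context, children); leaves are '*'."""
    if isinstance(obj, dict):
        return ("dict", list(obj.keys()), [structure(v) for v in obj.values()])
    if isinstance(obj, (list, tuple)):
        return (type(obj).__name__, None, [structure(v) for v in obj])
    return "*"


def eq_structure(a, b):
    """Compare structures, ignoring key order for dicts only."""
    if a == "*" or b == "*":
        return a == b
    kind_a, ctx_a, kids_a = a
    kind_b, ctx_b, kids_b = b
    if kind_a != kind_b or len(kids_a) != len(kids_b):
        return False
    if kind_a == "dict":
        # Only the key set must match; children are paired up by key.
        if set(ctx_a) != set(ctx_b):
            return False
        by_key_b = dict(zip(ctx_b, kids_b))
        return all(
            eq_structure(child, by_key_b[key]) for key, child in zip(ctx_a, kids_a)
        )
    # Lists and tuples stay order-sensitive.
    return all(eq_structure(x, y) for x, y in zip(kids_a, kids_b))


traced_args = ({"a1": 1.0, "a2": 2.0},)
called_args = ({"a2": 2.0, "a1": 1.0},)

# Strict comparison sees different key orders and rejects the call.
strict_match = structure(traced_args) == structure(called_args)
# Relaxed comparison accepts the same key set in any order.
relaxed_match = eq_structure(structure(traced_args), structure(called_args))
# A different key set is still rejected.
wrong_keys = eq_structure(structure(traced_args), structure(({"a1": 1.0, "b": 2.0},)))
```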
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162618
Approved by: https://github.com/angelayi
2025-09-13 03:24:30 +00:00
501e19137a fix var args for shape guards (#162633)
Summary: Fixes #162599

Test Plan:
added test based on repro

Rollback Plan:

Differential Revision: D82144520

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162633
Approved by: https://github.com/tugsbayasgalan
2025-09-12 00:33:35 +00:00
463fbc8ca0 Support vmap + custom autograd function/improve DTensor constructor inefficiency (#162240)
This makes gemma3 exportable on transformers=4.55.4

In HF, there is a torch function mode called TransformGetItemToIndex which internally calls a custom autograd function. When this custom autograd function is called under vmap, it triggers CustomFunctionHigherOrderOP, which errored because there was no pre-dispatch proxy mode implementation.

Since there have been a number of requests lately to add various operators to pre-dispatch IR, I introduce a decorator in export that works similarly to `allow_in_graph`. Basically:
1) We intercept custom_autograd_function.apply at pre-dispatch mode when this decorator is applied.
2) We apply the `flat_apply` HOP to hide the pytree spec for this autograd function. Note that this adds the restriction that this custom autograd function needs to take in fx-able types.
3) The subclass constructor decorator is implemented similarly, so we just refactor it to use a similar implementation to this new decorator. Eventually we should delete the subclass constructor decorator.
4) Move some code in the subclass constructor decorator to exit early in a non-export environment, which should shave off some inefficiency (around 1% according to @swolchok 's benchmark).

Fixes: https://github.com/pytorch/pytorch/issues/161563#issuecomment-3246309758

Differential Revision: [D82141316](https://our.internmc.facebook.com/intern/diff/D82141316)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162240
Approved by: https://github.com/ydwu4
2025-09-11 17:42:41 +00:00
3f6d88f04c paths to exclude shape guards (#162684)
Summary: Easier to land than https://www.internalfb.com/diff/D82030581

Test Plan:
everything blamed by https://www.internalfb.com/diff/D80713603 (except some old exir tests)

Rollback Plan:

Differential Revision: D82180349

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162684
Approved by: https://github.com/tugsbayasgalan
2025-09-11 15:34:06 +00:00
22df9332da [serialization] Add pte file to archive (#162520)
Summary:
Adds _package_executorch_files to the archive APIs, allowing us to package a PTE file into the archive.

I don't think there's a use-case to have more than one PTE file at the moment, but left it as `EXECUTORCH_FILES` just in case.

Test Plan:
Tested in D81992612

Rollback Plan:

Differential Revision: D81977483

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162520
Approved by: https://github.com/angelayi
2025-09-11 07:59:11 +00:00
6d65737aee testing infra and some fixes (#162183)
This PR is quite large in that it covers most of the rough edges in the new strict export flow:

1. Handle nn_module_stack correctly now that we are tracing wrapper module
2. module_call_spec needs to get queried from source directly because we are not running the bytecode anymore.
3. Correct input and output handling.

@diff-train-skip-merge

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162183
Approved by: https://github.com/zhxchen17
2025-09-10 20:48:12 +00:00