Commit Graph

615 Commits

Author SHA1 Message Date
b2fc9cfea1 [precompile] Add CompilePackage to serialize dynamo states. (#155118)
Adding a per-torch.compile() object, CompilePackage, which tracks dynamo artifacts. CompilePackage is considered a low-level component and should not be directly exposed to end users. It has the following interface:

1. `CompilePackage.__init__()` which optionally takes previously serialized dynamo states.
     a. when the `dynamo` argument is None, it will construct a brand new CompilePackage object.
     b. when the `dynamo` argument is not None, it will load a pre-compiled dynamo state.
2. `package.save()` which dumps the dynamo states into _DynamoCacheEntry.
3. `package.install(backends)` which will handle all the side-effectful global scope updates with compiled functions and resume functions.

This diff focuses on the low-level mechanism for precompile. It is left to higher-level interfaces to use these APIs to build a more user-facing frontend.
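A hypothetical usage sketch of the interface above (the import path and exact signatures here are assumptions, not the real API surface):

```
# Hypothetical sketch only; import path and signatures are assumptions.
from torch._dynamo.package import CompilePackage  # assumed location

# Fresh compile: no previously serialized dynamo state.
package = CompilePackage(dynamo=None)

# ... torch.compile() runs and the package records dynamo artifacts ...

cache_entry = package.save()          # dump dynamo states into a _DynamoCacheEntry

# Later (e.g. on a warm start), rebuild from the serialized state and install
# the compiled/resume functions into the global scope.
package = CompilePackage(dynamo=cache_entry)
package.install(backends={})          # the shape of `backends` is assumed here
```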

Differential Revision: [D75956538](https://our.internmc.facebook.com/intern/diff/D75956538/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155118
Approved by: https://github.com/jamesjwu

Co-authored-by: James Wu <jjwu@meta.com>
2025-06-13 13:54:10 +00:00
d1947a8707 Migrate from lru_cache to cache (#155613)
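For reference, `functools.cache` (Python 3.9+) is equivalent to `lru_cache(maxsize=None)`, so the migration is a drop-in rename; a small illustration:

```
from functools import cache, lru_cache

@lru_cache(maxsize=None)   # before
def fib_old(n: int) -> int:
    return n if n < 2 else fib_old(n - 1) + fib_old(n - 2)

@cache                     # after: equivalent unbounded memoization
def fib(n: int) -> int:
    return n if n < 2 else fib(n - 1) + fib(n - 2)
```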
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155613
Approved by: https://github.com/ezyang
ghstack dependencies: #155612
2025-06-11 19:44:18 +00:00
c4b93e6579 Replace frame_traced_fn hook with get_traced_code() util (#155249)
#153622 introduced a hook for getting the relevant code objects after frame tracing. The idea is to have vLLM use this instead of monkey-patching `inline_call_()` to determine the source code files to hash. Unfortunately, the hook runs too late; the vLLM backend needs access to the set of source code filenames while it's running.

This PR replaces the newly-added hook with a utility function that a backend can call to get this information. I've made the change in vLLM and can verify that this allows the information to be queried at the right time.
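A hedged sketch of how a backend might use the utility while it runs (the exact import path and return type are assumptions based on the title):

```
# Sketch only; get_traced_code's location and return type are assumptions.
import torch

def hashing_backend(gm, example_inputs):
    from torch._dynamo.utils import get_traced_code  # assumed location
    code_objects = get_traced_code() or []
    filenames = sorted({code.co_filename for code in code_objects})
    # ... hash `filenames` into the backend's cache key (the vLLM use case) ...
    return gm.forward

fn = torch.compile(lambda x: x + 1, backend=hashing_backend)
fn(torch.ones(2))
```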

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155249
Approved by: https://github.com/zou3519
2025-06-10 22:40:58 +00:00
067fd0b3ab [dynamo][cleanup] Simplify disabling of the helper functions on tensor properties (#155259)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155259
Approved by: https://github.com/zhxchen17
2025-06-06 19:44:40 +00:00
271ca679a8 [reland][dynamo] Record the pre-graph bytecode using fast record function event (#154974)
reland of https://github.com/pytorch/pytorch/pull/154769

@diff-train-skip-merge
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154974
Approved by: https://github.com/Lucaskabela, https://github.com/jansel
2025-06-06 13:11:03 +00:00
e01fde8213 Revert "[reland][dynamo] Record the pre-graph bytecode using fast record function event (#154974)"
This reverts commit bee9c70c5d4b681ec1f2adf92eca1205b372634a.

Reverted https://github.com/pytorch/pytorch/pull/154974 on behalf of https://github.com/malfet due to Broke inductor tests, see 3c72b9fd8f/1 ([comment](https://github.com/pytorch/pytorch/pull/154974#issuecomment-2944370617))
2025-06-05 13:36:21 +00:00
bee9c70c5d [reland][dynamo] Record the pre-graph bytecode using fast record function event (#154974)
reland of https://github.com/pytorch/pytorch/pull/154769

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154974
Approved by: https://github.com/Lucaskabela, https://github.com/jansel
2025-06-05 07:25:04 +00:00
c881f2ddf3 [reland][dynamo] Mark a vt unspecialized nn module variable source earlier (#155099)
Reland of https://github.com/pytorch/pytorch/pull/154780

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155099
Approved by: https://github.com/williamwen42
2025-06-04 23:05:36 +00:00
a99a01a677 Revert "[dynamo] Mark a vt unspecialized nn module variable source earlier (#154780)"
This reverts commit cc96febb979da16b0a0b758020b330a49c72b7e7.

Reverted https://github.com/pytorch/pytorch/pull/154780 on behalf of https://github.com/seemethere due to This fails internal testing see, https://fburl.com/diff/b0yuxk4w ([comment](https://github.com/pytorch/pytorch/pull/154780#issuecomment-2940381691))
2025-06-04 15:03:34 +00:00
cc96febb97 [dynamo] Mark a vt unspecialized nn module variable source earlier (#154780)
I am working on providing some skip guard helper functions to allow users to reduce guard overhead. This is a refactor to allow that.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154780
Approved by: https://github.com/StrongerXi, https://github.com/jansel
2025-06-03 19:19:47 +00:00
a7e496a896 Revert "[dynamo] Record the pre-graph bytecode using fast record function event (#154769)"
This reverts commit 409c396a48584de1ab14e1be6957663d548ad89e.

Reverted https://github.com/pytorch/pytorch/pull/154769 on behalf of https://github.com/seemethere due to This fails internal tests see [fburl.com/diff/67gyp7gp](https://fburl.com/diff/67gyp7gp) ([comment](https://github.com/pytorch/pytorch/pull/154769#issuecomment-2933629894))
2025-06-03 06:13:49 +00:00
409c396a48 [dynamo] Record the pre-graph bytecode using fast record function event (#154769)
![image](https://github.com/user-attachments/assets/1d06618b-1c14-4ed5-ab7b-dcfecbb4d632)

Adds another event in the profiler traces. This can help us find models where pre-graph bytecode is very expensive.
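For context, a generic way to inspect the profiler traces where such an event would appear (illustrative only; the event name itself is not assumed here):

```
import torch
from torch.profiler import ProfilerActivity, profile

@torch.compile
def fn(x):
    return torch.relu(x) + 1

fn(torch.randn(8))  # compile on the first call

with profile(activities=[ProfilerActivity.CPU]) as prof:
    fn(torch.randn(8))

print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=20))
```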

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154769
Approved by: https://github.com/zou3519, https://github.com/williamwen42, https://github.com/StrongerXi, https://github.com/jansel
2025-06-02 22:33:27 +00:00
b6b9311f4f [BE][Ez]: Fix typo in dynamo utils #154639 (#154748)
Fixes a typo in #154639

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154748
Approved by: https://github.com/ngimel
2025-05-30 18:39:01 +00:00
2120eeb8de [BE][Ez]: Improve dynamo utils typing with TypeIs and TypeGuard (#154639)
Adds some TypeIs and TypeGuard annotations to some _dynamo utils for additional type narrowing.
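An illustrative TypeIs-style narrowing helper (names made up for the example, not the actual _dynamo utils being annotated):

```
from typing import Any

import torch
from typing_extensions import TypeIs

def is_tensor(obj: Any) -> TypeIs[torch.Tensor]:
    return isinstance(obj, torch.Tensor)

def describe(obj: Any) -> str:
    if is_tensor(obj):
        # Type checkers narrow `obj` to torch.Tensor in this branch.
        return f"tensor with shape {tuple(obj.shape)}"
    return type(obj).__name__
```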

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154639
Approved by: https://github.com/jansel
2025-05-30 18:09:50 +00:00
8002d22ce3 [dynamo] Trace into descriptor with __set__ (#154176)
As the title says, this patch basically implements
https://github.com/python/cpython/blob/3.11/Objects/object.c#L1371-L1452
and makes the `__get__` handling more robust.

I ran into this while fixing #133762.
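An illustrative data descriptor of the kind this lets Dynamo trace through (not taken from the PR's tests):

```
import torch

class Clamped:
    def __set_name__(self, owner, name):
        self.slot = "_" + name

    def __get__(self, obj, objtype=None):
        return getattr(obj, self.slot)

    def __set__(self, obj, value):
        setattr(obj, self.slot, max(0.0, float(value)))

class Config:
    threshold = Clamped()

@torch.compile
def fn(x, cfg):
    cfg.threshold = -3.0               # goes through Clamped.__set__
    return x.clamp(min=cfg.threshold)  # goes through Clamped.__get__

print(fn(torch.randn(4), Config()))
```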

Differential Revision: [D75488090](https://our.internmc.facebook.com/intern/diff/D75488090)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154176
Approved by: https://github.com/jansel
2025-05-30 16:14:37 +00:00
bb17f9c98b [AOTAutogradCache] Fix CHROMIUM_EVENT_LOG being none (#154258)
It turns out that if you import a name that is None at import time in Python, and later update its value, the name you imported stays None:

```
import torch
from torch._dynamo.utils import CHROMIUM_EVENT_LOG  # binds the current value (None)

class Foo:
  pass

# Rebinding the module attribute does not update the name imported above.
torch._dynamo.utils.CHROMIUM_EVENT_LOG = Foo()

print(CHROMIUM_EVENT_LOG)  # still None
```

This fixes the bug so we get AOTAutogradCache instant events again.
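An illustrative fix pattern for the stale-binding issue (not necessarily the exact change in this PR): read the attribute through the module at call time instead of binding it at import time.

```
import torch._dynamo.utils as dynamo_utils

def maybe_log_instant_event(name: str) -> None:
    chromium_log = dynamo_utils.CHROMIUM_EVENT_LOG  # re-read on every call
    if chromium_log is not None:
        ...  # e.g. chromium_log.log_instant_event(name, ...)
```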

Differential Revision: [D75305770](https://our.internmc.facebook.com/intern/diff/D75305770/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154258
Approved by: https://github.com/oulgen
2025-05-23 21:53:31 +00:00
3443627e07 Revert "[BE]: Enable RUFF TRY400 rule - log.exception (#153473)"
This reverts commit 4f4ecc583e0f48ad2d062a53bf91c61ab40b4948.

Reverted https://github.com/pytorch/pytorch/pull/153473 on behalf of https://github.com/jeanschmidt due to seems to have broken internal signals, @albanD may I count on you to help the author merge his PR? D74837988 ([comment](https://github.com/pytorch/pytorch/pull/153473#issuecomment-2886017075))
2025-05-16 08:29:26 +00:00
4f4ecc583e [BE]: Enable RUFF TRY400 rule - log.exception (#153473)
Change logging.error to logging.exception to log additional information when relevant. A few logging.error calls have slipped into try/except blocks since I last did a cleanup here, and the rule is now stabilized, so I am enabling it codebase-wide. I have NOQA'd much of our custom exception stack-trace handling for RPC calls and distributed, and tried to fix a few errors based on whether we immediately re-raised the exception or failed to print exception information where it could be useful.
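A minimal before/after for the TRY400 rule (illustrative, not from the diff):

```
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)

def read_config(path):
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        # Before: log.error("failed to read config from %s", path)
        # After: log.exception logs the same message plus the active traceback.
        log.exception("failed to read config from %s", path)
        return None

read_config("/nonexistent/path")
```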

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153473
Approved by: https://github.com/albanD, https://github.com/cyyever
2025-05-15 13:36:59 +00:00
dde705864a Fix test broken by D73809989 (#153413)
Summary: I forgot to remove this unused field in D73809989.

Test Plan: `buck test 'fbcode//mode/opt' fbcode//caffe2/test:fbonly -- --exact 'caffe2/test:fbonly - test_compilation_metrics_logger_in_sync (caffe2.test.fb.test_fb.TestFBOnly)'`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153413
Approved by: https://github.com/c00w
2025-05-13 16:44:30 +00:00
ff039d39ec [Dynamo] Optimize dedupe region ancestor tracking (#152589)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152589
Approved by: https://github.com/anijain2305
ghstack dependencies: #152389, #152505, #152410, #152506, #152570, #152572
2025-05-13 12:17:59 +00:00
3592cb52d9 [Hierarchical Compilation] Use universal flatten APIs (#152505)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152505
Approved by: https://github.com/anijain2305
ghstack dependencies: #152389
2025-05-13 12:17:59 +00:00
023a3dc69f [Hierarchical Compilation] Track node mutations (#152389)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152389
Approved by: https://github.com/anijain2305
2025-05-13 12:17:59 +00:00
3555ebb63d [BE]: Update ruff to 0.11.8 (#153249)
Fixes a ton of false negatives throughout the codebase. RUFF also properly validates NOQA comments now and most of the changes are fixing typos there or removing filewide flake8 suppressions that were also silencing ruff issues.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153249
Approved by: https://github.com/cyyever, https://github.com/albanD, https://github.com/seemethere
2025-05-12 18:30:52 +00:00
5c3fddb9cc Revert "[Hierarchical Compilation] Track node mutations (#152389)"
This reverts commit c2936ebfd58be7a6519f51d165dfac8407020140.

Reverted https://github.com/pytorch/pytorch/pull/152389 on behalf of https://github.com/jeanschmidt due to Humm, interesting, there seems to be a bug in stack PRs, as it should be part of the stack and be reverted with the other ones ([comment](https://github.com/pytorch/pytorch/pull/152389#issuecomment-2873540451))
2025-05-12 18:18:44 +00:00
78d752e96a Revert "[Hierarchical Compilation] Use universal flatten APIs (#152505)"
This reverts commit f9e3a9058e80fde310e5f0919d3a21e28cd024a8.

Reverted https://github.com/pytorch/pytorch/pull/152505 on behalf of https://github.com/jeanschmidt due to [TENTATIVE] reverting to check if reverting this stack partially caused the introduction of https://github.com/pytorch/pytorch/actions/runs/14966121510/job/42049638969#step:22:875 ([comment](https://github.com/pytorch/pytorch/pull/152505#issuecomment-2872869990))
2025-05-12 14:48:08 +00:00
aa7fe6af41 Revert "[Dynamo] Optimize dedupe region ancestor tracking (#152589)"
This reverts commit b5f1345f72ec6d1b004b05284e9553e65ee03abc.

Reverted https://github.com/pytorch/pytorch/pull/152589 on behalf of https://github.com/jeanschmidt due to Breaking internal signal citadel-fbcode-test-mode-opt-for-pt2_stack_for_internal-linux-0 please see diff [D74531503](https://www.internalfb.com/diff/D74531503) for more details ([comment](https://github.com/pytorch/pytorch/pull/152410#issuecomment-2871168679))
2025-05-12 07:15:09 +00:00
01bb249978 Revert "has_triton: Use the device interface for detecting Triton availability (#139171)"
This reverts commit 48bfe9afc70a98addd5aa738bf501c029e4a9285.

Reverted https://github.com/pytorch/pytorch/pull/139171 on behalf of https://github.com/masnesral due to Performance regression for huggingface ([comment](https://github.com/pytorch/pytorch/pull/139171#issuecomment-2868939790))
2025-05-10 14:46:23 +00:00
b5f1345f72 [Dynamo] Optimize dedupe region ancestor tracking (#152589)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152589
Approved by: https://github.com/anijain2305
ghstack dependencies: #152389, #152505, #152410, #152506, #152570, #152572
2025-05-10 08:27:56 +00:00
f9e3a9058e [Hierarchical Compilation] Use universal flatten APIs (#152505)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152505
Approved by: https://github.com/anijain2305
ghstack dependencies: #152389
2025-05-10 08:27:07 +00:00
c2936ebfd5 [Hierarchical Compilation] Track node mutations (#152389)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152389
Approved by: https://github.com/anijain2305
2025-05-10 08:27:01 +00:00
2d25e4d478 [1/n][Optimus][Auto-AC] Support activation quantization without scaling (#148380)
Summary: We enable activation quantization in the forward pass, and users can customize the dtype they want to quantize to.

Test Plan:
# unit test

```
buck2 test 'fbcode//mode/dev-nosan' fbcode//caffe2/test/inductor:quantization -- test_activation_quantization_aten
```

Buck UI: https://www.internalfb.com/buck2/776d3911-bb86-4ac8-a527-540cf1510b9d
Test UI: https://www.internalfb.com/intern/testinfra/testrun/4785074873051017
Network: Up: 4.3MiB  Down: 42MiB  (reSessionID-fef7e727-68b1-4645-a519-5652854df38d)
Executing actions. Remaining     0/4                                                                                 6.7s exec time total
Command: test.     Finished 2 local
Time elapsed: 3:11.5s
Tests finished: Pass 2. Fail 0. Fatal 0. Skip 0. Build failure 0

# E2E

### How to enable (you can override the dtype; if nothing is given, the default is fp8)

```
post_grad_fusion_options={
            "activation_quantization_aten_pass": {"quant_type": "torch.float8_e5m2"}
        },
```
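A hedged sketch of wiring this option through the inductor config (the exact config surface for this Optimus pass is an assumption here):

```
import torch
import torch._inductor.config as inductor_config  # assumed config surface

inductor_config.post_grad_fusion_options = {
    "activation_quantization_aten_pass": {"quant_type": "torch.float8_e5m2"},
}

model = torch.compile(torch.nn.Linear(16, 16))
```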

Differential Revision: D70522237

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148380
Approved by: https://github.com/Mingming-Ding, https://github.com/Hahu803
2025-05-08 04:44:15 +00:00
48bfe9afc7 has_triton: Use the device interface for detecting Triton availability (#139171)
This PR replaces the `has_triton()` global method which was previously used for this task.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139171
Approved by: https://github.com/jansel, https://github.com/shink
2025-05-07 12:23:10 +00:00
5b9df57b50 [dynamo] context manager/decorator for dynamo config patching during tracing (#150586)
Implement traceable config patching for Dynamo: this enables restricted patching of the Dynamo config, where users can use a context manager/decorator to change tracing behavior for parts of their code.

The new `dont_skip_tracing` decorator/context manager for ignoring most trace rules is easily implemented with this more generic traceable config patching feature.
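A hedged sketch of the context-manager usage (the exact import path and call form of `dont_skip_tracing` are assumptions):

```
import torch
import torch._dynamo as dynamo

def helper(x):            # imagine this lives in code Dynamo would normally skip
    return x + 1

@torch.compile
def fn(x):
    with dynamo.dont_skip_tracing():   # patch tracing behavior for this region
        return helper(x) * 2

fn(torch.ones(3))
```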

Implementation:
- Create a new specialized context manager class representing a wrapper around torch._dynamo.config.patch
- Dynamo doesn't trace into the context manager but updates config at compile time
- Correctness is based on our correctness for handling supported context managers
- Implementation is inspired by how `GradModeVariable` is implemented.

Previous attempts: https://github.com/pytorch/pytorch/pull/148736 (decorator-only global approach) and https://github.com/pytorch/pytorch/pull/149439 (decorator-only traceback approach)

See https://docs.google.com/document/d/1vWNwKL_jpg-PLopifcaSa338wks3GqSVF4GHRguybGg/edit?tab=t.0 for more details on implementation - including previous approaches.

NOTE: this PR fixes a bug where skipped code objects were not tracked by convert_frame.py, leading to cases where code objects would be automatically skipped even after `torch._dynamo.reset()`. This exposed some latent dynamo-wrapped test failures in CI that previously passed in CI but not locally.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150586
Approved by: https://github.com/jansel, https://github.com/zou3519, https://github.com/anijain2305
2025-04-23 09:12:13 +00:00
529f698ad4 [logging] Put "everything" WaitCounters in dynamo_timed (#151757)
Summary: The main motivation is to capture the cudagraphs overhead in a WaitCounter. We'll combine that with Triton autotuning, and therefore rename it to "compile_runtime_overheads". Since we have a couple of WaitCounters where we want to capture all runtime and compile overheads, let's put the accounting in dynamo_timed so we'll automatically capture any top-level timed regions that get added in the future. Also, dynamo_timed already has to figure out whether we're timing a runtime vs. compile-time event, so we can reuse some of that logic.

Test Plan:
Ran an internal model with `TORCHINDUCTOR_BENCHMARK_FUSION=1` (to get benchmarking at compile time in addition to runtime).

Overall compile time from various sources matches up:
* tlparse: https://fburl.com/9fgsstkr. Eyeballing, total time should be 32 ranks x 2175 = ~69.6k s
* ods: https://fburl.com/canvas/r4clhnb7. Right on.
* dynamo_compile: https://fburl.com/scuba/dynamo_compile/ax71aqox. Right on.
* pt2_compile_events: https://fburl.com/scuba/pt2_compile_events/shcjd9ql. Right on.

And the runtime overhead:
* ods: https://fburl.com/canvas/nvgjb282
* dynamo_compile: https://fburl.com/scuba/dynamo_compile/f2dtv0qh

If we compare that to a run of the same model without the changes in this stack, results can mismatch by a lot:
* tlparse: https://fburl.com/cchxwd1s. Eyeballing, total time should be 32 ranks x 2300s = ~73.5k s
* ods: https://fburl.com/canvas/x1i3wvf4. It's kinda close
* dynamo_compile: https://fburl.com/scuba/dynamo_compile/l7sgxdxd. Waaay too high.
* pt2_compile_events: https://fburl.com/scuba/pt2_compile_events/jb4s9z1u. This is the only one that's actually correct.

The discrepancy is even worse if we focus on the runtime events:
* ods: https://fburl.com/canvas/a4o9f7ou
* dynamo_compile: https://fburl.com/scuba/dynamo_compile/95izaes1

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151757
Approved by: https://github.com/ppanchalia
ghstack dependencies: #151749
2025-04-22 03:29:13 +00:00
edba20b853 [logging] Fix duration logging for dynamo_compile (#151749)
Summary: There are a few issues I'm solving:
1. It's too hard to measure total pt2 overhead using the dynamo_compile table because users need to know the columns representing all the top-level events (dynamo_cumulative_compile_time_us, etc.). Instead, let's populate the existing duration_us field for all top-level events. The complication is that runtime events in particular (Triton autotuning, cudagraphify) can be collapsed into a single row, with gaps in between, so we can't simply use `end_time - start_time` in all cases. Instead, we'll sum durations for all outer events when updating the compile-time or runtime metrics context. Introduce a 'depth' counter in TLS to track the nesting of CompilationMetrics events (see the sketch after this list).
2. The existing implementation relies on callers of dynamo_timed to specify whether the event is a runtime or compile-time event. That doesn't work because some methods can be called in both situations, e.g., `CachingAutotuner.benchmark_all_configs`. For example, `TORCHINDUCTOR_BENCHMARK_FUSION=1` enables benchmarking at compile time. Instead, we can figure out automatically whether we're measuring a compile-time or runtime event and log accordingly.
3. If `log_compilation_events` were to throw an exception, we'd fail to clear the aggregated counters for runtime logs and they could be attributed to the wrong compile ID. I didn't actually find evidence of this in practice, but I added exception handling for extra safety.
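A generic sketch of the "depth counter in TLS" idea from (1) (names are illustrative, not the real CompilationMetrics code):

```
import threading
import time

_tls = threading.local()

class toplevel_timed:
    """Accumulate duration only for outermost (depth == 0) timed regions."""

    def __enter__(self):
        depth = getattr(_tls, "depth", 0)
        if depth == 0:
            self._start = time.monotonic()
        _tls.depth = depth + 1
        return self

    def __exit__(self, *exc):
        _tls.depth -= 1
        if _tls.depth == 0:
            elapsed_us = int((time.monotonic() - self._start) * 1e6)
            _tls.duration_us = getattr(_tls, "duration_us", 0) + elapsed_us
        return False
```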

Test Plan:
Ran internal models and compared dynamo_compile to pt2_compile_events:
`TORCHINDUCTOR_BENCHMARK_FUSION=0`
* tlparse: https://fburl.com/itciwnxc
* dynamo_compile: https://fburl.com/scuba/dynamo_compile/yvkif5vb
* pt2_compile_events: https://fburl.com/scuba/pt2_compile_events/segijet7

`TORCHINDUCTOR_BENCHMARK_FUSION=1`
* tlparse: https://fburl.com/jgurcvkw
* dynamo_compile: https://fburl.com/scuba/dynamo_compile/uum91ceb
* pt2_compile_events: https://fburl.com/scuba/pt2_compile_events/x4xnisez

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151749
Approved by: https://github.com/Skylion007
2025-04-22 03:29:13 +00:00
585d03fa39 Record how many parameters we're parsing within dynamo (#148508)
This allows us to track how many parameters we have in compilations.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148508
Approved by: https://github.com/jansel, https://github.com/anijain2305

Co-authored-by: Sam Larsen <slarsen@meta.com>
2025-04-16 06:15:11 +00:00
6a1499d209 [dynamo] handle tensor subclass with non-classmethod __torch_function__ (#151061)
As title, this patch fixes bugs in
1. emulating `has_torch_function`
2. emulating calling `__torch_function__`
3. building a callable VT for non-classmethod `__torch_function__`

Fixes #120799, #150265, #150848.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151061
Approved by: https://github.com/anijain2305, https://github.com/mlazos
ghstack dependencies: #151060
2025-04-15 03:55:34 +00:00
2a1e2b88ed [logging] Add pgo remote get/put timings to dynamo_compile (#150322)
Test Plan: https://fburl.com/scuba/dynamo_compile/sandbox/xf950tw8

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150322
Approved by: https://github.com/ppanchalia
2025-04-07 18:08:26 +00:00
98d06b401b [Dynamo] Fix dict.items() return type (#150112)
Fixes #150110

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150112
Approved by: https://github.com/jansel, https://github.com/zou3519
2025-04-04 04:32:13 +00:00
1979a409e9 Make CompileEventLogger more defensive w.r.t to AOTAutogradCache and FXGraphCache (#150423)
This PR makes it so that we don't crash due to logging if we invoke AOTAutogradCache/FXGraphCache without using dynamo. This is preparation for supporting certain vLLM use cases where they store graph modules and have special handling in conjunction with the caches.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150423
Approved by: https://github.com/oulgen
2025-04-04 01:55:13 +00:00
bb98749230 [dynamo] Always trace into tensor subclass __torch_function__ (#149792)
This patch effectively ignores traceable_tensor_subclasses, allowing
Dynamo to always try tracing into the `__torch_function__` of tensor
subclasses. This helps us with 2 things:
1. allowing users to directly benefit from better compilation of tensor
   subclass, by just upgrading pytorch, without having to change legacy
   library code (see earlier patches in the stack for examples).
2. potentially exposing more issues in compiling tensor subclass, so we
   can get signals and improve them.

As a consequence, it exposed and fixed 2 subtle bugs:
1. In `build_torch_function_fn`, we could get
   `torch._C._disabled_torch_function_impl` because we have a
   `Parameter` subclass without `__torch_function__` override or if we
   have a tensor subclass with `__torch_dispatch__` override. We graph
   break on this for now, and plan to add support -- the logic for
   simulating `torch._C._disabled_torch_function_impl` is already in
   `SuperVariable`, we just need to reuse it.
2. Sometimes we create `SyntheticLocalSource` and need to remove all the
   guards installed on it, but we only removed the ones whose source
   _is_ the created synthetic source `s`, but forgot about chained
   source like `s.foo`, this showed up as
   `SYNTHETIC_LOCAL['tmp_0'].__torch_function__.__func__`.
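For context, a minimal tensor subclass of the kind this change now traces into by default (illustrative; no traceable_tensor_subclasses registration needed):

```
import torch

class MyTensor(torch.Tensor):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        return super().__torch_function__(func, types, args, kwargs)

@torch.compile
def fn(x):
    return torch.sin(x) + 1

fn(torch.ones(4).as_subclass(MyTensor))
```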

Differential Revision: [D71906141](https://our.internmc.facebook.com/intern/diff/D71906141)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149792
Approved by: https://github.com/jansel, https://github.com/mlazos
ghstack dependencies: #149482, #149483, #149484
2025-04-02 20:57:00 +00:00
e545567340 Revert "[dynamo] Always trace into tensor subclass __torch_function__ (#149792)"
This reverts commit 238109ad3245c5485f9e83b4b02d258b09329042.

Reverted https://github.com/pytorch/pytorch/pull/149792 on behalf of https://github.com/malfet due to Broke trunk, see b03c42109c/1 ([comment](https://github.com/pytorch/pytorch/pull/149482#issuecomment-2773650522))
2025-04-02 20:30:32 +00:00
238109ad32 [dynamo] Always trace into tensor subclass __torch_function__ (#149792)
This patch effectively ignores traceable_tensor_subclasses, allowing
Dynamo to always try tracing into the `__torch_function__` of tensor
subclasses. This helps us with 2 things:
1. allowing users to directly benefit from better compilation of tensor
   subclass, by just upgrading pytorch, without having to change legacy
   library code (see earlier patches in the stack for examples).
2. potentially exposing more issues in compiling tensor subclass, so we
   can get signals and improve them.

As a consequence, it exposed and fixed 2 subtle bugs:
1. In `build_torch_function_fn`, we could get
   `torch._C._disabled_torch_function_impl` because we have a
   `Parameter` subclass without `__torch_function__` override or if we
   have a tensor subclass with `__torch_dispatch__` override. We graph
   break on this for now, and plan to add support -- the logic for
   simulating `torch._C._disabled_torch_function_impl` is already in
   `SuperVariable`, we just need to reuse it.
2. Sometimes we create `SyntheticLocalSource` and need to remove all the
   guards installed on it, but we only removed the ones whose source
   _is_ the created synthetic source `s`, but forgot about chained
   source like `s.foo`, this showed up as
   `SYNTHETIC_LOCAL['tmp_0'].__torch_function__.__func__`.

Differential Revision: [D71906141](https://our.internmc.facebook.com/intern/diff/D71906141)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149792
Approved by: https://github.com/jansel, https://github.com/mlazos
ghstack dependencies: #149482, #149483, #149484
2025-04-02 17:05:25 +00:00
3ac5a499dd [dynamo] add dynamo disable reasons to codebase (#150440)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150440
Approved by: https://github.com/jansel, https://github.com/zou3519
ghstack dependencies: #150341
2025-04-02 04:26:48 +00:00
005c9b2f4f Fix _Waitcounter decorator and add backward pass wait counter (#150235)
Summary:
This will log a wait counter for backward compile and fixes weirdness with nested context managers.

Since the old wait counters added through dynamo_timed were never created with the nesting issue, I am also changing the key nomenclature from `pytorch.dynamo_timed` to `pytorch.wait_counter`. We want to use the same nomenclature to make it easy to find keys.

Reviewed By: jamesjwu

Differential Revision: D72032055

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150235
Approved by: https://github.com/jamesjwu, https://github.com/masnesral
2025-03-30 05:20:12 +00:00
d4da0e955e [Dynamo] Fix is_compile_supported() when device_type contains device index (#147837)
Fixes #147826

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147837
Approved by: https://github.com/anijain2305
2025-03-28 07:16:29 +00:00
1c98dc3664 [dynamo] Fix handling of setattr with some tensor attributes (#149791)
We weren't handling `setattr(tensor_obj, "real", 42)` correctly, because
the attribute is a `GetSetDescriptorType` that has special setter logic.
See added test and comments for more explanations.

This patch makes it so that we graph break in those cases, rather than
resulting in silent incorrectness.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149791
Approved by: https://github.com/mlazos
ghstack dependencies: #149481
2025-03-25 18:57:56 +00:00
1e30192b19 [logging] Add python version to dynamo_compile table (#149419)
Summary: This adds a version field like the following: `3.10.9+fb (3.10:1dd9be6, May  4 2022, 01:23:45) [Clang 15.0.7 (mononoke://mononoke.internal.tfbnw.net/fbsource 5d1601b0eed7426ac`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149419
Approved by: https://github.com/c00w
2025-03-20 01:48:34 +00:00
a3c286677b [compile] Switch off inference mode during compilation (#149321)
This PR does the following (see the example after this list):
* Turns off `inference_mode` and uses `no_grad` for `convert_frame` if inference_mode is on globally.
* Turns off inference_mode for fake tensor prop. This ensures that converting a real inference tensor to a fake tensor removes the inference-ness.
* Graph breaks on is_inference and is_inference_mode_enabled.
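A small illustration of the scenario above, compiling and running while global inference mode is enabled:

```
import torch

model = torch.nn.Linear(4, 4)
compiled = torch.compile(model)

with torch.inference_mode():
    out = compiled(torch.randn(2, 4))

print(out.is_inference())
```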

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149321
Approved by: https://github.com/jansel, https://github.com/zou3519
2025-03-19 02:45:27 +00:00
6c7d8419e3 fix two accuracy regression (#149172)
There are 2 accuracy regressions in the 3/12 nightly perf run. I cannot repro them locally, so there is no effective way to bisect. Raise the tolerance to make them pass the accuracy check.

- error log for HF MegatronBertForQuestionAnswering https://gist.github.com/shunting314/25322b66e15e98feed32e0d9a1e43316
- error log for TIMM gluon_inception_v3 https://gist.github.com/shunting314/df64ce22327df27a7057bbbd19ef5164

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149172
Approved by: https://github.com/jansel, https://github.com/eellison
2025-03-17 19:34:00 +00:00