Commit Graph

512 Commits

Author SHA1 Message Date
dc55704b48 Rename cache limit to recompile limit in configs (#143709)
This PR renames every cache_limit to recompile_limit via sed.

Old config options are maintained via Config(alias='xyz').
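
A rough, hypothetical sketch of how such an alias can keep the old name readable (the real torch._dynamo config machinery differs in detail; names here are illustrative):

```
class Config:
    def __init__(self, default=None, alias=None):
        self.default = default
        self.alias = alias  # canonical option this one forwards to, if any

class _ConfigModule:
    def __init__(self):
        self._configs = {
            "recompile_limit": Config(default=8),
            "cache_limit": Config(alias="recompile_limit"),  # legacy name
        }
        self._values = {"recompile_limit": 8}

    def __getattr__(self, name):
        cfg = self._configs[name]
        target = cfg.alias if cfg.alias is not None else name
        return self._values[target]

config = _ConfigModule()
assert config.cache_limit == config.recompile_limit == 8
```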

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143709
Approved by: https://github.com/jansel
2024-12-22 10:03:57 +00:00
4627cfd1f9 [dynamo] Support user defined dicts (#143548)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143548
Approved by: https://github.com/yanboliang, https://github.com/jansel, https://github.com/williamwen42
2024-12-21 01:46:14 +00:00
d88ebbf822 cleanup chromium event log on dynamo exit rather than on entry (#143175)
Clearing at dynamo start is an issue because it throws away events from compiled autograd.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143175
Approved by: https://github.com/Skylion007, https://github.com/jamesjwu
ghstack dependencies: #141907
2024-12-21 00:41:24 +00:00
ad7ab5ef84 Revert "[logging] A few fixes/updates to record_compilation_metrics (#143332)"
This reverts commit a9c753bbc88bfdc0e77f66956b3a11e405235d0f.

Reverted https://github.com/pytorch/pytorch/pull/143332 on behalf of https://github.com/malfet due to Surprisingly failure is caused by this PR ([comment](https://github.com/pytorch/pytorch/pull/143332#issuecomment-2557899120))
2024-12-21 00:06:44 +00:00
a9c753bbc8 [logging] A few fixes/updates to record_compilation_metrics (#143332)
Summary: Mostly cosmetic, but one bug fix:
* Bug fix: Make sure compile_id is converted to a string in the compilation metrics so it's printed as, e.g., "0/1" instead of "[0, 1]"
* Sort collections in `collection_to_str`
* Print non-string elements as `"<unknown>"` instead of None (since we don't expect non-strings)
* Move the population of the legacy metrics and any pre-processing to a new factory method in CompilationMetrics
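
A minimal sketch of the normalization described above (the real helper in torch._dynamo.utils may differ in signature and separator):

```
from typing import Any, Iterable, Optional

def collection_to_str(coll: Optional[Iterable[Any]]) -> Optional[str]:
    if coll is None:
        return None

    def safe_str(item: Any) -> str:
        # Non-string elements are unexpected; render them as "<unknown>".
        return item if isinstance(item, str) else "<unknown>"

    # Sort so the logged value is deterministic across runs.
    return ",".join(safe_str(x) for x in sorted(coll, key=str))

print(collection_to_str({"b", "a", 3}))  # -> "<unknown>,a,b"
```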

Test Plan:
```
python test/dynamo/test_structured_trace.py
python test/dynamo/test_utils.py
```
Internal testing: https://fburl.com/scuba/dynamo_compile/sandbox/l0me8auf

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143332
Approved by: https://github.com/ppanchalia
2024-12-20 21:42:32 +00:00
8850a7b62c add some logging for tensorify (#143391)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143391
Approved by: https://github.com/jamesjwu
2024-12-19 20:06:26 +00:00
90cc43f270 Support garbage collection after pt2 compilation (#143364)
Summary:
Support garbage collection after pt2 compilation.
Add a JK to control the global rollout/rollback of this functionality.
Add an env var to control an individual job's rollout.
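
A minimal sketch of the gating described above (the env-var name is hypothetical, and the internal JK check is stubbed out):

```
import gc
import os

def maybe_collect_after_pt2_compile() -> None:
    # A JK would drive the global rollout internally; the env var lets an
    # individual job opt in or out.
    if os.environ.get("ENABLE_GC_AFTER_PT2_COMPILE", "0") == "1":
        gc.collect()

# Called once compilation of a frame finishes:
maybe_collect_after_pt2_compile()
```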

Test Plan:
Test the model training job with and without these changes.

Reviewers:
@yuxihu, @ezyang, @Yuzhen11

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143364
Approved by: https://github.com/ezyang
2024-12-18 07:25:11 +00:00
60c54467db [logging] Log runtime autotuning timing to scuba (#141919)
See test plan in internal diff [D66679369](https://our.internmc.facebook.com/intern/diff/D66679369)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141919
Approved by: https://github.com/jamesjwu, https://github.com/ezyang
2024-12-13 21:22:13 +00:00
dc23f1944a Remove unused Python variables in torch/[_-a]* (#133492)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133492
Approved by: https://github.com/albanD
2024-12-12 17:39:14 +00:00
30b61e521c [logging] Populate compile_time_autotune_time_us (#143104)
See testing in attached diff

Differential Revision: [D67128210](https://our.internmc.facebook.com/intern/diff/D67128210)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143104
Approved by: https://github.com/ezyang
2024-12-12 17:08:43 +00:00
5c97ac9721 Revert "Remove unused Python variables in torch/[_-a]* (#133492)"
This reverts commit fda975a7b3071a20dab8fc2c4e453479e1bb7cf2.

Reverted https://github.com/pytorch/pytorch/pull/133492 on behalf of https://github.com/clee2000 due to Sorry, I need to revert this in order to revert something else.  The only thing you need to do is rebase and remerge ([comment](https://github.com/pytorch/pytorch/pull/133492#issuecomment-2536635516))
2024-12-11 17:29:12 +00:00
fda975a7b3 Remove unused Python variables in torch/[_-a]* (#133492)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133492
Approved by: https://github.com/albanD
2024-12-10 21:48:44 +00:00
a751558467 [logging] Fix bug involving missing compilation_metrics fields in tlparse logs (#142423)
Summary: The line of code that assembles the set of compilation_metrics to include in the corresponding tlparse log is missing the "legacy" and "common" fields populated above. The fix is to make sure we consider all fields in the compilation_metrics object.
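
An illustrative fix pattern for the bug described above: derive the field set from the dataclass itself so newly added fields are not silently dropped (field names here are assumptions, not the actual CompilationMetrics schema):

```
import dataclasses

@dataclasses.dataclass
class CompilationMetrics:
    compile_id: str = ""
    dynamo_time_us: int = 0           # a "common" field
    start_time: float = 0.0           # a "legacy" field

def metrics_for_tlparse(m: CompilationMetrics) -> dict:
    # Enumerate every dataclass field instead of a hand-maintained list.
    return {f.name: getattr(m, f.name) for f in dataclasses.fields(m)}

print(metrics_for_tlparse(CompilationMetrics(compile_id="0/0")))
```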

Test Plan:
Before: https://fburl.com/d6em8csg (e.g, https://fburl.com/c19s7ny0)
After: https://fburl.com/5zr6kbvf (e.g, https://fburl.com/3hp14ht2)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142423
Approved by: https://github.com/ezyang
2024-12-10 15:58:43 +00:00
692b5e75ed [logging] Add triton_compile_time_us column to dynamo_compile (#142068)
Test Plan: See internal diff [D66799565](https://www.internalfb.com/diff/D66799565)

Differential Revision: [D66799565](https://our.internmc.facebook.com/intern/diff/D66799565)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/142068
Approved by: https://github.com/c00w
2024-12-06 16:11:57 +00:00
288b73cb14 [Redo] Set remote cache version and backend type once in compilation metrics (#141967)
(Got reverted due to a silly bug, fixed now.)

This is causing FbFxGraphRemoteCache.init to no longer be idempotent, i.e. only safe to call once per compile. AOTAutogradCache initializes a new remote cache for the forward and the backward.
Technically, we could make AOTAutogradCache smart and globally thread through a single FbFxGraphRemoteCache everywhere. But there's no reason to do so, as this class is just the handle to access the cache. Plus, it's very brittle for FbFxGraphRemoteCache to not be safe to call multiple times.
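
A sketch of one way to keep repeated construction safe, consistent with recording the cache version/backend once (the class name is a placeholder for FbFxGraphRemoteCache, not the actual fbcode implementation):

```
_version_recorded = False

class RemoteFxGraphCacheHandle:
    """Stand-in handle that is safe to construct many times per compile."""

    def __init__(self) -> None:
        global _version_recorded
        if not _version_recorded:
            # Record remote_cache_version / backend type exactly once per
            # process rather than on every handle construction.
            print("recording remote cache version and backend type")
            _version_recorded = True

# AOTAutogradCache may create one handle for forward and one for backward:
fwd, bwd = RemoteFxGraphCacheHandle(), RemoteFxGraphCacheHandle()
```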

Differential Revision: [D66701970](https://our.internmc.facebook.com/intern/diff/D66701970/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141967
Approved by: https://github.com/laithsakka
2024-12-04 03:07:53 +00:00
6b620423a3 dynamo_timed: Add a log_waitcounter option. (#141402)
This logs a waitcounter of the name pytorch.dynamo_timed.{key}.
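
A hedged sketch of the idea: when log_waitcounter is set, dynamo_timed also holds a wait counter named pytorch.dynamo_timed.<key> for the duration of the timed region (the real _WaitCounter lives in torch's C++ internals; it is faked here so the sketch runs standalone):

```
import contextlib
import time

class _WaitCounterStub:
    """Stand-in for the real wait counter, so the sketch runs standalone."""
    def __init__(self, name: str) -> None:
        self.name = name

    @contextlib.contextmanager
    def guard(self):
        print(f"waitcounter start: {self.name}")
        try:
            yield
        finally:
            print(f"waitcounter stop: {self.name}")

@contextlib.contextmanager
def dynamo_timed(key: str, log_waitcounter: bool = False):
    start = time.time()
    with contextlib.ExitStack() as stack:
        if log_waitcounter:
            stack.enter_context(_WaitCounterStub(f"pytorch.dynamo_timed.{key}").guard())
        try:
            yield
        finally:
            print(f"{key}: {time.time() - start:.3f}s")

with dynamo_timed("inductor_compile", log_waitcounter=True):
    pass
```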

Primarily sending this now to make sure everyone likes the API; then I'll add tests and migrate one dynamo_timed call to use it (likely starting with https://github.com/pytorch/pytorch/pull/141379).

Testing is a bit harder, since we don't normally have any way to read
_WaitCounter state AFAICT. I want to poke around and see if I can figure
out a way to read the state, otherwise I'll just mock it to at least
make sure it's mostly working.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141402
Approved by: https://github.com/jamesjwu, https://github.com/masnesral
2024-12-03 19:24:29 +00:00
7c3c8a662e [dynamo] Add RANGE_ITERATOR_MATCH to properly guard on range iterators (#141902)
A subsequent patch attempts to fix a side-effect issue for range iterators, which in turn exposed an existing issue with guards for range iterators -- the following test started failing:
```
PYTORCH_TEST_WITH_DYNAMO=1 python test/test_tensor_creation_ops.py TestTensorCreationCPU.test_hstack_column_stack_cpu_int16
```

This patch adds a `RANGE_ITERATOR_MATCH` guard to make sure that we
properly guard on range iterators, and adds a regression test.
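
A rough illustration of why the iterator's state must be guarded (not the actual guard code): two iterators over the same range are not interchangeable once one has been advanced, so reusing a compiled frame that assumed a fresh iterator would be silently wrong.

```
it = iter(range(5))
next(it)                          # iterator advanced before the compiled region
assert list(it) == [1, 2, 3, 4]   # a fresh iter(range(5)) would yield [0, 1, 2, 3, 4]
```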

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141902
Approved by: https://github.com/jansel
ghstack dependencies: #141713, #141714, #141715
2024-12-03 09:18:06 +00:00
ce86119503 Revert "Set remote cache version and backend type once in compilation metrics (#141707)"
This reverts commit d633cf1f55f87e5536f63981357d543ac46e48f7.

Reverted https://github.com/pytorch/pytorch/pull/141707 on behalf of https://github.com/malfet due to It breaks tests by referencing FbRemoteFxGraphCache, but CI was green ([comment](https://github.com/pytorch/pytorch/pull/141707#issuecomment-2513555185))
2024-12-03 05:01:02 +00:00
08db735629 [BE]: Update mypy to 1.13.0 (#140808)
Update mypy to 1.13.0. Should hopefully reduce linting time. It has support for orjson cache serialization, which should improve mypy cache perf if orjson is installed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140808
Approved by: https://github.com/ezyang, https://github.com/malfet
2024-12-03 02:50:10 +00:00
d633cf1f55 Set remote cache version and backend type once in compilation metrics (#141707)
This is causing FbFxGraphRemoteCache.init to no longer be idempotent, i.e. only safe to call once per compile. AOTAutogradCache initializes a new remote cache for the forward and the backward.
Technically, we could make AOTAutogradCache smart and globally thread through a single FbFxGraphRemoteCache everywhere. But there's no reason to do so, as this class is just the handle to access the cache. Plus, it's very brittle for FbFxGraphRemoteCache to not be safe to call multiple times.

(Same problem, different fix of D66502138)

Differential Revision: [D66508492](https://our.internmc.facebook.com/intern/diff/D66508492/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141707
Approved by: https://github.com/ezyang
2024-12-03 01:49:11 +00:00
daa77f3d9f Revert "[BE]: Update mypy to 1.13.0 (#140808)"
This reverts commit 00134d68af2ce50560fa5a74473665ea229e6c9d.

Reverted https://github.com/pytorch/pytorch/pull/140808 on behalf of https://github.com/huydhn due to This is failing a distributed test in trunk, target determination missed this test and did not run it on PR ([comment](https://github.com/pytorch/pytorch/pull/140808#issuecomment-2512788426))
2024-12-02 20:47:43 +00:00
00134d68af [BE]: Update mypy to 1.13.0 (#140808)
Update mypy to 1.13.0. Should hopefully reduce linting time. It has support for orjson cache serialization, which should improve mypy cache perf if orjson is installed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140808
Approved by: https://github.com/ezyang, https://github.com/malfet
2024-12-02 18:47:54 +00:00
3141e038f0 [dynamo] Fix VariableBuilder._wrap on frozenset and enforce invariants on ConstantVariable (#141504)
Prior to this patch, we were using `ConstantVariable.create` to create a VT for frozenset objects, and intended yet failed to predicate that on all items being literals (see https://github.com/pytorch/pytorch/pull/140984#discussion_r1847393736).

The code was from https://github.com/pytorch/torchdynamo/commit/7c03434 and
the original goal was to help DBR quantization, but as the new test in
this patch shows, it could lead to silent incorrectness.

Upon a closer look, this exposes some subtleties in how Dynamo handles
`ConstantVariable` and `LOAD_CONST`, so this patch both fixes the
aforementioned issue and documents, enforces, and makes explicit the
invariants around `ConstantVariable` and `LOAD_CONST` -- only immutable
objects are supported.

Specifically, this patch:
1. refine the checks for wrapping a `frozenset` object, document why we
   can't just wrap its items directly due to the lack of `Source` for set
   items, and use a safe workaround (`SourcelessBuilder`) to ensure
   soundness while keeping the DBR quantization support.
2. Adds more types to `common_constant_types`, thereby making
   `ConstantVariable.is_base_literal` more lenient, and strictly checks
   this property in the constructor of `ConstantVariable`.
3. Change relevant uses of `create_instruction("LOAD_CONST", ...)` to
   `create_load_const` which checks `is_safe_constant`, and makes
   developer overrides explicit by using `create_load_const_unchecked`
   when needed.
4. In a few places, use more specific `VariableTracker`, e.g.,
   `TypingVariable` rather than `ConstantVariable`, and
   `FrozensetVariable` rather than `SetVariable`.

(2) and (3) are mainly to future-proof Dynamo against bugs like (1).
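
A rough sketch of the invariant in item (2) -- only immutable, literal-like values should be wrapped -- with names that only approximate the real ConstantVariable and common_constant_types:

```
COMMON_CONSTANT_TYPES = (int, float, bool, str, bytes, complex, type(None))

def is_base_literal(value) -> bool:
    if isinstance(value, (tuple, frozenset)):
        return all(is_base_literal(v) for v in value)
    return isinstance(value, COMMON_CONSTANT_TYPES)

class ConstantVariable:
    def __init__(self, value) -> None:
        assert is_base_literal(value), f"not a safe constant: {value!r}"
        self.value = value

ConstantVariable(frozenset({1, 2}))        # fine: all items are literals
# ConstantVariable(frozenset({object()}))  # would trip the assertion
```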

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141504
Approved by: https://github.com/jansel
2024-11-27 21:58:35 +00:00
a7ca6a9113 Enable autograd cache on inductor tests (#140890)
This turns on AOTAutogradCache for all inductor tests. It clears AOTAutogradCache on each test as well, by virtue of the local cache using the same directory to store cache entries.

I've also tested with INDUCTOR_TEST_DISABLE_FRESH_CACHE=1, running all the tests. AOTAutogradCache successfully caches 99% of these. There are a few tests that use view_replay and therefore save functional tensors, which cause AOTAutogradCache to fail to pickle its result. Will look into next steps there, but for now, it seems okay if the cache just misses on those cases where it can't serialize the result. It would be better to check before pickling, though.

I've made the following small bugfixes to get this working:
- Inductor is sometimes used in a standalone mode without dynamo, which leads to attribute errors in check_can_cache. In general, we should *never* crash in cache checking, only bypass. So I changed a try/except to catch Exception instead of just a specific exception.
- Add extra structured logging for metadata on cache hits
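
A sketch of the "never crash in cache checking, only bypass" pattern from the first bullet (function names are placeholders for the AOTAutogradCache helpers):

```
class BypassAOTAutogradCache(Exception):
    """Raised to skip caching for this compile instead of crashing."""

def check_can_cache(gm) -> None:
    try:
        _check_graph_module(gm)  # may raise AttributeError, etc.
    except BypassAOTAutogradCache:
        raise
    except Exception as e:
        # Any unexpected failure means "don't cache", never "crash".
        raise BypassAOTAutogradCache(str(e)) from e

def _check_graph_module(gm) -> None:
    if not hasattr(gm, "graph"):
        raise AttributeError("standalone inductor use: no dynamo graph module")
```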

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140890
Approved by: https://github.com/bdhirsh
2024-11-27 20:41:43 +00:00
cdde73033e [dynamo] fix generic namedtuple support when the class is created via class MyTuple(NamedTuple, Generic[T]): ... (#141360)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141360
Approved by: https://github.com/jansel
2024-11-27 00:21:58 +00:00
07906f2f2b [logging] Move population of common MetricsContext fields to record_compilation_metrics (#141291)
Summary: Fix outstanding TODOs related to logging of CompilationMetrics by moving the population of common fields to record_compilation_metrics() instead of populating those independently wherever we use the metrics_context contextmanager:
* Keep track of start and end time in MetricsContext and pass those to record_compilation_metrics() and populate those fields in that function.
* Pass exception info to record_compilation_metrics() and populate those fields in that function.
* Add a new contextmanager, chromium_event_timed, to create the start/end "dynamo" event. This is important because I want this contextmanager to complete _after_ building the CompilationMetrics.
* Populate the compile_id field centrally in record_compilation_metrics().
* Populate the structured_logging_overhead centrally in record_compilation_metrics().
* Add the CompilationMetrics to the current chromium event in record_compilation_metrics(), after all common fields have been added. In a future diff, I can also add _all_ compilation metrics to the chromium event.
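
A condensed sketch of centralizing the common fields as described above: the context manager tracks start/end/exception and hands them to one function that fills them in (names follow the description but are not the exact torch._dynamo code):

```
import contextlib
import time

def record_compilation_metrics(start_s, end_s, exc, metrics: dict) -> dict:
    # Common fields are populated in exactly one place.
    metrics = dict(metrics)
    metrics["start_time_us"] = int(start_s * 1e6)
    metrics["duration_us"] = int((end_s - start_s) * 1e6)
    metrics["fail_type"] = type(exc).__name__ if exc is not None else None
    return metrics

@contextlib.contextmanager
def metrics_context(metrics: dict):
    start, exc = time.time(), None
    try:
        yield metrics
    except Exception as e:
        exc = e
        raise
    finally:
        print(record_compilation_metrics(start, time.time(), exc, metrics))

with metrics_context({"compile_id": "0/0"}):
    pass
```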

Test plan: Unit tests. Also see internal testing:
* dynamo_compile: https://fburl.com/scuba/dynamo_compile/sandbox/jrascnf9
* pt2_compile_events: https://fburl.com/scuba/pt2_compile_events/l3jnla06
* tlparse: https://fburl.com/bq5a9nqs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141291
Approved by: https://github.com/jamesjwu
2024-11-25 13:18:40 +00:00
1aea642393 pytorch/feature: Record if inductor fx cache is enabled (#141059)
This uses the underlying infrastructure and records if the fx cache is
enabled.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141059
Approved by: https://github.com/masnesral
2024-11-23 01:55:27 +00:00
45d62d6fc5 [dynamo] Added cuda and triton versions to dynamo_compile (#141290)
Opening another PR since #141140 was reverted.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141290
Approved by: https://github.com/masnesral
2024-11-22 20:04:42 +00:00
db4e8a1d8a [ca] expose option to collect sizes as dynamic (#141153)
This is to address recompiles from eager nodes that saved dynamic activations

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141153
Approved by: https://github.com/jansel
ghstack dependencies: #141152
2024-11-22 19:26:27 +00:00
f5d00f1456 pytorch/features: Make a feature logger and record triton bundling (#141056)
This modifies metrics_context to allow us to store whether a feature was
used or not.

This also starts recording this for triton bundling.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141056
Approved by: https://github.com/masnesral
2024-11-22 01:31:08 +00:00
4e34fbdcbc Add inductor_fx_graph_cache stats to dynamo_utils (#141190)
Summary:
Add the following inductor fx graph cache stats to dynamo compile

- inductor_fx_cache_hit_count
- inductor_fx_cache_miss_count
- inductor_fx_cache_backend_type
- inductor_fx_cache_hit_keys
- inductor_fx_cache_miss_keys
- remote_cache_version

Test Plan: Run local tests and staging logger: P1683061460

Differential Revision: D66232206

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141190
Approved by: https://github.com/masnesral
2024-11-21 20:59:10 +00:00
149677e30c Revert "[dynamo] Added cuda and triton versions to dynamo_compile" (#141280)
Reverts pytorch/pytorch#141140

reason: conflicts with https://github.com/pytorch/pytorch/pull/141190 and wasn't merged using mergebot

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141280
Approved by: https://github.com/clee2000, https://github.com/kit1980
2024-11-21 20:50:06 +00:00
11d0ba068f [dynamo] Added cuda and triton versions to dynamo_compile (#141140)

Summary:

Add cuda and triton versions to dynamo_compile logging site.

Test Plan:
$ buck2 run mode/opt //scripts/oulgen:runner
File changed: fbcode//caffe2/torch/_dynamo/convert_frame.py
Buck UI: https://www.internalfb.com/buck2/1a8ada1f-d54e-44b2-a368-b2ff2030e113
Network: Up: 65KiB  Down: 0B  (reSessionID-8f4d1d6d-a680-4ecc-8e73-c29c932d824b)
Jobs completed: 2166. Time elapsed: 7.0s.
Cache hits: 0%. Commands: 3 (cached: 0, remote: 0, local: 3)
BUILD SUCCEEDED
...
Cuda: 12.4.0
Triton: 3.0.0

Reviewed By: masnesral

Differential Revision: D66181508
2024-11-21 12:20:02 -08:00
12e95aa4ee [BE]: Apply PERF401 autofixes from ruff (#140980)
* Automatically applies ruff rule PERF401. Turns loops into equivalent list comprehensions, which are faster and do not leak the scope of the loop variables.
* List comprehensions not only often have better typing, but are also 50+% faster than for loops in terms of overhead. They also preserve length information etc. and are better for the interpreter to optimize.
* Manually went back and made mypy happy after the change.
* Also fixed style lints in files covered by flake8 but not by pyfmt
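
For reference, the shape of rewrite PERF401 flags (a generic example, not code from this PR):

```
# Before: a loop that conditionally appends into a list.
squares = []
for x in range(10):
    if x % 2 == 0:
        squares.append(x * x)

# After: the equivalent list comprehension that PERF401 suggests.
squares = [x * x for x in range(10) if x % 2 == 0]
```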

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140980
Approved by: https://github.com/justinchuby, https://github.com/malfet
2024-11-20 17:52:07 +00:00
ff17d2b83e [easy][logging] Remove dynamo_timed fwd_only param (#140993)
Summary: It's ignored; remove it

Test Plan: CI

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140993
Approved by: https://github.com/ezyang
2024-11-20 02:31:51 +00:00
1e234e63b3 [pytorch][dynamo_compile] Log inductor config to dynamo_compile (#140790)
Summary:
Log the scrubbed inductor config to dynamo_compile as a JSON string.

Scrub RE: `r'((^TYPE_CHECKING$)|(.*_progress$)|(.*TESTING.*)|(.*(rocm|halide).*)|(^trace\..*)|(^_))'` to save some space.
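
A sketch of the scrubbing step using the regex above (the helper name and exact serialization are assumptions):

```
import json
import re

SCRUB_RE = re.compile(
    r"((^TYPE_CHECKING$)|(.*_progress$)|(.*TESTING.*)|(.*(rocm|halide).*)|(^trace\..*)|(^_))"
)

def inductor_config_for_logging(config: dict) -> str:
    kept = {k: v for k, v in config.items() if not SCRUB_RE.match(k)}
    return json.dumps(kept, sort_keys=True)

print(inductor_config_for_logging({"max_autotune": True, "trace.enabled": False, "_private": 1}))
# -> {"max_autotune": true}
```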

Test Plan:
Staging logger: https://fburl.com/data/ltkt08zm

P1679697917

{F1958428018}

Differential Revision: D65806399

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140790
Approved by: https://github.com/masnesral
2024-11-19 02:39:33 +00:00
8d5b3eeaa6 Remove __start__ stack, log backward compile to empty stack (#140431)
Summary:
This diff removes "__start__" from all stacks in Pt2 Compile Events, as it's unnecessary.

It also starts logging events for backward compile, because otherwise we have no toplevel event representing full backward compilation. This gives us a toplevel event outside of the inductor compile.

Test Plan:
New chromium events:

https://interncache-all.fbcdn.net/manifold/perfetto-artifacts/tree/ui/index.html?url=https%3A%2F%2Finterncache-all.fbcdn.net%2Fmanifold%2Ftlparse_reports%2Ftree%2Flogs%2Fjjwu%2Fcustom%2Fstuff4%2Fchromium_events.json#!/viewer?url=https%3A%2F%2Finterncache-all.fbcdn.net%2Fmanifold%2Ftlparse_reports%2Ftree%2Flogs%2Fjjwu%2Fcustom%2Fstuff4%2Fchromium_events.json&local_cache_key

New tlparse:
https://interncache-all.fbcdn.net/manifold/tlparse_reports/tree/logs/jjwu/custom/stuff4/index.html

New scuba icicle view, still good: https://fburl.com/scuba/pt2_compile_events/z6gr3z53

Differential Revision: D65832045

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140431
Approved by: https://github.com/masnesral
2024-11-18 22:48:31 +00:00
e1d6c08f3d Specialize symfloats when getting fake value involves complex args (#140832)
Fixed `PYTORCH_TEST_WITH_DYNAMO=1 tlp python test/test_sparse_csr.py TestSparseCSRCPU.test_sampled_addmm_cpu_complex64` when `specialize_float=False`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140832
Approved by: https://github.com/ezyang
ghstack dependencies: #140830
2024-11-17 18:17:54 +00:00
90d3584147 [dynamo] support subclasses of namedtuple type (#140534)
Allow subclassing namedtuple types and assigning attributes to instances of these subtypes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140534
Approved by: https://github.com/jansel
2024-11-17 14:13:40 +00:00
e2e67a010a [logging] Add dynamo_compile fields for pre-dispatch/joint/post-dispatch times (#140306)
Tested internally: P1679622670

Differential Revision: [D65986059](https://our.internmc.facebook.com/intern/diff/D65986059)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140306
Approved by: https://github.com/ezyang
2024-11-15 15:02:08 +00:00
b11ff3cf60 [logging] Overhaul dynamo_timed and CompilationMetrics logging. (#139849)
Here's the overview:

There's a new contextmanager singleton called MetricsContext. Entering the MetricsContext is how we demarcate the boundary on which we'll create a single CompilationMetrics object, and therefore, a single dynamo_compile log entry. While we're inside the MetricsContext, we can update/set many different metrics.

Most importantly: `dynamo_timed` can also update the in-progress MetricsContext. In the proposal here, we tell `dynamo_timed` that we want it to do so by providing the name of the MetricsContext field to increment. There can be many `dynamo_timed` calls in different parts of the code updating different fields. Then when the MetricsContext exits, that's when the logging of everything gathered finally happens.

One potential footgun is trying to use `dynamo_timed` when we haven't entered the MetricsContext, but we assert on that problem. Another problem is that we re-enter the context recursively, but we watch for that and do the logging only when the outermost exits.
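
A condensed, hypothetical sketch of that flow -- enter the context, let dynamo_timed increment named fields, log once on the outermost exit (the real MetricsContext/dynamo_timed signatures differ):

```
import contextlib
import time
from typing import Optional

class MetricsContext:
    def __init__(self) -> None:
        self._metrics: dict = {}
        self._depth = 0

    def __enter__(self):
        self._depth += 1
        return self

    def __exit__(self, *exc) -> None:
        self._depth -= 1
        if self._depth == 0:
            # Only the outermost exit produces the single dynamo_compile row.
            print("dynamo_compile:", self._metrics)
            self._metrics = {}

    def increment(self, field: str, value: float) -> None:
        assert self._depth > 0, "dynamo_timed used outside of MetricsContext"
        self._metrics[field] = self._metrics.get(field, 0.0) + value

METRICS_CONTEXT = MetricsContext()

@contextlib.contextmanager
def dynamo_timed(key: str, metrics_field: Optional[str] = None):
    start = time.time()
    try:
        yield
    finally:
        if metrics_field is not None:
            METRICS_CONTEXT.increment(metrics_field, time.time() - start)

with METRICS_CONTEXT:
    with dynamo_timed("inductor_compile", metrics_field="inductor_compile_time_s"):
        pass
```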

Some specifics:
* Introduce MetricsContext - a context manager that on exit, records the CompilationMetrics (which also logs to dynamo_compile).
* Completely remove the concept of frame_phase_timing. Instead, update the MetricsContext during compilation, either directly or via dynamo_timed.
* Remove some globals we previously used to accumulate counters to later populate a CompilationMetrics. We use CompilationMetrics set/update/increment APIs instead.
* `record_compilation_metrics` is now called on exit from MetricsContext.
* Populate legacy CompilationMetrics fields right before logging, inside `record_compilation_metrics`.
* Remove the one-off `add_remote_cache_time_saved` helper; capture that timing directly into the MetricsContext.

And specifically, several changes to dynamo_timed:
* "Modernize" the parameters and update all callsites accordingly.
* Move the backwards logging of the CompilationMetrics to the backwards compile location.
* Add a parameter for which CompilationMetrics field to update

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139849
Approved by: https://github.com/ezyang
2024-11-14 19:11:20 +00:00
9ff368c270 [pytorch] Add logger for pt2 compile chromium events to hive (#139941)
Summary:
X-link: https://github.com/pytorch/benchmark/pull/2535

Logging raw chromium events to hive per job run enables us to build combined rank perfetto traces without having to depend on Logarithm and deal with things like rate limits etc.

We can easily build a utility to query hive and upload traces to manifold and view them on perfetto

Test Plan:
Launch a job

```
buck2 run mode/opt //aps_models/examples/dlrm:dlrm_train_app -- --config-name train_mast_fsdp_torchdynamo launcher.data_project=apf_ai_infra launcher.fbl_entitlement=ai_infra_training_rnd_tc  launcher.hardware=TC_ANY_80G
```

Local run
```
Perfetto: ['https://interncache-all.fbcdn.net/manifold/perfetto-artifacts/tree/ui/index.html?url=https://interncache-all.fbcdn.net/manifold/pt2_compile_traces_test/tree/pt2_trace_files/aps-ppanchalia-426838c277/0/0/2bc9975d-921c-4766-9cb2-e7ce9833ae96.json']
```

{F1954710538}

Differential Revision: D65525513

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139941
Approved by: https://github.com/jamesjwu
2024-11-14 18:27:38 +00:00
f98c601efe Avoid logging zeros (#139968)
Summary: title

Test Plan: NA

Differential Revision: D65582953

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139968
Approved by: https://github.com/zou3519
2024-11-14 15:46:49 +00:00
d63eb3c46c Revert "[logging] Overhaul dynamo_timed and CompilationMetrics logging. (#139849)"
This reverts commit cb15c1515778499ae801dcf67d55c8bdab4724ef.

Reverted https://github.com/pytorch/pytorch/pull/139849 on behalf of https://github.com/kit1980 due to Breaking an internal tests + there is a bug according to the author ([comment](https://github.com/pytorch/pytorch/pull/139849#issuecomment-2474459094))
2024-11-13 18:47:51 +00:00
cb15c15157 [logging] Overhaul dynamo_timed and CompilationMetrics logging. (#139849)
Here's the overview:

There's a new contextmanager singleton called MetricsContext. Entering the MetricsContext is how we demarcate the boundary on which we'll create a single CompilationMetrics object, and therefore, a single dynamo_compile log entry. While we're inside the MetricsContext, we can update/set many different metrics.

Most importantly: `dynamo_timed` can also update the in-progress MetricsContext. In the proposal here, we tell `dynamo_timed` that we want it to do so by providing the name of the MetricsContext field to increment. There can be many `dynamo_timed` calls in different parts of the code updating different fields. Then when the MetricsContext exits, that's when the logging of everything gathered finally happens.

One potential footgun is trying to use `dynamo_timed` when we haven't entered the MetricsContext, but we assert on that problem. Another problem is that we re-enter the context recursively, but we watch for that and do the logging only when the outermost exits.

Some specifics:
* Introduce MetricsContext - a context manager that on exit, records the CompilationMetrics (which also logs to dynamo_compile).
* Completely remove the concept of frame_phase_timing. Instead, update the MetricsContext during compilation, either directly or via dynamo_timed.
* Remove some globals we previously used to accumulate counters to later populate a CompilationMetrics. We use CompilationMetrics set/update/increment APIs instead.
* `record_compilation_metrics` is now called on exit from MetricsContext.
* Populate legacy CompilationMetrics fields right before logging, inside `record_compilation_metrics`.
* Remove the one-off `add_remote_cache_time_saved` helper; capture that timing directly into the MetricsContext.

And specifically, several changes to dynamo_timed:
* "Modernize" the parameters and update all callsites accordingly.
* Move the backwards logging of the CompilationMetrics to the backwards compile location.
* Add a parameter for which CompilationMetrics field to update

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139849
Approved by: https://github.com/ezyang
ghstack dependencies: #140094
2024-11-11 14:24:23 +00:00
86792a5a8d [invoke_subgraph] User facing API to support arbitrary args and kwargs (#139162)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139162
Approved by: https://github.com/zou3519
2024-11-08 03:31:19 +00:00
d8afa21ef2 specialize symfloats for wrapped_gradient in get_fake_value (#139935)
Fixes `PYTORCH_TEST_WITH_DYNAMO=1 python test/test_torch.py TestTorchDeviceTypeCPU.test_gradient_type_promotion_cpu` when `specialize_float=False`

Reviewers might wonder why we need to have this whitelist. Can't we rely on python_arg_parser.h to do the specialization generically? Alas this path doesn't actually FFI to C++ so we do need to do the specialization in pythonland.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139935
Approved by: https://github.com/ezyang
ghstack dependencies: #139569, #139457, #139568, #139572, #139846, #139454, #139896
2024-11-07 20:27:02 +00:00
1270c78268 Add logging for num_triton_bundles (#139807)
Summary: Adding logs for number of inductor cache triton bundles

Test Plan:
Ran adhoc code and looked at dynamo_compile/sandbox

https://fburl.com/scuba/dynamo_compile/sandbox/nhktfy19

Differential Revision: D65490826

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139807
Approved by: https://github.com/masnesral
2024-11-06 21:11:04 +00:00
3f248a5735 Classify miss-inplaced tensors in logs. (#139240)
Summary:
Use signpost logs. A follow-up is to remove the field possibly_missed_reinplacing_opportunities from the dynamo compile table.

Differential Revision: D65180194

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139240
Approved by: https://github.com/zou3519
2024-11-04 23:56:14 +00:00
9919932783 Specialize symfloats that flow through is_integer (#139572)
Fixes `python test/dynamo/test_dynamic_shapes.py DynamicShapesFunctionTests.test_number_method_method_is_integer_num_type6_dynamic_shapes` when specialize_float = False

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139572
Approved by: https://github.com/ezyang
ghstack dependencies: #139569, #139457, #139568
2024-11-04 23:35:35 +00:00