Commit Graph

34 Commits

dc63248b76 Make dynamo configs more amenable to static type checking (#112130)
`install_config_module` makes a regular module into a ConfigModule with
extra methods defined on it. mypy thinks those extra methods (or module
functions) are undefined since it cannot analyze something so
dynamic. As a workaround, I've created a fake module that defines these
extra functions, which I import into the config modules during type
checking.

As part of this change, I've also added more types to config_utils.py
and enabled typechecking for torch/_dynamo/config.py.
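
A minimal sketch of the pattern, with hypothetical file and function names (the real stub ships alongside the dynamo config modules):

```python
# ---- _config_typing.py (hypothetical name) ----------------------------------
# Declaration-only module imported solely under TYPE_CHECKING, so mypy sees the
# methods that install_config_module() adds to a config module at runtime.
from typing import Any, ContextManager, Dict

def save_config() -> bytes: ...
def to_dict() -> Dict[str, Any]: ...
def patch(changes: Dict[str, Any]) -> ContextManager[None]: ...

# ---- config.py ---------------------------------------------------------------
import sys
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from _config_typing import *  # noqa: F401,F403  (never imported at runtime)

verbose = False  # a regular config value

from config_utils import install_config_module  # hypothetical import path

install_config_module(sys.modules[__name__])
```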
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112130
Approved by: https://github.com/jansel
2023-11-08 21:17:45 +00:00
a126bbfea3 [AOTInductor] Include AOTI debug folder in package (#112514)
Summary:
Allow users to set the debug dir for Inductor.

Include AOTInductor debug folder in the package.

```
zipinfo package.zip
Archive:  package.zip
Zip file size: 1325264 bytes, number of entries: 46
-rw----     0.0 fat      212 bl stor 80-000-00 00:00 package/data/aotinductor/merge-a100/aotinductor_pickle_data.json
-rw----     0.0 fat     6024 bl stor 80-000-00 00:00 package/data/aotinductor/merge-a100/debug/torchinductor/model___9.0/fx_graph_runnable.py
-rw----     0.0 fat     9031 bl stor 80-000-00 00:00 package/data/aotinductor/merge-a100/debug/torchinductor/model___9.0/fx_graph_readable.py
-rw----     0.0 fat     9202 bl stor 80-000-00 00:00 package/data/aotinductor/merge-a100/debug/torchinductor/model___9.0/fx_graph_transformed.py
-rw----     0.0 fat    10865 bl stor 80-000-00 00:00 package/data/aotinductor/merge-a100/debug/torchinductor/model___9.0/ir_pre_fusion.txt
-rw----     0.0 fat    10865 bl stor 80-000-00 00:00 package/data/aotinductor/merge-a100/debug/torchinductor/model___9.0/ir_post_fusion.txt
-rw----     0.0 fat    13553 bl stor 80-000-00 00:00 package/data/aotinductor/merge-a100/debug/torchinductor/model___9.0/output_code.py
-rw----     0.0 fat     5822 bl stor 80-000-00 00:00 package/data/aotinductor/merge-a100/debug/torchinductor/model___9.1/fx_graph_runnable.py
-rw----     0.0 fat     8817 bl stor 80-000-00 00:00 package/data/aotinductor/merge-a100/debug/torchinductor/model___9.1/fx_graph_readable.py
-rw----     0.0 fat     8988 bl stor 80-000-00 00:00 package/data/aotinductor/merge-a100/debug/torchinductor/model___9.1/fx_graph_transformed.py
-rw----     0.0 fat    10858 bl stor 80-000-00 00:00 package/data/aotinductor/merge-a100/debug/torchinductor/model___9.1/ir_pre_fusion.txt
-rw----     0.0 fat    10858 bl stor 80-000-00 00:00 package/data/aotinductor/merge-a100/debug/torchinductor/model___9.1/ir_post_fusion.txt
```
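
For reference, a sketch of pointing the debug artifacts at a chosen directory. `TORCH_COMPILE_DEBUG_DIR` is an existing knob; whether this change routes the AOTI debug folder through it is an assumption here:

```python
import os

# Set before compiling so debug artifacts land in a known place.
os.environ["TORCH_COMPILE_DEBUG"] = "1"
os.environ["TORCH_COMPILE_DEBUG_DIR"] = "/tmp/aoti_debug"  # assumption: also honored for AOTI packages
```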

Test Plan: CIs

Reviewed By: chenyang78

Differential Revision: D50815320

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112514
Approved by: https://github.com/chenyang78, https://github.com/desertfire
2023-11-01 08:25:11 +00:00
a21851c69d fix(inductor): ForeachKernelSchedulerNode group shape should be opaque for graph debug (#110336)
~~Shape is assumed by `TensorMetadata` to be torch.Shape/tuple, however, some of the scheduler node groups utilize `int`, so convert to tuple.~~

The root cause is actually the `foreach` scheduler node silently getting a group of `int`, when in fact it ought to be the opaque `foreach`.

**Previously:** silent error / confusing shape of (0,)
![image](https://github.com/pytorch/pytorch/assets/9093549/5bc2a3c7-151f-4433-bbf8-044c7b03e989)

**Now:** it is clear that this is a foreach node, which does not have a well-defined shape:
![image](https://github.com/pytorch/pytorch/assets/9093549/8373080d-4519-4e74-8a3b-da463e9968da)

~~An alternative might be to create a list of shapes for each of its subnodes. Actually, for debuggability's sake, I may prefer this. We can ensure that the recursive generation of this string is only done dynamically in a debug code path; alternatively, incrementally computing it on initialization of ForeachKernel may also be feasible.~~ This is quite infeasible for 100s of params.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110336
Approved by: https://github.com/mlazos
2023-10-31 18:44:08 +00:00
79212430df feat(inductor): fx graph debug should display device (#110346)
Device mismatch issues are the root cause of https://github.com/pytorch/pytorch/issues/107006, so this makes device-related scheduling issues easier to diagnose.
Also formats single-kwarg graphs more concisely.

Example rendering:
![image](https://github.com/pytorch/pytorch/assets/9093549/1b59a994-f2df-45c9-8cb7-37eb3ba12654)

CC code owners: @ngimel @jansel @shunting314 @mlazos @peterbell10

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110346
Approved by: https://github.com/eellison
2023-10-11 00:34:55 +00:00
c5f06b9753 Re-enable test_copy_transpose_math_view, neg_view/dce fix (#110651)
- neg view can just be lowered to neg() post-functionalization
- we were treating all fallback kernels as having no side effects; we shouldn't DCE mutating fallback kernels - either mutations induced by the reinplacing pass or clone_ with unsupported arguments (complex)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110651
Approved by: https://github.com/Chillee, https://github.com/jansel, https://github.com/malfet, https://github.com/Skylion007
2023-10-10 16:34:01 +00:00
772e104dfd [inductor] visualize fused ops in svg graph (#107752)
example usage
* `TORCH_COMPILE_DEBUG=1 INDUCTOR_ORIG_FX_SVG=1 INDUCTOR_POST_FUSION_SVG=1 python trig.py`: shows the original fx node name, file, and code. See snapshot 2, where we have origin_0, 1, 2
* trig.py can be found in P816304818

Implementation
* keep the original fx graph in GraphLowering: `self.orig_gm: torch.fx.GraphModule = gm.__copy__()`
* draw the original fx graph with origins at ir_post_fusion: `V.debug.draw_orig_fx_graph(self.orig_gm, self.scheduler.nodes)`; `node.meta["buff_meta"]` tracks buf_name

<img width="350" alt="Screenshot 2023-08-29 at 12 40 24 PM" src="https://github.com/pytorch/pytorch/assets/134637289/c4e197cb-ab3b-4a09-a584-c1356376accb">
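
For a standalone taste of rendering an fx graph to SVG, the existing `torch.fx.passes.graph_drawer` utility works as below (not necessarily the exact path this PR's inductor hook takes, which layers origin tracking on top):

```python
import torch
import torch.fx
from torch.fx.passes.graph_drawer import FxGraphDrawer  # needs pydot + graphviz

def trig(x):
    return torch.sin(x) + torch.cos(x)

gm = torch.fx.symbolic_trace(trig)
drawer = FxGraphDrawer(gm, "trig")
drawer.get_dot_graph().write_svg("trig.svg")  # one node per fx op
```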

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107752
Approved by: https://github.com/mlazos
2023-09-21 08:03:05 +00:00
fe452108fb Enable typechecking for _inductor/debug.py (#109335)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109335
Approved by: https://github.com/eellison
ghstack dependencies: #109269, #109347
2023-09-18 18:12:23 +00:00
91778ada87 [inductor] graph replayer (#106952)
Recently I've found it a bit painful to run benchmark scripts in my dev environment. E.g., the command below
```
 python benchmarks/dynamo/huggingface.py --backend inductor --amp --performance --only YituTechConvBert --training
```
took about 2 minutes to run. It may take even longer for some other models.

The command is slow since it
- needs to do the dynamo work
- verifies the model on CPU
- runs perf tests
- compiles all the graphs

However, oftentimes I only need to debug inductor-specific logic like loop ordering and fusion. A lot of the things the script does are useless for me. Also, I only need to test one graph at a time (e.g. check the fwd graph first and, when I'm done, continue to check the bwd graph) rather than compiling all the graphs.

The graph replayer adds a `@save_args` decorator to the compile_fx_inner function. When `config.save_args` is true, it will pickle all the arguments to `compile_fx_inner` to the file system. Later on, we can call `load_args_and_run_compile_fx_inner("/tmp/inductor_saved_args/compile_fx_inner_0.pkl")` to replay the graph and compile it with inductor.

Replaying the fwd graph took around 60 seconds (maybe this can be further reduced, but this is already a 2x speedup for dev efficiency), and it only took around 20 seconds to reach the `Scheduler.__init__` method.

I also checked the `TORCH_COMPILE_DEBUG` flag that already exists. The most similar part of `TORCH_COMPILE_DEBUG` is that it can save a graph and its arguments and rerun it later on. But the difference here is that, rather than running the model, we want to call the inductor API to compile the model (without even going through dynamo or aot-autograd).
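
A minimal sketch of the idea (simplified; the decorator in the PR is gated on `config.save_args` and also has to deal with arguments that don't pickle cleanly):

```python
import functools
import os
import pickle

SAVE_DIR = "/tmp/inductor_saved_args"

def save_args(fn):
    """Pickle each call's arguments so the call can be replayed later."""
    counter = 0

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        nonlocal counter
        os.makedirs(SAVE_DIR, exist_ok=True)
        path = os.path.join(SAVE_DIR, f"{fn.__name__}_{counter}.pkl")
        counter += 1
        with open(path, "wb") as f:
            pickle.dump((args, kwargs), f)
        return fn(*args, **kwargs)

    return wrapper

def load_args_and_run(fn, path):
    """Replay a saved call directly, skipping dynamo and aot-autograd."""
    with open(path, "rb") as f:
        args, kwargs = pickle.load(f)
    return fn(*args, **kwargs)
```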

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106952
Approved by: https://github.com/jansel
ghstack dependencies: #106990
2023-08-11 22:28:20 +00:00
8f4d8b3773 More descriptive graph diagram names in svg (#106146)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106146
Approved by: https://github.com/jansel, https://github.com/Chillee
2023-07-28 17:34:09 +00:00
b93590b692 Copy debug artifacts instead of renaming (#104561)
Fixes https://github.com/pytorch/pytorch/issues/100567

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104561
Approved by: https://github.com/jansel
2023-07-10 17:44:13 +00:00
8b64dee5d2 [fix] torch_compile_debug don't log with 0 (#100462)
Fixes https://github.com/pytorch/pytorch/issues/99906

Tested locally.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100462
Approved by: https://github.com/mlazos
2023-05-03 08:23:09 +00:00
e2a3817dfd [BE] Enable C419 rule for any all shortcircuiting (#99890)
Apparently https://github.com/pytorch/pytorch/pull/78142 made torch.JIT allow simple generator expressions, which lets us enable rules that replace unnecessary list comprehensions with generators in any/all calls. This was originally part of #99280, but I split it off into this PR so that it can be easily reverted should anything break.
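
For illustration, the pattern C419 flags:

```python
xs = [1, -2, 3]

# Flagged by C419: the list is fully built before any() can short-circuit.
any([x < 0 for x in xs])

# Preferred: the generator lets any() stop at the first negative element.
any(x < 0 for x in xs)
```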

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99890
Approved by: https://github.com/justinchuby, https://github.com/kit1980, https://github.com/malfet
2023-04-25 15:02:13 +00:00
547bef11ee tweak heuristic for sdpa selection based off of *data* (and a decision tree) (#99644)
High level approach:
1. I generated a bunch of data comparing FlashAttention and Cutlass implementations (https://pastebin.com/pe0j3YeK)
2. I trained a decision tree using standard train/val split methodology and hyperparameter sweeps (https://pastebin.com/fjYX1HjR).
2a. I did a bunch of feature augmentation to capture interactions between features.

The heuristic I ended up with is:
```
use_flash = seq_len / (num_heads * batch_size) > 6
```

TL;DR: On my dataset, where FlashAttention and Cutlass differ by more than 10%, the existing heuristic achieves 69% accuracy.  My new heuristic achieves 94% accuracy.
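
Plugging in illustrative shapes (not numbers from the PR's dataset):

```python
batch_size, num_heads, seq_len = 8, 16, 2048

use_flash = seq_len / (num_heads * batch_size) > 6  # 2048 / 128 = 16.0 > 6
print(use_flash)  # True -> prefer the FlashAttention kernel
```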

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99644
Approved by: https://github.com/ngimel, https://github.com/drisspg
2023-04-21 23:28:44 +00:00
2e25fb5d55 Refactor debug_utils into after_aot and after_dynamo modules (#99450)
There are no code changes but I did take the opportunity to
reorder and group the functions once they were placed in their
respective modules.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99450
Approved by: https://github.com/anijain2305
2023-04-19 16:00:19 +00:00
ee9a9b7add Remove old logging callsites (#98095)
Get around GH first issue, OSS only changes for https://github.com/pytorch/pytorch/pull/97182

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98095
Approved by: https://github.com/anijain2305
2023-04-01 00:57:37 +00:00
a1c46e5f8f component-level configurable logging for dynamo, inductor, aot (#94858)
Summary:

Adds NNC-like logging that is configured through an env var `TORCH_LOGS`
Examples:
`TORCH_LOGS="dynamo,guards" python script.py` - prints dynamo logs at level INFO with guards of all functions that are compiled

`TORCH_LOGS="+dynamo,guards,graph" python script.py` - prints dynamo logs at level DEBUG with guards and graphs (in tabular) format of all graphs that are compiled

[More examples with full output](https://gist.github.com/mlazos/b17f474457308ce15e88c91721ac1cce)

Implementation:
The implementation parses the log settings from the environment, finds any components (aot, dynamo, inductor) or other loggable objects (guards, graph, etc.) and generates a log_state object. This object contains all of the enabled artifacts, and a qualified log name -> level mapping. _init_logs then adds handlers to the highest level logs (the registered logs), and sets any artifact loggers to level DEBUG if the artifact is enabled.

Note: set_logs is an alternative for manipulating the log_state, but if the environment contains TORCH_LOGS, the environment settings will be prioritized.
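
For example, a rough programmatic equivalent of the first command above (a sketch assuming set_logs's keyword form):

```python
import logging
import torch

# Equivalent in spirit to TORCH_LOGS="dynamo,guards":
# dynamo logs at INFO plus the guards artifact.
torch._logging.set_logs(dynamo=logging.INFO, guards=True)
```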

Adding a new log:
To add a new log, a dev should add their log name to torch._logging._registrations (there are examples there already).

Adding a new artifact:
To add a new artifact, a dev should add their artifact name to torch._logging._registrations as well.
Additionally, wherever the artifact is logged, `torch._logging.getArtifactLogger(__name__, <artifact_name>)` should be used instead of the standard logging implementation.
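
A sketch of what that looks like at a logging call site ("graph" is one of the registered artifact names; a new artifact would first need its own entry in torch._logging._registrations):

```python
import torch._logging

graph_log = torch._logging.getArtifactLogger(__name__, "graph")
graph_log.debug("emitted only when the graph artifact is enabled via TORCH_LOGS")
```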

[design doc](https://docs.google.com/document/d/1ZRfTWKa8eaPq1AxaiHrq4ASTPouzzlPiuquSBEJYwS8/edit#)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94858
Approved by: https://github.com/ezyang
2023-03-18 04:17:31 +00:00
f03db8d6cb [reland2][inductor] Add an AOT compilation mode for Inductor CPP backend (#96520)
Summary: This is a reland of https://github.com/pytorch/pytorch/pull/94822.
Solved the long compilation issue for inductor cpp tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96520
Approved by: https://github.com/huydhn, https://github.com/malfet
2023-03-14 16:10:54 +00:00
fe05266fda Revert "[reland][inductor] Add an AOT compilation mode for Inductor CPP backend (#95985)"
This reverts commit deaf9e5e659a1f73656cbbacb39448498e857163.

Reverted https://github.com/pytorch/pytorch/pull/95985 on behalf of https://github.com/huydhn due to Sorry for reverting this. It increased the test time significantly for ASAN (and maybe other test shards). ASAN tests on the PR passed but only barely avoided timing out. I have updated my initial findings in https://github.com/pytorch/pytorch/issues/96378
2023-03-09 01:45:24 +00:00
deaf9e5e65 [reland][inductor] Add an AOT compilation mode for Inductor CPP backend (#95985)
Summary: This is a reland of https://github.com/pytorch/pytorch/pull/94822

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95985
Approved by: https://github.com/jansel
2023-03-08 20:02:32 +00:00
879400e4e8 Revert "[inductor] Add an AOT compilation mode for Inductor CPP backend (#94822)"
This reverts commit 73b66098b2f43be508e1975fd6a425ed6308b993.

Reverted https://github.com/pytorch/pytorch/pull/94822 on behalf of https://github.com/clee2000 due to broke inductor_timm_cpu_accuracy, 73b66098b2 (11745396725)
2023-03-03 17:33:27 +00:00
73b66098b2 [inductor] Add an AOT compilation mode for Inductor CPP backend (#94822)
Summary: The AOT mode currently works for the CPP backend. When turned on, Inductor compiles the model code into a .so file with aot_inductor_entry as the entry function. If the AOT compilation fails, Inductor will explicitly fail.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94822
Approved by: https://github.com/jansel
2023-03-03 14:18:09 +00:00
ca9ebf9e2b Delete dynamo_import and inductor_import (#93851)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93851
Approved by: https://github.com/albanD, https://github.com/jansel
2023-02-02 01:51:29 +00:00
8c09a005c5 [inductor] Pattern matching engine (copy) (#93291)
This is an exact duplicate of https://github.com/pytorch/pytorch/pull/90739

The fbcode workflow for landing that diff seems buggy.  The github-export-checks task is failing with credentials errors.  Plan to try to land it using GH1.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93291
Approved by: https://github.com/desertfire
2023-01-31 04:51:00 +00:00
6f1727b288 Print aot graphs if user specifies aot graph env vars (#92720)
When integrating AOT logging with the TorchInductor trace, the ability to print graphs to the console when the user specified any of the env vars was removed (in favor of using TORCH_COMPILE_DEBUG). This restores that behavior by checking whether the user set any of the aot debug variables *before* setting up the remainder of the logging, and adding a stream to stdout if any of those env vars are set.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92720
Approved by: https://github.com/Chillee
2023-01-21 04:46:35 +00:00
9b173b87b2 Refactor away leftover import indirection (#92188)
These indirect ways of importing are a leftover from when we wanted to support both `import torchdynamo` and `import torch._dynamo`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92188
Approved by: https://github.com/desertfire
2023-01-18 04:53:05 +00:00
7c1c239db1 [inductor] Rewrite Triton templates + epilogue fusion (retry) (#91575)
This reverts commit 94262efc7d381ace82aa74ed2f5f5ec76f8fca95 to reland #91105 / #90738.

Fixes https://github.com/pytorch/torchdynamo/issues/2015

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91575
Approved by: https://github.com/ngimel
2023-01-11 00:08:03 +00:00
94262efc7d Revert "[inductor] Rewrite Triton templates + epilogue fusion (retry) (#91105)"
This reverts commit d6dd2e97da619319a103d1061290fe33ce33b6a4.

Reverted https://github.com/pytorch/pytorch/pull/91105 on behalf of https://github.com/atalman due to Broke internal builds
2022-12-21 00:02:38 +00:00
d6dd2e97da [inductor] Rewrite Triton templates + epilogue fusion (retry) (#91105)
https://github.com/pytorch/pytorch/pull/90738 seems a bit borked. ghimport fails on it, and I unlinked it from the Phabricator diff, but it still won't land. This is an exact copy of that PR without using ghstack.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91105
Approved by: https://github.com/ngimel
2022-12-20 02:38:23 +00:00
0b3316ad2c Don't enable debug_fake_crossref for TORCH_COMPILE_DEBUG (#90666)
It is kind of flaky, it doesn't work with dynamic shapes, and I think the debug interpreter is a better way to detect if you've had a size/stride propagation accident.

Fixes https://github.com/pytorch/pytorch/issues/90652

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90666
Approved by: https://github.com/voznesenskym
2022-12-12 14:20:40 +00:00
b95ea4f149 [pt2] Reset dynamo log level when exiting inductor debug context (#90473)
When entering an inductor debug context we increase the log level of
dynamo; I guess this makes sense, since if we're debugging inductor, and
inductor calls into dynamo, we probably want visibility into what dynamo is
doing.

But when we exit that context, we probably want to go back to whatever level of
dynamo-specific logging was in place before.  Dynamo generates lots of debug
info (guards, bytecode), and it's a lot to sift through if you're not
specifically interested in it.
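
The fix amounts to the usual save-and-restore pattern, roughly (a sketch, not the PR's exact code):

```python
import contextlib
import logging

@contextlib.contextmanager
def raised_log_level(logger_name: str, level: int):
    """Temporarily raise a logger's verbosity, restoring the old level on exit."""
    logger = logging.getLogger(logger_name)
    previous = logger.level
    logger.setLevel(level)
    try:
        yield
    finally:
        logger.setLevel(previous)

# e.g. inside the inductor debug context:
with raised_log_level("torch._dynamo", logging.DEBUG):
    pass  # ... compile with extra dynamo visibility ...
```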

Differential Revision: [D41841879](https://our.internmc.facebook.com/intern/diff/D41841879)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90473
Approved by: https://github.com/mlazos, https://github.com/jansel
2022-12-12 04:39:37 +00:00
730e44bbc7 Add logging for aot autograd and unified debug flag (#88987)
- Adds `log_level` to aot's config
- Outputs the log to `<graph_name>_<log_level>.log` in the aot_torchinductor subfolder of the debug directory
- Modifies the Inductor debug context to use the graph name when naming the folder instead of the OS pid
- Adds a `TORCH_COMPILE_DEBUG` flag to enable it (as well as separate dynamo and inductor logs); see the sketch below
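
A minimal sketch of turning the unified flag on (dynamo's `optimize` entry point was current at the time; the env var is set before importing torch so the configs pick it up):

```python
import os
os.environ["TORCH_COMPILE_DEBUG"] = "1"  # must be set before the first compile

import torch
import torch._dynamo

def f(x):
    return x.sin() + x.cos()

compiled = torch._dynamo.optimize("inductor")(f)
compiled(torch.randn(8))
# Per-graph logs and inductor artifacts land in the debug directory.
```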

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88987
Approved by: https://github.com/Chillee
2022-12-09 17:28:10 +00:00
e150a6212b Added gm.print_readable to torchinductor_trace output (#87717)
cc @jansel @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305
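
`print_readable` is the standard GraphModule helper; for example:

```python
import torch
import torch.fx

def f(x):
    return x.sin() + 1

gm = torch.fx.symbolic_trace(f)
gm.print_readable()  # prints python-like, annotated source for the traced graph
```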
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87717
Approved by: https://github.com/ngimel
2022-10-25 22:31:49 +00:00
8461460d55 Unified debug directory for dynamo/inductor tools (#87438)
Fixes https://github.com/pytorch/torchdynamo/issues/1705
Fixes https://github.com/pytorch/torchdynamo/issues/1383

Adds a debug directory by default called `torchdynamo_debug` in the current working directory.
In the debug directory, for each run of dynamo (an enter and exit of optimize), a folder run_\<timestamp\> is created which contains any minifier/inductor/torchdynamo artifacts under their respective folders.

Updated the minifier, record replay, and inductor tracing to use this directory

cc @jansel @lezcano @fdrocha @soumith @voznesenskym @yanboliang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87438
Approved by: https://github.com/soumith
2022-10-22 03:43:11 +00:00
c7c09722ad Move TorchDynamo into PyTorch core (#86461)
Context:
https://github.com/pytorch/torchdynamo/issues/1588

This PR moves [TorchDynamo](https://github.com/pytorch/torchdynamo) and TorchInductor into PyTorch core.
- `torchdynamo` becomes `torch._dynamo`
- `torchinductor` becomes `torch._inductor`

This PR was generated by running `copy_to_core.sh` in https://github.com/pytorch/torchdynamo/pull/1538

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86461
Approved by: https://github.com/voznesenskym
2022-10-13 23:18:06 +00:00