12 Commits

Author SHA1 Message Date
05b9b63fb6 typing inductor and placeholder backends (#160366)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160366
Approved by: https://github.com/Skylion007
ghstack dependencies: #160362, #160363, #160364, #160365
2025-08-15 02:09:31 +00:00
7a41f20794 [inductor] Quiesce Triton compile worker pool after each dynamo compile (#156187)
For internal usages, keeping the Triton compile worker pool active for the lifetime of the process has caused some challenges, e.g., it slows down and muddies profiling due to the huge number of threads on a box: N threads = 8 ranks * 32 subprocs * M threads started by torch. Also, each subproc can use more than 1GB each. This PR adds the functionality to shutdown worker subprocs after each dynamo compile when using the SubprocPool implementation. The idea is to leave the main sidecar process running, but signal it to tear down its internal ProcessPoolExecutor when compile is finished. Restarting the ProcessPoolExecutor is relatively fast, e.g., 500ms because the ProcessPoolExecutor forks from the sidecar. Changes:
* Do not start the ProcessPoolExecutor automatically when compile_fx is imported. Instead, start the sidecar process only. The sidecar process imports torch, so is still slow to start.
* Introduce wakeup() and quiesce() calls to the implementation to start and stop the ProcessPoolExecutor.
* Add a context manager to automatically quiesce() at the end of dynamo compilation.
* Signal a wakeup() in compile_fx only when we have cuda devices.
* Add a killswitch so we can turn of quiescing.

Testing:
For correctness, the stacked change at https://github.com/pytorch/pytorch/pull/156534 enables the feature for OSS so it's exercised in CI.

For performance, because of recent compile-time variance (see https://github.com/pytorch/pytorch/issues/152566), it's pretty hard to glean whether there's a regression....

* Training: https://hud.pytorch.org/benchmark/compilers?dashboard=torchinductor&startTime=Tue%2C%2017%20Jun%202025%2021%3A32%3A04%20GMT&stopTime=Tue%2C%2024%20Jun%202025%2021%3A32%3A04%20GMT&granularity=hour&mode=training&dtype=amp&deviceName=cuda%20(h100)&lBranch=gh/masnesral/210/head&lCommit=1b7315031c3bfad66a1a01700167a9ca1a2ae5f1&rBranch=main&rCommit=eab45643f22e58ee12d95d8b0162d51ca0a50801
* Inference: https://hud.pytorch.org/benchmark/compilers?dashboard=torchinductor&startTime=Tue%2C%2017%20Jun%202025%2021%3A32%3A04%20GMT&stopTime=Tue%2C%2024%20Jun%202025%2021%3A32%3A04%20GMT&granularity=hour&mode=inference&dtype=bfloat16&deviceName=cuda%20(h100)&lBranch=gh/masnesral/210/head&lCommit=1b7315031c3bfad66a1a01700167a9ca1a2ae5f1&rBranch=main&rCommit=eab45643f22e58ee12d95d8b0162d51ca0a50801

The wins (mostly for inference) don't make sense, but I'm also skeptical of the losses (mostly for training). I can't repro any of the slowdowns locally. Furthermore, check out the benchmarking results for the stacked diff, which actually enables the quiescing functionality for OSS. That should only slow down compile since there can only be overhead to stop and start the workers. But the results are somehow better:

* Training: https://hud.pytorch.org/benchmark/compilers?dashboard=torchinductor&startTime=Tue%2C%2017%20Jun%202025%2021%3A32%3A04%20GMT&stopTime=Tue%2C%2024%20Jun%202025%2021%3A32%3A04%20GMT&granularity=hour&mode=training&dtype=amp&deviceName=cuda%20(h100)&lBranch=gh/masnesral/214/head&lCommit=41943253882a019b8ceafcd2bf4cd6acbe0cbca9&rBranch=main&rCommit=eab45643f22e58ee12d95d8b0162d51ca0a50801
* Inference: https://hud.pytorch.org/benchmark/compilers?dashboard=torchinductor&startTime=Tue%2C%2017%20Jun%202025%2021%3A32%3A04%20GMT&stopTime=Tue%2C%2024%20Jun%202025%2021%3A32%3A04%20GMT&granularity=hour&mode=inference&dtype=bfloat16&deviceName=cuda%20(h100)&lBranch=gh/masnesral/214/head&lCommit=41943253882a019b8ceafcd2bf4cd6acbe0cbca9&rBranch=main&rCommit=eab45643f22e58ee12d95d8b0162d51ca0a50801

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156187
Approved by: https://github.com/aorenste, https://github.com/jansel
2025-07-08 22:53:13 +00:00
7fdd754136 [compile-time traces] Profile large missing gaps in compile time (#151256)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151256
Approved by: https://github.com/bdhirsh, https://github.com/masnesral, https://github.com/zou3519, https://github.com/jansel
2025-05-13 14:44:51 +00:00
fd04c79878 Revert "[aot autograd][logging] Profile large missing gaps in compile time tracing (#151256)"
This reverts commit 8e373592c8be3e28a5f5a774fc1d517aa3dbe8b4.

Reverted https://github.com/pytorch/pytorch/pull/151256 on behalf of https://github.com/Camyll due to breaking internal tests, cannot import ([comment](https://github.com/pytorch/pytorch/pull/151256#issuecomment-2819244186))
2025-04-21 18:49:23 +00:00
8e373592c8 [aot autograd][logging] Profile large missing gaps in compile time tracing (#151256)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151256
Approved by: https://github.com/bdhirsh, https://github.com/masnesral
ghstack dependencies: #151330
2025-04-16 20:37:08 +00:00
21c2565f35 Document dynamo (#146736)
Many files in dynamo are currently lacking file/module-level documentation, which makes it hard to know what they do at a glance and without digging into the code. This fixes that.

Note: documentation was AI-generated and could be incorrect, please review carefully.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146736
Approved by: https://github.com/jansel, https://github.com/StrongerXi, https://github.com/anijain2305, https://github.com/zou3519
2025-02-13 00:02:21 +00:00
3caf3baabb [inductor] enable inductor backend for dynamo on Windows. (#133921)
Changes:
Enable inductor backend for dynamo on Windows.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133921
Approved by: https://github.com/jgong5, https://github.com/jansel
2024-08-20 16:46:19 +00:00
e0f6fa6a7c Windows Dynamo Error Removal CI Check (#115969)
Rebase of #111313 onto `main`, for CI validation

Co-authored-by: Stella Laurenzo <stellaraccident@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115969
Approved by: https://github.com/PaliC, https://github.com/thiagocrepaldi
2024-02-14 21:14:36 +00:00
4a5b2cd6cb Revert "Windows Dynamo Error Removal CI Check (#115969)"
This reverts commit 45e7af5818f1d4ab1cf568390b3721b9be4251a9.

Reverted https://github.com/pytorch/pytorch/pull/115969 on behalf of https://github.com/PaliC due to this pr ended up breaking some of our periodic tests ([comment](https://github.com/pytorch/pytorch/pull/115969#issuecomment-1942934386))
2024-02-14 01:11:46 +00:00
45e7af5818 Windows Dynamo Error Removal CI Check (#115969)
Rebase of #111313 onto `main`, for CI validation

Co-authored-by: Stella Laurenzo <stellaraccident@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115969
Approved by: https://github.com/ezyang
2024-02-08 21:23:45 +00:00
d03173e88c Unify MYPYINDUCTOR and MYPY (#118432)
The original motivation for MYPYINDUCTOR was a faster type checking configuration that only checked a subset of files. With the removal of `follow_imports = ignore`, we are now able to use dmypy to do fast incremental typechecking, eliminating the need for this.

Perhaps erroneously, when I tee'ed up this PR I elected to delete the `follow_imports = skip` designations in the mypy-inductor.ini. This lead to a number of extra type error suppressions that I manually edited. You will need to review.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118432
Approved by: https://github.com/Skylion007
ghstack dependencies: #118414, #118418
2024-01-27 17:23:20 +00:00
60e8c766b5 Refactor dynamo training backends (#93409)
This splits training.py into many files and moves them from `dynamo.optimizations.training` to `dynamo.backends.*`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93409
Approved by: https://github.com/ezyang
2023-02-03 03:07:15 +00:00