pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 21:14:14 +08:00

Author	SHA1	Message	Date
Yuanyuan Chen	e595136187	Enable PLC1802 on ruff (#165813 ) This PR enables ruff check `PLC1802`, which detects len calls on sequences in a boolean test context. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165813 Approved by: https://github.com/ezyang	2025-10-18 05:44:14 +00:00
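For readers unfamiliar with the rule, the following is a minimal sketch of the pattern that `PLC1802` reports and the usual rewrite. The function names are hypothetical and are not taken from the PyTorch codebase.

```python
def has_items_flagged(items: list[int]) -> bool:
    # Pattern reported by PLC1802: len() used directly as a truth test.
    if len(items):
        return True
    return False


def has_items_preferred(items: list[int]) -> bool:
    # Equivalent for built-in sequences: an empty sequence is falsy.
    if items:
        return True
    return False
```

The two forms agree for built-in sequences such as lists and tuples. They are not interchangeable for objects with different truthiness rules (for example, multi-element tensors), so the rewrite is only safe when the operand is known to be a plain sequence.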
Maggie Moss	c855f8632e	Pyrefly suppressions 7/n (#164913 ) Adds suppressions so that pyrefly will typecheck clean: https://github.com/pytorch/pytorch/issues/163283 Almost there! Test plan: dmypy restart && python3 scripts/lintrunner.py -a pyrefly check step 1: delete lines in the pyrefly.toml file from the project-excludes field step 2: run pyrefly check step 3: add suppressions, clean up unused suppressions before: https://gist.github.com/maggiemoss/4b3bf2037014e116bc00706a16aef199 after: INFO 0 errors (6,884 ignored) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164913 Approved by: https://github.com/oulgen	2025-10-08 07:27:17 +00:00
Edward Yang	65aa62d50d	Use codegen for the boxed interpreters (#164573 ) Authored with claude code. The arg parsing is kind of horrible, open to more suggestions. Signed-off-by: Edward Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/164573 Approved by: https://github.com/albanD, https://github.com/jansel	2025-10-08 06:27:44 +00:00
Tugsbayasgalan Manlaibaatar	2a11ce2c78	Support calling torch.compile inside non-strict export (#164171 ) So this fixes at least two issues: 1) When we are invoking inductor backend, we apply pre-grad passes which try to find the correct fake mode to use. In the nested case, we will run into a clash when there is a closure variable in the inductor region, because non-strict would have fakified this variable beforehand and the inner torch.compile would have created a fresh fake mode. This is not a problem in regular torch.compile because the inner torch.compile gets ignored. I don't know if we are supposed to inherit fake mode from the parent context in this case. But we can avoid this problem if we just default to the eager backend, which is fine in this case because the point of export is to capture aten operators. Going to inductor would mean we lose the inner torch.compile ops. 2) There are custom torch function modes in export that track the number of torch fns executed, and inner compile itself doesn't work because of guard failure as this mode state gets changed. I noticed torch.cond fixes this problem by carefully stashing the torch function mode and deferring it in the backend. So the correct thing to do here is just re-use the torch.cond implementation unconditionally. So the things I did to fix the above were: 1) Always default to the eager backend when compile is invoked inside export. I needed to make how torch.cond sets up the fresh tracing env into a util that can be shared. 2) The previous eager backend for torch.cond was wrong because the context managers didn't actually persist until the backend is invoked. 3) torch.cond used to only disable the TorchFunctionMetadata tf mode and stash it for later, but in fact, we should do both TorchFunctionMetadata and PreDispatchTorchFunctionMode. With the above fixes, we are able to export flex attention in export. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164171 Approved by: https://github.com/ydwu4	2025-10-03 16:31:07 +00:00
Nikita Shulga	f9fa138a39	[BE] Delete all pre py-3.10 checks (#163653 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/163653 Approved by: https://github.com/jansel ghstack dependencies: #163648, #163649	2025-09-23 23:22:53 +00:00
Brian Hirsh	0d71a9dd5b	fix incorrect interaction between DDPOptimizer and donated buffers (#160745 ) This should fix https://x.com/wightmanr/status/1953147089518772254?t=ng_R4t0-tRhO_qQE8NqOhw&s=19. Still working on adding a reasonable test. You can see more of a description of the problem in the code comments. But the TLDR is that: * When using DDPOptimizer, we partition the graph and compile several subgraphs. So 1 dynamo graph becomes N AOT/inductor artifacts * We have some existing logic to stash graph metadata (`fw_metadata`) in dynamo's TracingContext. When using DDPOptimizer, we generate one `fw_metadata` per AOT graph, and we stash it on the 1 TracingContext from dynamo. So we end up clobbering the `fw_metadata` for graph i-1 when AOT and inductor start compiling graph i * This is normally ok, but it becomes a problem if inductor ever wants to read from this `fw_metadata` during backward compilation. Why? We (by default) compile the backwards lazily. So when using DDPOptimizer, we will compile backward graph N, then bw graph N-1, etc. But... at the time that we have started compiling bw graph N-1, its corresponding fw_metadata has already been clobbered! So we end up reusing graph N's metadata for all of our backward graph compilations. With donated buffer metadata, that means we end up donating and writing into incorrect input buffers. The fix that I added was to add more dedicated DDPOptimizer metadata into the TracingContext, so we can properly switch between these N different `fw_metadata` objects in the backward. Pull Request resolved: https://github.com/pytorch/pytorch/pull/160745 Approved by: https://github.com/ezyang, https://github.com/zou3519	2025-09-04 21:57:27 +00:00
Lucas Kabela	4d5f92aa39	typing tvm.py (#160369 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/160369 Approved by: https://github.com/Skylion007 ghstack dependencies: #160362, #160363, #160364, #160365, #160366, #160367, #160368	2025-08-15 02:09:31 +00:00
Lucas Kabela	39ca0ce0c8	Type backend torchxla (#160368 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/160368 Approved by: https://github.com/Skylion007 ghstack dependencies: #160362, #160363, #160364, #160365, #160366, #160367	2025-08-15 02:09:31 +00:00
Lucas Kabela	d52bb67ac3	typing registry.py (#160367 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/160367 Approved by: https://github.com/Skylion007 ghstack dependencies: #160362, #160363, #160364, #160365, #160366	2025-08-15 02:09:31 +00:00
Lucas Kabela	05b9b63fb6	typing inductor and placeholder backends (#160366 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/160366 Approved by: https://github.com/Skylion007 ghstack dependencies: #160362, #160363, #160364, #160365	2025-08-15 02:09:31 +00:00
Lucas Kabela	453cfa5153	typing distributed.py (#160365 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/160365 Approved by: https://github.com/StrongerXi ghstack dependencies: #160362, #160363, #160364	2025-08-15 02:09:31 +00:00
Lucas Kabela	9faca5f260	typing debugging.py (#160364 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/160364 Approved by: https://github.com/Skylion007 ghstack dependencies: #160362, #160363	2025-08-15 02:09:31 +00:00
Lucas Kabela	6fe6dd9fdc	Type cudagraphs.py (#160363 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/160363 Approved by: https://github.com/StrongerXi ghstack dependencies: #160362	2025-08-15 02:09:31 +00:00
Lucas Kabela	f82c7eed84	Typing for common.py (#160362 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/160362 Approved by: https://github.com/Skylion007	2025-08-15 02:09:31 +00:00
Ti-Tai Wang	5606c516fd	[ONNX] Remove legacy Dort (#158258 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/158258 Approved by: https://github.com/justinchuby, https://github.com/malfet	2025-07-15 19:14:06 +00:00
Sam Larsen	7a41f20794	[inductor] Quiesce Triton compile worker pool after each dynamo compile (#156187 ) For internal usages, keeping the Triton compile worker pool active for the lifetime of the process has caused some challenges, e.g., it slows down and muddies profiling due to the huge number of threads on a box: N threads = 8 ranks * 32 subprocs * M threads started by torch. Also, each subproc can use more than 1GB. This PR adds the functionality to shut down worker subprocs after each dynamo compile when using the SubprocPool implementation. The idea is to leave the main sidecar process running, but signal it to tear down its internal ProcessPoolExecutor when compile is finished. Restarting the ProcessPoolExecutor is relatively fast, e.g., 500ms because the ProcessPoolExecutor forks from the sidecar. Changes: * Do not start the ProcessPoolExecutor automatically when compile_fx is imported. Instead, start the sidecar process only. The sidecar process imports torch, so is still slow to start. * Introduce wakeup() and quiesce() calls to the implementation to start and stop the ProcessPoolExecutor. * Add a context manager to automatically quiesce() at the end of dynamo compilation. * Signal a wakeup() in compile_fx only when we have cuda devices. * Add a killswitch so we can turn off quiescing. Testing: For correctness, the stacked change at https://github.com/pytorch/pytorch/pull/156534 enables the feature for OSS so it's exercised in CI. For performance, because of recent compile-time variance (see https://github.com/pytorch/pytorch/issues/152566), it's pretty hard to glean whether there's a regression.... 
* Training: https://hud.pytorch.org/benchmark/compilers?dashboard=torchinductor&startTime=Tue%2C%2017%20Jun%202025%2021%3A32%3A04%20GMT&stopTime=Tue%2C%2024%20Jun%202025%2021%3A32%3A04%20GMT&granularity=hour&mode=training&dtype=amp&deviceName=cuda%20(h100)&lBranch=gh/masnesral/210/head&lCommit=1b7315031c3bfad66a1a01700167a9ca1a2ae5f1&rBranch=main&rCommit=eab45643f22e58ee12d95d8b0162d51ca0a50801 * Inference: https://hud.pytorch.org/benchmark/compilers?dashboard=torchinductor&startTime=Tue%2C%2017%20Jun%202025%2021%3A32%3A04%20GMT&stopTime=Tue%2C%2024%20Jun%202025%2021%3A32%3A04%20GMT&granularity=hour&mode=inference&dtype=bfloat16&deviceName=cuda%20(h100)&lBranch=gh/masnesral/210/head&lCommit=1b7315031c3bfad66a1a01700167a9ca1a2ae5f1&rBranch=main&rCommit=eab45643f22e58ee12d95d8b0162d51ca0a50801 The wins (mostly for inference) don't make sense, but I'm also skeptical of the losses (mostly for training). I can't repro any of the slowdowns locally. Furthermore, check out the benchmarking results for the stacked diff, which actually enables the quiescing functionality for OSS. That should only slow down compile since there can only be overhead to stop and start the workers. 
But the results are somehow better: * Training: https://hud.pytorch.org/benchmark/compilers?dashboard=torchinductor&startTime=Tue%2C%2017%20Jun%202025%2021%3A32%3A04%20GMT&stopTime=Tue%2C%2024%20Jun%202025%2021%3A32%3A04%20GMT&granularity=hour&mode=training&dtype=amp&deviceName=cuda%20(h100)&lBranch=gh/masnesral/214/head&lCommit=41943253882a019b8ceafcd2bf4cd6acbe0cbca9&rBranch=main&rCommit=eab45643f22e58ee12d95d8b0162d51ca0a50801 * Inference: https://hud.pytorch.org/benchmark/compilers?dashboard=torchinductor&startTime=Tue%2C%2017%20Jun%202025%2021%3A32%3A04%20GMT&stopTime=Tue%2C%2024%20Jun%202025%2021%3A32%3A04%20GMT&granularity=hour&mode=inference&dtype=bfloat16&deviceName=cuda%20(h100)&lBranch=gh/masnesral/214/head&lCommit=41943253882a019b8ceafcd2bf4cd6acbe0cbca9&rBranch=main&rCommit=eab45643f22e58ee12d95d8b0162d51ca0a50801 Pull Request resolved: https://github.com/pytorch/pytorch/pull/156187 Approved by: https://github.com/aorenste, https://github.com/jansel	2025-07-08 22:53:13 +00:00
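The wakeup/quiesce lifecycle described in this commit message can be sketched roughly as follows. This is an illustration only, not the SubprocPool code: `QuiescablePool` and `quiesce_after` are hypothetical names, and a `ThreadPoolExecutor` stands in for the sidecar-managed `ProcessPoolExecutor` so that the example stays self-contained.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from contextlib import contextmanager
from typing import Any, Callable, Iterator, Optional


class QuiescablePool:
    """Hypothetical pool whose workers are started lazily and can be torn down."""

    def __init__(self, max_workers: int = 2) -> None:
        self._max_workers = max_workers
        self._executor: Optional[ThreadPoolExecutor] = None

    @property
    def running(self) -> bool:
        return self._executor is not None

    def wakeup(self) -> None:
        # Start the workers only if they are not already running.
        if self._executor is None:
            self._executor = ThreadPoolExecutor(max_workers=self._max_workers)

    def quiesce(self) -> None:
        # Tear the workers down; the pool object itself stays usable.
        if self._executor is not None:
            self._executor.shutdown(wait=True)
            self._executor = None

    def submit(self, fn: Callable[..., Any], *args: Any) -> Future:
        self.wakeup()
        assert self._executor is not None
        return self._executor.submit(fn, *args)


@contextmanager
def quiesce_after(pool: QuiescablePool) -> Iterator[QuiescablePool]:
    # Quiesce the pool when the enclosed work (e.g. one compile) finishes.
    try:
        yield pool
    finally:
        pool.quiesce()
```

In this sketch, `with quiesce_after(pool) as p: p.submit(...)` starts workers on first use and releases them on exit, which mirrors the idea of a context manager that quiesces at the end of compilation.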
Xuehai Pan	1b2146fc6d	[BE][4/16] fix typos in torch/ (torch/_dynamo/) (#156314 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/156314 Approved by: https://github.com/jingsh ghstack dependencies: #156313	2025-06-23 02:57:19 +00:00
PyTorch MergeBot	5b427c92a8	Revert "[BE][4/16] fix typos in torch/ (torch/_dynamo/) (#156314 )" This reverts commit ead741c5fb0036e0fc95b79d4fe1af3a426e1306. Reverted https://github.com/pytorch/pytorch/pull/156314 on behalf of https://github.com/atalman due to export/test_torchbind.py::TestCompileTorchbind::test_compile_error_on_input_aliasing_contents_backend_aot_eager [GH job link](https://github.com/pytorch/pytorch/actions/runs/15804799771/job/44548489912) [HUD commit link](`c95f7fa874`) ([comment](https://github.com/pytorch/pytorch/pull/156313#issuecomment-2994171213))	2025-06-22 12:31:57 +00:00
Xuehai Pan	ead741c5fb	[BE][4/16] fix typos in torch/ (torch/_dynamo/) (#156314 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/156314 Approved by: https://github.com/jingsh ghstack dependencies: #156313	2025-06-22 08:43:18 +00:00
Boyuan Feng	38410cf9b5	Fix DDPOptimizer issue on static tensor index (#155746 ) We rely on `_try_get_metadata_from_dynamo()` to get static input indices. When the meta info is missing, it just returns an empty list of static input indices. This wrong list of static input indices leads to repeated cudagraph re-recording, which looks like a hang from the user's perspective. `bc3972b80a/torch/_functorch/aot_autograd.py (L1025-L1031)` The root cause is that `split_module` in DDP Optimizer loses meta info and gm attributes. This PR fixes the issue by propagating this metadata from the original module to submodules. `bc3972b80a/torch/_dynamo/backends/distributed.py (L515-L517)` Fixes #140395 Pull Request resolved: https://github.com/pytorch/pytorch/pull/155746 Approved by: https://github.com/xmfan, https://github.com/bdhirsh	2025-06-14 00:15:58 +00:00
Oguz Ulgen	d1947a8707	Migrate from lru_cache to cache (#155613 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/155613 Approved by: https://github.com/ezyang ghstack dependencies: #155612	2025-06-11 19:44:18 +00:00
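A minimal sketch of what such a migration typically looks like, assuming it replaces unbounded `functools.lru_cache` decorators with `functools.cache` (available since Python 3.9). The functions are hypothetical examples, not code from the PR.

```python
import functools


@functools.lru_cache(maxsize=None)
def square_old(x: int) -> int:
    # Unbounded memoization spelled with lru_cache.
    return x * x


@functools.cache
def square_new(x: int) -> int:
    # Same behavior: functools.cache is lru_cache(maxsize=None).
    return x * x
```

Both decorators expose `cache_info()` and `cache_clear()`, so call sites that inspect or reset the cache do not need to change.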
James Wu	be2ad70cfa	Fix dynamo tracing into AOTAutogradCache results in cpu tensors (#155251 ) On this line, we see that the bw_compiler that dynamo uses for AotAutograd automatically disables the backward runnable: `05dd638ee9/torch/_dynamo/backends/common.py (L76)` This disables dynamo in the bw_compiler but also disables the runnable the compiler returns. On a AOTAutogradCache hit, however, we never call the bw_compiler! So we don't disable dynamo properly. This only has an effect on certain cases of cpu tensors' backwards, where the backward is being done in python land, and dynamo unnecessarily tries to trace through the inductor generated code. It also only matters if the backward is being accessed outside of dynamo itself (say, in a graph break in eager mode), since dynamo properly disables the forward function already. ``` I0605 09:58:40.135000 3981970 torch/_dynamo/eval_frame.py:517] TorchDynamo attempted to trace the following frames: [ I0605 09:58:40.135000 3981970 torch/_dynamo/eval_frame.py:517] * fn /home/jjwu/test.py:9 I0605 09:58:40.135000 3981970 torch/_dynamo/eval_frame.py:517] * cast /data/users/jjwu/a/pytorch-env/lib/python3.10/typing.py:1737 I0605 09:58:40.135000 3981970 torch/_dynamo/eval_frame.py:517] * call /tmp/torchinductor_jjwu/rq/crq327nhoyjzog5n3qlchauucdrunrtutwmmoh7ipoe2ngnson5s.py:35 I0605 09:58:40.135000 3981970 torch/_dynamo/eval_frame.py:517] * fn /home/jjwu/test.py:9 I0605 09:58:40.135000 3981970 torch/_dynamo/eval_frame.py:517] * cast /data/users/jjwu/a/pytorch-env/lib/python3.10/typing.py:1737 I0605 09:58:40.135000 3981970 torch/_dynamo/eval_frame.py:517] * call /tmp/torchinductor_jjwu/rq/crq327nhoyjzog5n3qlchauucdrunrtutwmmoh7ipoe2ngnson5s.py:35 I0605 09:58:40.135000 3981970 torch/_dynamo/eval_frame.py:517] ] ``` This PR fixes the issue and adds a unit test showing that with or without cache hit, the frames dynamo is tracing is identical. 
Fixes https://github.com/pytorch/pytorch/issues/154536 Pull Request resolved: https://github.com/pytorch/pytorch/pull/155251 Approved by: https://github.com/bdhirsh, https://github.com/anijain2305	2025-06-09 02:06:16 +00:00
Animesh Jain	7fdd754136	[compile-time traces] Profile large missing gaps in compile time (#151256 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/151256 Approved by: https://github.com/bdhirsh, https://github.com/masnesral, https://github.com/zou3519, https://github.com/jansel	2025-05-13 14:44:51 +00:00
PyTorch MergeBot	fd04c79878	Revert "[aot autograd][logging] Profile large missing gaps in compile time tracing (#151256 )" This reverts commit 8e373592c8be3e28a5f5a774fc1d517aa3dbe8b4. Reverted https://github.com/pytorch/pytorch/pull/151256 on behalf of https://github.com/Camyll due to breaking internal tests, cannot import ([comment](https://github.com/pytorch/pytorch/pull/151256#issuecomment-2819244186))	2025-04-21 18:49:23 +00:00
Animesh Jain	8e373592c8	[aot autograd][logging] Profile large missing gaps in compile time tracing (#151256 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/151256 Approved by: https://github.com/bdhirsh, https://github.com/masnesral ghstack dependencies: #151330	2025-04-16 20:37:08 +00:00
William Wen	3ac5a499dd	[dynamo] add dynamo disable reasons to codebase (#150440 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/150440 Approved by: https://github.com/jansel, https://github.com/zou3519 ghstack dependencies: #150341	2025-04-02 04:26:48 +00:00
Xuehai Pan	3ce352e389	[BE][PYFMT] migrate PYFMT for `torch._dynamo` to `ruff format` (#144549 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144549 Approved by: https://github.com/jansel	2025-02-28 03:03:53 +00:00
PyTorch MergeBot	9a883007a2	Revert "Implement cuda graphs implementation of torch.cond and torch.while_loop (#140979 )" This reverts commit c7515da7b00de40942c83dc5856b6daec727e280. Reverted https://github.com/pytorch/pytorch/pull/140979 on behalf of https://github.com/huydhn due to This change has been reported to break internal code ([comment](https://github.com/pytorch/pytorch/pull/140979#issuecomment-2657361940))	2025-02-13 18:04:26 +00:00
Raymond Li	21c2565f35	Document dynamo (#146736 ) Many files in dynamo are currently lacking file/module-level documentation, which makes it hard to know what they do at a glance and without digging into the code. This fixes that. Note: documentation was AI-generated and could be incorrect, please review carefully. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146736 Approved by: https://github.com/jansel, https://github.com/StrongerXi, https://github.com/anijain2305, https://github.com/zou3519	2025-02-13 00:02:21 +00:00
Daniel Galvez	c7515da7b0	Implement cuda graphs implementation of torch.cond and torch.while_loop (#140979 ) This is a new PR for #130386 , which got stale and was closed. Since I force-pushed to that branch in order to rebase it on top of main, the PR can no longer be reopened, according to https://github.com/isaacs/github/issues/361 I fixed the possibly-not-warmed-up problem described here: https://github.com/pytorch/pytorch/pull/130386/files#r1690856534 Since starting this, torch.cond and torch.while_loop now apparently have support for backward passes. I will look into what it might take to support that. Pull Request resolved: https://github.com/pytorch/pytorch/pull/140979 Approved by: https://github.com/eqy, https://github.com/eellison	2025-02-11 18:16:15 +00:00
Li Yu (ads)	e6a84be3d3	[PyTorch] Add backend aot_eager_decomp_partition_with_mode (#143250 ) Summary: ## Why To make it possible to run torch dispatch mode inside compiled modules. This is to enable running MemoryTrackerMode (in next diff) to collect memory usage of compiled modules. ## What Add a backend aot_eager_decomp_partition_with_mode. Add an enable_log to the backend to control the compilation logging (which can be very verbose and slow the run of mode) Test Plan: unittest E2e tested in the next diff which shows the memory read from the mode passed to this backend is very close to the actual job's memory snapshot. Differential Revision: D67227144 Pull Request resolved: https://github.com/pytorch/pytorch/pull/143250 Approved by: https://github.com/bdhirsh	2025-01-22 23:20:59 +00:00
Aaron Orenstein	a79100ab11	PEP585 update - torch/_dynamo (#145105 ) See #145101 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145105 Approved by: https://github.com/bobrenjc93	2025-01-18 20:47:11 +00:00
bobrenjc93	1fe3af2c68	Migrate from Tuple -> tuple in torch/_dynamo (#144261 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144261 Approved by: https://github.com/aorenste, https://github.com/zou3519	2025-01-10 07:45:57 +00:00
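As a small illustration of the `Tuple` to `tuple` migration, the sketch below shows the same annotation in the older `typing` spelling and in the PEP 585 built-in generic spelling (Python 3.9+). The function names are hypothetical.

```python
from typing import Tuple


def min_max_old(values: list[int]) -> Tuple[int, int]:
    # Pre-PEP 585 spelling: generic alias imported from typing.
    return min(values), max(values)


def min_max_new(values: list[int]) -> tuple[int, int]:
    # PEP 585 spelling: the built-in tuple is subscripted directly.
    return min(values), max(values)
```

The runtime behavior is identical; only the annotation changes, and the `typing` import is no longer needed.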
Xuehai Pan	dcc3cf7066	[BE] fix ruff rule E226: add missing whitespace around operator in f-strings (#144415 ) The fixes are generated by: ```bash ruff check --fix --preview --unsafe-fixes --select=E226 . lintrunner -a --take "RUFF,PYFMT" --all-files ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/144415 Approved by: https://github.com/huydhn, https://github.com/Skylion007	2025-01-08 21:55:00 +00:00
Simon Fan	f4969c8235	fix torch.compile + ddp + non-reentrant AC pack hook firing count (#144271 ) FIXES https://github.com/pytorch/pytorch/issues/144035 In order to preserve hook firing semantics, we disabled pack/unpack hooks for torch.compile: https://github.com/pytorch/pytorch/pull/123196. In DDP under torch.compile, there's this other callsite that we need to disable hooks for Pull Request resolved: https://github.com/pytorch/pytorch/pull/144271 Approved by: https://github.com/bdhirsh, https://github.com/soulitzer	2025-01-07 21:08:52 +00:00
Tom Ritchford	dc23f1944a	Remove unused Python variables in torch/[_-a]* (#133492 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133492 Approved by: https://github.com/albanD	2024-12-12 17:39:14 +00:00
PyTorch MergeBot	5c97ac9721	Revert "Remove unused Python variables in torch/[_-a]* (#133492 )" This reverts commit fda975a7b3071a20dab8fc2c4e453479e1bb7cf2. Reverted https://github.com/pytorch/pytorch/pull/133492 on behalf of https://github.com/clee2000 due to Sorry, I need to revert this in order to revert something else. The only thing you need to do is rebase and remerge ([comment](https://github.com/pytorch/pytorch/pull/133492#issuecomment-2536635516))	2024-12-11 17:29:12 +00:00
Tom Ritchford	fda975a7b3	Remove unused Python variables in torch/[_-a]* (#133492 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133492 Approved by: https://github.com/albanD	2024-12-10 21:48:44 +00:00
Yuanjing Shi	117b6c3e2c	[Easy][Dynamo][TVM] remove unnecessary prints (#142445 ) This PR intends to remove the unnecessary prints in the auto-scheduler of dynamo's TVM backend. Pull Request resolved: https://github.com/pytorch/pytorch/pull/142445 Approved by: https://github.com/jansel	2024-12-10 19:52:02 +00:00
James Wu	6e203ae6de	[REFACTOR] Implement AOTDispatchCompiler wrapper (#142205 ) This implements a new wrapper class AOTDispatchCompiler wrapper, which is just a wrapper around a callable that returns an OutputCode. We can then use it in AOTDispatch to decide whether or not to use the cache: if fw_compiler, bw_compiler and inference_compiler are all AOTDispatchCompilers, then we enable caching. This type is pretty close to _CompiledFxGraphCallable, except it's not allowed to take any kwargs. Not sure how to consolidate the two ideas together just yet: unfortunately, there's no way to properly annotate the types to make them related. But a lot of the time, the input to this function will be a partially applied _CompiledFxGraphCallable. This allows the PR above this one to enable AOTAutogradCache everywhere, but not increase instruction count or enable cache on unit tests that use aot_eager or other non inductor compilers. Pull Request resolved: https://github.com/pytorch/pytorch/pull/142205 Approved by: https://github.com/oulgen, https://github.com/bdhirsh	2024-12-06 23:23:20 +00:00
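The caching decision described in this commit message can be sketched as below. This is a simplified stand-in, not the PyTorch implementation: `CacheableCompiler` and `should_enable_cache` are hypothetical names, and the wrapped callable returns an arbitrary object in place of an `OutputCode`.

```python
from typing import Any, Callable, Sequence


class CacheableCompiler:
    """Hypothetical marker wrapper around a compiler callable (positional args only)."""

    def __init__(self, compiler_fn: Callable[[Any, Sequence[Any]], Any]) -> None:
        self.compiler_fn = compiler_fn

    def __call__(self, graph: Any, example_inputs: Sequence[Any]) -> Any:
        # No **kwargs: the wrapper only forwards the two positional arguments.
        return self.compiler_fn(graph, example_inputs)


def should_enable_cache(fw: Any, bw: Any, inference: Any) -> bool:
    # Caching is enabled only when all three compilers are wrapped.
    return all(isinstance(c, CacheableCompiler) for c in (fw, bw, inference))
```

Under this scheme, a plain function passed as any one of the three compilers opts the whole compilation out of caching, which matches the stated goal of not enabling the cache for non-inductor compilers such as `aot_eager`.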
Bob Ren	f3f7ba5a69	Restart dynamo analysis when we fail to tensorify away all symfloat inputs (#140346 ) Fixes a bunch of benchmarks that failed with cudagraph errors including `tlp python benchmarks/dynamo/timm_models.py --device cuda --inductor --accuracy --amp --training --only resmlp_12_224` when `specialize_float=False` Also brings down number of overall failures (with keep-going) from 108 => 62. I'd estimate >80% of those 62 are wobbly expect tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/140346 Approved by: https://github.com/ezyang ghstack dependencies: #140983, #141003	2024-11-20 21:20:41 +00:00
Aaron Gokaslan	12e95aa4ee	[BE]: Apply PERF401 autofixes from ruff (#140980 ) * Automatically applies ruff rule 401. Turns loops into equivalent list comprehensions which are faster and do not leak the scope of the loop variables. * list comprehensions not only often have better typing, but are 50+% faster than for loops on overhead. They also preserve length information etc and are better for the interpreter to optimize. * Manually went back and made mypy happy after the change. * Also fixed style lints in files covered by flake8 but not by pyfmt Pull Request resolved: https://github.com/pytorch/pytorch/pull/140980 Approved by: https://github.com/justinchuby, https://github.com/malfet	2024-11-20 17:52:07 +00:00
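A minimal sketch of the kind of rewrite the `PERF401` autofix performs: a loop that only appends to a list becomes a list comprehension. The functions are hypothetical, and no performance figure is implied by this example.

```python
def even_squares_loop(values: list[int]) -> list[int]:
    # Pattern reported by PERF401: a for loop that only appends to a list.
    result = []
    for v in values:
        if v % 2 == 0:
            result.append(v * v)
    return result


def even_squares_comprehension(values: list[int]) -> list[int]:
    # Equivalent list comprehension; the loop variable does not leak.
    return [v * v for v in values if v % 2 == 0]
```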
eellison	ee2f8a50d3	Class rename (#139490 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139490 Approved by: https://github.com/exclamaforte, https://github.com/zou3519 ghstack dependencies: #139295	2024-11-02 00:10:17 +00:00
chilli	392221b390	Made DDPOptimizer work with HOPs (#138787 ) Fixes https://github.com/pytorch/pytorch/issues/137481 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138787 Approved by: https://github.com/yf225 ghstack dependencies: #138733, #138794, #138881	2024-10-25 18:59:01 +00:00
eellison	fe18a221eb	Add debug backend that applies CrossRefFakeMode, use in compiler bisector (#138651 ) I was debugging an internal ne divergence for a while that ended up being because of a bad meta. I added an explicit config option and an explicit backend `aot_eager_decomp_partition_crossref` to enable the FakeCrossRefMode when running the graph. I added an explicit backend because I suspect it will be useful for internal models, but I'm also happy to leave it as a config option. It will only test ops that have a meta, to avoid the memory overhead of hitting the fallback path and running in eager. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138651 Approved by: https://github.com/zou3519, https://github.com/bdhirsh	2024-10-25 15:58:36 +00:00
Aaron Orenstein	07cc4bd3e2	typing compile_fx.py (#138033 ) Type annotations for compile_fx. - Some of the stuff here is pretty complicated (functions which return functions that take functions) so I bailed on those and used `Any` just to get the rest landed. - There are also changes to type signatures in other files which I did just to let mypy know more about the types in compile_fx.py. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138033 Approved by: https://github.com/Skylion007	2024-10-21 18:14:59 +00:00
Adnan Akhundov	809ff3b274	Add host-side Triton TMA support to Dynamo (#137677 ) This adds Dynamo tracing support for the host-side Triton TMA API (see `create_2d_tma_descriptor` calls on the host in the [Triton tutorial](https://triton-lang.org/main/getting-started/tutorials/09-persistent-matmul.html#sphx-glr-getting-started-tutorials-09-persistent-matmul-py)). A few notes: - Here we assume the availability of the host-side TMA API added to upstream Triton in https://github.com/triton-lang/triton/pull/4498. As of the time of writing, this is not a part of the PT2 OSS Triton pin (although back-ported internally). OSS Triton pin update should be done in December 2024. - To capture the chain of calls `t.data_ptr() --> create_{1d,2d}_tma_descriptor(ptr, ...) --> kernel[grid](tma_desc, ...)`, we add three new variable trackers: `DataPtrVariable`, `CreateTMADescriptorVariable` (for the function), `TMADescriptorVariable` (for TMA descriptor object). This is to maintain the path back from the Triton kernel to the Tensor from which the TMA descriptor has been created. - The newly introduced variables have `reconstruct` methods used in case of graph breaks. - The `tma_descriptor_metadata` extracted from the captured `create_{1d,2d}_tma_descriptor` calls is propagated through the HOPs in Dynamo and AOTAutograd to be used by the downstream compiler (e.g., Inductor). See the unit tests for what the captured HOP arguments look like. - In the Dynamo-captured fx graph, we replace the TMA descriptor arguments of the Triton kernel by the underlying Tensors, to be able to track the input/output relationships in terms of Tensors. - In the Triton kernel mutation analysis pass (in AOTAutograd), we use the `tt.experimental_descriptor_store` TTIR op to detect mutations of the underlying tensors via TMA descriptors, so that downstream AOTAutograd can perform functionalizations as required. - JIT Inductor and AOT Inductor support will be implemented in follow-up PRs. 
Differential Revision: [D64404928](https://our.internmc.facebook.com/intern/diff/D64404928) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137677 Approved by: https://github.com/zou3519	2024-10-16 02:18:48 +00:00
eellison	8543000c27	Search through config changes in compiler bisector (#137346 ) Follow up to https://github.com/pytorch/pytorch/pull/131936. In the original bisector you'd have to test inline if we were disabling a component - `if BisectionManager.disable_subsystem("inductor", "post_grad_passes", debug_info)`. This adds a convenient way of testing config changes for root causing issue. I've added `emulate_precision_casts` and aot_eager_decomp_partition cse as initial ones. Pull Request resolved: https://github.com/pytorch/pytorch/pull/137346 Approved by: https://github.com/zou3519	2024-10-11 20:24:54 +00:00
Michael Lazos	27dee935af	[Dynamo] Ensure torch function modes are dispatched on builtin ops (#137117 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137117 Approved by: https://github.com/yanboliang, https://github.com/williamwen42 ghstack dependencies: #137114, #137115, #137116	2024-10-09 02:29:40 +00:00
PyTorch MergeBot	2d18c2d5e7	Revert "[Dynamo] Ensure torch function modes are dispatched on builtin ops (#137117 )" This reverts commit 941be418d8ec3290d0e3bae0e16a443be26b3075. Reverted https://github.com/pytorch/pytorch/pull/137117 on behalf of https://github.com/huydhn due to The top of the stack has been reverted but it leaves trunk in a broken state, so I try to revert the rest of the stack ([comment](https://github.com/pytorch/pytorch/pull/137114#issuecomment-2400765603))	2024-10-08 20:33:17 +00:00

187 Commits