pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 12:54:11 +08:00

Author	SHA1	Message	Date
Yuanyuan Chen	3255e7872b	Enable all flake8-logging-format rules (#164655 ) These rules are enabled by removing existing suppressions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164655 Approved by: https://github.com/janeyx99, https://github.com/mlazos	2025-10-19 00:59:28 +00:00
James Wu	b54e466fd0	Megacache integration (#163533 ) This diff adds megacache integration for DynamoCache. Because DynamoCache requires lazy serialization, i.e. it can only be serialized once all relevant backends have been compiled and we're ready for a save, we actually do the DynamoCache saving only on a call to `torch.compiler.save_cache_artifacts`. Differential Revision: [D82735763](https://our.internmc.facebook.com/intern/diff/D82735763/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/163533 Approved by: https://github.com/oulgen, https://github.com/zhxchen17	2025-10-15 22:49:15 +00:00
Maggie Moss	c855f8632e	Pyrefly suppressions 7/n (#164913 ) Adds suppressions so that pyrefly will typecheck clean: https://github.com/pytorch/pytorch/issues/163283 Almost there! Test plan: dmypy restart && python3 scripts/lintrunner.py -a pyrefly check step 1: delete lines in the pyrefly.toml file from the project-excludes field step 2: run pyrefly check step 3: add suppressions, clean up unused suppressions before: https://gist.github.com/maggiemoss/4b3bf2037014e116bc00706a16aef199 after: INFO 0 errors (6,884 ignored) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164913 Approved by: https://github.com/oulgen	2025-10-08 07:27:17 +00:00
zhxchen17	4c3c0ef2f1	[precompile] Load source cache for AOT compile as well. (#164773 ) Adding source_get_cache also to AOT compile case. Since the guard manager loader code can be shared between AOT and caching, we added a new function load_guard_manager to avoid code duplication between two workflows, for loading guards. Test Plan: test_guard_serialization.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/164773 Approved by: https://github.com/yiming0416, https://github.com/dolpm	2025-10-07 18:47:09 +00:00
Zhengxu Chen	23ab6a45e5	[precompile][ez] Add instrumentation for guard loading/building. (#164602 ) Summary: as title. Test Plan: CI Differential Revision: D83868533 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164602 Approved by: https://github.com/dolpm	2025-10-06 20:16:09 +00:00
Zhengxu Chen	4bd1505f84	[precompile][ez] Inline type definition for dynamo cache entry. (#164580 ) Summary: as title. DynamoCaptureOutput in package.py is not actively used in other files. Inline it to reduce confusion. Test Plan: CI Differential Revision: D83846957 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164580 Approved by: https://github.com/dolpm	2025-10-06 16:00:59 +00:00
Zhengxu Chen	16f9bef642	[precompile] Fix guard serialization loading bugs. (#164490 ) Summary: Added a set of fixes triggered by fm training job. Overall the theme here is that we should get rid of saved objects as much as possible when they are not used in guard reconstruction. Sometimes for objects that cannot be saved (like local functions) we still try our best to save their closures. Test Plan: test_guard_serialization.py test_lazy_awatiable.py Differential Revision: D83766926 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164490 Approved by: https://github.com/jamesjwu	2025-10-03 19:20:07 +00:00
James Wu	fa5306b4f5	Support partial _DynamoCacheEntries when not all backends available (#163521 ) Differential Revision: [D82735769](https://our.internmc.facebook.com/intern/diff/D82735769/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/163521 Approved by: https://github.com/zhxchen17	2025-10-03 16:14:32 +00:00
Avik Chaudhuri	d70c0babf5	minimize graph capture output (#162211 ) Currently OutputGraphGuardsState is separated out as a serializable interface for OutputGraph, but some of the typing around it is incorrect in dynamo's guards.py and output_graph.py: more fields are used by code than claimed by OutputGraphGuardsState, and it works because either the full OutputGraph is passed in or the parts that use those fields are dead when OutputGraphGuardsState is passed in. In this PR we try to further separate the necessary fields of OutputGraph that should be retained by a full graph capture mechanism, not just limited to dynamo (as it is currently) but also something like make_fx (in the future). Since these fields do not need to be serialized, the result is an intermediate "common" data structure that is between OutputGraphGuardsState and OutputGraph in the inheritance hierarchy. Differential Revision: D81718791 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162211 Approved by: https://github.com/zhxchen17	2025-09-20 15:52:28 +00:00
James Wu	bfe9e60ffb	Simplify PrecompileContext to no longer be a CacheArtifactManager (#162886 ) Summary: This diff does a big refactor of PrecompileContext to make it considerably simpler: instead of being a CacheArtifactManager and managing a bunch of bytes, it simply stores two things: dynamo cache entries and backend cache entries. When asked, it stitches them together into PrecompileCacheEntries, which are stored by DynamoCache. This structure then allows us to register DynamoCache to the regular Megacache API, instead of having two separate APIs that are confusing. It also lets us remove the autotune cache integration, since MegaCache API will automatically store autotune cache entries. The intent here is that users who want to use caching precompile will simply be able to use torch.compiler.save_cache_artifacts as before, just with `torch.dynamo.config.caching_precompile` set to True. They can also directly interact with PrecompileContext if they wish to specifically only load Precompile entries, using PrecompileContext.create_cache_entries(). Saving single entries and such with DynamoCache still works normally. Test Plan: All existing unit tests pass. Rollback Plan: Differential Revision: D82380307 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162886 Approved by: https://github.com/zhxchen17	2025-09-20 01:24:37 +00:00
James Wu	3c9e220f34	Refactor PrecompileContext to be considerably more debuggable (#162740 ) Summary: This diff does a few things: - It refactors PrecompileContext to store DynamoCacheEntries directly on the context. This allows us at serialization time to check if the dynamo cache entry has all its backends ready for serialization, and if not, skip unnecessarily serializing it - It also gives us the ability to print out a `debug` JSON, which contains a mapping for everything being serialized and deserialized. Here's an example of what that JSON looks like: ``` { "artifacts": { "precompile_aot_autograd": [ "__compiled_fn_8_306d538b_f7f8_4ab4_98a1_b5ff4493f99d" ], "precompile_dynamo": [ { "backend_ids": [ "__compiled_fn_8_306d538b_f7f8_4ab4_98a1_b5ff4493f99d" ], "fn_name": "TorchBenchmarkRunner.forward_and_backward_pass", "num_codes": "10", "python_version": "3.12.11+meta", "torch_version": "2.10.0a0+fb" } ] }, "num_entries": 1 } ``` Test Plan: Existing tests pass. NanoGPT tlparse showing the new debug: https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpeIsL5G/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=10000 Note that there aren't compile ids since we're logging this in PrecompileContext.serialize() for now, where there isn't a compile yet. I think this is fine for now, as no compile ID makes sense here. If anything, these kind of belong in a "Global" compile ID, which I will not implement in this PR. Rollback Plan: Differential Revision: D82232574 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162740 Approved by: https://github.com/zhxchen17	2025-09-19 01:14:28 +00:00
xinan.lin	e93706c2c8	[Intel GPU][pre_compile] Add XPU toolkit version and hardware info in compiled model check. (#162951 ) Following #162438, this PR generalizes the original CUDA-only check and adds an XPU check. Fixes #162939, Fixes #162938, Fixes #163032, Fixes #163045 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162951 Approved by: https://github.com/EikanWang, https://github.com/jansel	2025-09-18 00:04:22 +00:00
Zhengxu Chen	bb3f3cc65e	[precompile] Store traced file information with CompileArtifacts. (#162983 ) Summary: Add some metadata to CompileArtifacts so that it contains source code information about the original code as it is being traced. For now, we do not provide a verification method to end users; instead we just report which files are inlined. It's up to the user to verify that the contents of these files have not changed (because it's optional for many users to validate source code changes anyway in aot precompile) Test Plan: buck run @mode/opt test/dynamo:test_dynamo -- -k test_file_change buck run @mode/opt test/dynamo:test_dynamo -- -k test_aot_compile_source_info Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/162983 Approved by: https://github.com/yushangdi	2025-09-16 18:27:48 +00:00
Shangdi Yu	ccb450b190	[pre_compile] Add check for cuda and hardware version (#162438 ) if we detect compiled model is using cuda in meaningful way, we should store information about cuda + hardware Example: `SystemInfo(python_version='3.12.9', torch_version='2.9.0a0+gite02b0e6', cuda_version='12.6', triton_version=(3, 4), gpu_name='NVIDIA PG509-210')` Pull Request resolved: https://github.com/pytorch/pytorch/pull/162438 Approved by: https://github.com/zhxchen17	2025-09-12 01:42:07 +00:00
Zhengxu Chen	8485aac873	[precompile] Fix inlined source tracking with generators. (#162389 ) Summary: When compiled code has generator, code.co_firstlineno will be inconsistent with the result from inspect.getsource, which returns the toplevel enclosing code source rather than the inner code location. In this case, it seems simpler to just use the toplevel enclosing code location rather than the co_firstlineno field. Test Plan: test_package.py -k test_code_with_generator Rollback Plan: Differential Revision: D81929751 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162389 Approved by: https://github.com/dolpm, https://github.com/hrithick-codes	2025-09-09 00:13:54 +00:00
James Wu	9668210302	Allow bypasses for Precompile when guards, etc. cannot be serialized (#160902 ) This adds a new function `bypass_package` and `CompilePackage.bypass_current_entry()`. This allows us to safely bypass if there are models with unserializable or incompatible parts. When we encounter something incompatible, we'll raise a bypass and ignore that particular code in DynamoCodeEntry. Pull Request resolved: https://github.com/pytorch/pytorch/pull/160902 Approved by: https://github.com/zhxchen17	2025-08-21 18:20:42 +00:00
James Wu	4014672b30	Replace guard_serialization_mode with save_guards, remove load cases (#160531 ) This PR replaces "guard_serialization_mode" with `save_guards`. All cases where we care about whether or not we're loading guards can be inferred automatically from the existing inputs. The only case that's special here is whether or not to check guards. We don't want to check guards on guard load in CheckFnManager, because these guards have already been checked on save. Therefore, we put the setting in OutputGraphGuardsState, so that when we save, we bypass the guards check. Because of this change, it is technically possible to do a load and a save in the same CheckFunctionManager.__init__() by passing all the necessary parts, and also passing `save_guards=True`. This should just work out of the box, but so far no callsites need it, so not super important. Next up, we'll work on removing save_guards from GuardBuilder, and putting it into its own phase. Pull Request resolved: https://github.com/pytorch/pytorch/pull/160531 Approved by: https://github.com/zhxchen17	2025-08-18 17:04:17 +00:00
Zhengxu Chen	16d15445f8	Fullgraph graph capture with dynamo. (#159749 ) Summary: Following up on Avik's doc https://docs.google.com/document/d/11RW0Bbkp1QwFbEu8rCNW5d7wUFaEkxbL0uLyqcc2jTk/edit?tab=t.0 We are experimenting with a new API which utilizes torch.compile(fullgraph=True) and intend to use it to replace the old dynamo.export() API. This PR adds a prototype for the API described in the doc. Test Plan: test_misc -- -k test_aot_capture Rollback Plan: Differential Revision: D79534608 Pull Request resolved: https://github.com/pytorch/pytorch/pull/159749 Approved by: https://github.com/tugsbayasgalan	2025-08-12 22:06:18 +00:00
James Wu	90fd06be71	Various bugfixes for running NanoGPT training (#159166 ) Fix various small bugs with running nanogpt on torchbenchmark in OSS under python 3.10. After these changes, the following now succeeds: ``` tlp python benchmarks/dynamo/torchbench.py --only nanogpt --performance --training --backend inductor --caching-precompile --warm-start-latency ``` Cold start: https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmp12LuZ5/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=10000 Warm start (we are investigating the recompile): https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpT5YTB2/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=10000 Pull Request resolved: https://github.com/pytorch/pytorch/pull/159166 Approved by: https://github.com/zhxchen17	2025-07-30 16:30:22 +00:00
zhxchen17	90c241dedd	[precompile] Support user defined function calls from bytecode. (#158947 ) Previously precompile was implemented under the assumption that dynamo always inlines the user code and generates resume functions when a graph break is hit. In cases like nanogpt training, there exists a nontrivial amount of code causing dynamo to fail the speculation and stop inlining certain types of user functions. This results in more code objects being tracked by CompilePackage. Since these new code objects are user defined, we also need to serialize the location of this code so that we can load the precompile entries to these code objects in another process. With this fix, we are able to run nanogpt inference+training with precompile under torchbench. Differential Revision: [D78691422](https://our.internmc.facebook.com/intern/diff/D78691422/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/158947 Approved by: https://github.com/jamesjwu	2025-07-24 20:10:57 +00:00
James Wu	f55c5d085e	[Precompile] Various small bugfixes, add CachingPrecompile to torchbench (#158847 ) This PR addresses a few small bugfixes needed to make NanoGPT inference work, and also adds a new `--caching-precompile` argument to torchbench. With `--caching-precompile`, after every benchmark we save precompile artifacts to DynamoCache, allowing us to test caching precompile on all existing benchmarks. The following bugfixes are in this PR to make all of this work: - Fix global variables being pruned with DUPLICATE_INPUT guards. DUPLICATE_INPUT guards have additional vars from the second input, which we track with additional_local_vars, but we never tracked additional global variables. This fixes the issue. (See torch/_dynamo/guards.py changes) - Return None from PrecompileContext.serialize() if no new dynamo compiles occurred. There's no reason to save artifacts (e.g. autotuning artifacts) if no dynamo_compile occurred, so we return None early. We may later want to support editing existing dynamo artifacts as a TODO, but that's upcoming. - log `dynamo_start` on CompilePackage.load: This is only needed so that tlparse doesn't ignore TORCH_TRACE logs generated when caching precompile hits. If there are no actual compiles, we never log a "dynamo_start" entry, which makes internal tlparse ignore the TORCH_TRACE file.
## Test Plan After this PR, the following now works: ``` TORCH_LOGS=dynamo tlp python benchmarks/dynamo/torchbench.py --only nanogpt --performance --inference --backend inductor --caching-precompile --warm-start-latency ``` tlparse result (internal): Cold Start (6 seconds): https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpAWe0zD/dedicated_log_torch_trace_vk9nkp4m.log/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=10000 Warm Start (~1 s): https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpAWe0zD/dedicated_log_torch_trace_5l4iwrpm.log/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=10000 The 1 second of warm start here can be improved: the costs here are mostly in starting up workers and triton and initializing CUDA, a lot of which should not be included in the compile time cost in real world scenarios where these are already loaded before training begins. Pull Request resolved: https://github.com/pytorch/pytorch/pull/158847 Approved by: https://github.com/zhxchen17	2025-07-24 14:09:54 +00:00
PyTorch MergeBot	76be282e3a	Revert "[Precompile] Various small bugfixes, add CachingPrecompile to torchbench (#158847 )" This reverts commit d898d0d437bfdc0719e6c69d5005606c5e64fca8. Reverted https://github.com/pytorch/pytorch/pull/158847 on behalf of https://github.com/jithunnair-amd due to Broke ROCm CI jobs on MI200 and MI300 ([comment](https://github.com/pytorch/pytorch/pull/158847#issuecomment-3109664713))	2025-07-23 18:25:46 +00:00
James Wu	d898d0d437	[Precompile] Various small bugfixes, add CachingPrecompile to torchbench (#158847 ) This PR addresses a few small bugfixes needed to make NanoGPT inference work, and also adds a new `--caching-precompile` argument to torchbench. With `--caching-precompile`, after every benchmark we save precompile artifacts to DynamoCache, allowing us to test caching precompile on all existing benchmarks. The following bugfixes are in this PR to make all of this work: - Fix global variables being pruned with DUPLICATE_INPUT guards. DUPLICATE_INPUT guards have additional vars from the second input, which we track with additional_local_vars, but we never tracked additional global variables. This fixes the issue. (See torch/_dynamo/guards.py changes) - Return None from PrecompileContext.serialize() if no new dynamo compiles occurred. There's no reason to save artifacts (e.g. autotuning artifacts) if no dynamo_compile occurred, so we return None early. We may later want to support editing existing dynamo artifacts as a TODO, but that's upcoming. - log `dynamo_start` on CompilePackage.load: This is only needed so that tlparse doesn't ignore TORCH_TRACE logs generated when caching precompile hits. If there are no actual compiles, we never log a "dynamo_start" entry, which makes internal tlparse ignore the TORCH_TRACE file.
## Test Plan After this PR, the following now works: ``` TORCH_LOGS=dynamo tlp python benchmarks/dynamo/torchbench.py --only nanogpt --performance --inference --backend inductor --caching-precompile --warm-start-latency ``` tlparse result (internal): Cold Start (6 seconds): https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpAWe0zD/dedicated_log_torch_trace_vk9nkp4m.log/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=10000 Warm Start (~1 s): https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpAWe0zD/dedicated_log_torch_trace_5l4iwrpm.log/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=10000 The 1 second of warm start here can be improved: the costs here are mostly in starting up workers and triton and initializing CUDA, a lot of which should not be included in the compile time cost in real world scenarios where these are already loaded before training begins. Pull Request resolved: https://github.com/pytorch/pytorch/pull/158847 Approved by: https://github.com/zhxchen17	2025-07-23 15:06:54 +00:00
Lucas Kabela	a4d753295e	[Dynamo][Better Engineering] Add enhanced typing support to `_dynamo/eval_frame.py` (#158276 ) As part of better engineering week, we would like to improve our type support to improve dev experience in dynamo. This PR adds strict typing support to the main entrypoint for dynamo, `eval_frame.py`. Running ``` mypy torch/_dynamo/eval_frame.py --linecount-report /tmp/coverage_log ``` \| -------- \| Lines Unannotated \| Lines Total \| % lines covered \| Funcs Unannotated \| Funcs Total \| % funcs covered \| \| -------- \| ------- \| -------- \| ------- \| ------- \| ------- \| ------- \| \| Main \| 623 \| 2232 \| 27.91% \| 19 \| 68 \| 27.94% \| \| This PR \| 2285 \| 2285 \| 100.00% \| 68 \| 68 \| 100.00% \| \| Delta \| +1662 \| +63 \| +72.09% \| +49 \| 0 \| +72.06% \| Pull Request resolved: https://github.com/pytorch/pytorch/pull/158276 Approved by: https://github.com/williamwen42 Co-authored-by: William Wen <williamwen@meta.com>	2025-07-16 23:31:10 +00:00
James Wu	ef4cca2d79	[precompile] Increment frame and add compile ids when loading packages (#158028 ) When loading a package and calling package.install(backends), we create a new frame and compile id for each package load, so that tlparse and chromium events still show compile times on warm start. There is an argument for not doing this in AOT precompile, as no "compile" occurs. So for now, we put it in `package.install`, which hopefully won't be a thing for AOT precompile. ## Recompiles Recompiles get saved to the same frame and code entry, so on warm start, each recompile will get collapsed into the same entry. Therefore, dynamo compiles that have recompiles on cold start (0/0, 0/1, 0/2, etc) will all get collapsed into a single compile id (0/0), as warm start will load all of the entries properly. ## Graph breaks Graph breaks get their own compile id, and therefore their own code entry. These are replicated on warm start, so if cold start you had 4 different graphs (and therefore 4 compile ids), you'll have 4 compile ids on warm start as well. ## Test plan Added a frame counter check to existing unit tests for automatic dynamic, showing that old and new frame counter between old and new load is the same. This is the chromium event for test_automatic_dynamo_graph_breaks_device_cuda: ``` python test/dynamo/test_package.py -k test_automatic_dynamo_graph_breaks_device_cuda ``` <img width="2216" height="508" alt="image" src="https://github.com/user-attachments/assets/f604ed33-5c31-464b-9320-d67b2e6f57a1" /> Pull Request resolved: https://github.com/pytorch/pytorch/pull/158028 Approved by: https://github.com/oulgen	2025-07-15 00:53:52 +00:00
James Wu	be56a8d7ac	Automatically load and save dynamo entries via caching_precompile (#155913 ) This PR adds a new config option, `caching_precompile`, and a `DynamoCache`, which loads and saves Dynamo Cache entries automatically. It also hooks up DynamoCache to PrecompileContext, so that we can save multiple cache entries. When this configuration is turned on, we: - Automatically create and initialize a CompilePackage on every torch.compile - Automatically use BundledAutogradcache - Automatically save the CompilePackage entry to DynamoCache after every compile You can also use PrecompileContext.serialize() to manually serialize a full object. I've added unit tests to exhibit this behavior. Pull Request resolved: https://github.com/pytorch/pytorch/pull/155913 Approved by: https://github.com/zhxchen17	2025-07-07 23:57:17 +00:00
PyTorch MergeBot	ae1094b72b	Revert "[WIP] Automatically load and save dynamo entries via caching_precompile (#155913 )" This reverts commit e466dab164d9236bfe5817ec8e4d24c7b9d3e392. Reverted https://github.com/pytorch/pytorch/pull/155913 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it seems to fail a test in trunk ([comment](https://github.com/pytorch/pytorch/pull/155913#issuecomment-3045914878))	2025-07-07 16:53:35 +00:00
James Wu	e466dab164	[WIP] Automatically load and save dynamo entries via caching_precompile (#155913 ) This PR adds a new config option, `caching_precompile`, and a `DynamoCache`, which loads and saves Dynamo Cache entries automatically. It also hooks up DynamoCache to PrecompileContext, so that we can save multiple cache entries. When this configuration is turned on, we: - Automatically create and initialize a CompilePackage on every torch.compile - Automatically use BundledAutogradcache - Automatically save the CompilePackage entry to DynamoCache after every compile You can also use PrecompileContext.serialize() to manually serialize a full object. I've added unit tests to exhibit this behavior. Pull Request resolved: https://github.com/pytorch/pytorch/pull/155913 Approved by: https://github.com/zhxchen17	2025-07-07 11:56:30 +00:00
zhxchen17	e20784f228	[dynamo] Support BUILTIN_MATCH serialization. (#157016 ) Serialize BUILTIN_MATCH since they are all stored in __builtin__ dict. Also fixed an issue that the wrong global scope is passed to CheckFunctionManager while loading guards. Previously we can always reuse the compile-time global scope for evaluating guards because the compile-time and runtime global scope are always the same. For precompile, we need to serialize the compile-time global scope for loading only. We need to point the CheckFunctionManager to the new global scope after loading is finished for evaluating guards. Differential Revision: [D77159313](https://our.internmc.facebook.com/intern/diff/D77159313/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/157016 Approved by: https://github.com/jansel, https://github.com/jamesjwu	2025-07-02 20:24:24 +00:00
zhxchen17	f096820d0f	[precompile] Detect source code changes for save/load. (#156432 ) Go through all dynamo traced functions and compute checksum for them. While loading a precompilation back to memory, we will always check the checksum and refuse to load when source code changes are detected. Differential Revision: [D76987123](https://our.internmc.facebook.com/intern/diff/D76987123/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/156432 Approved by: https://github.com/jansel, https://github.com/jamesjwu	2025-06-30 21:16:15 +00:00
James Wu	070aa59e49	Refactor DynamoStore into disk and in memory implementations (#155818 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/155818 Approved by: https://github.com/zhxchen17	2025-06-25 18:24:28 +00:00
Xuehai Pan	1b2146fc6d	[BE][4/16] fix typos in torch/ (torch/_dynamo/) (#156314 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/156314 Approved by: https://github.com/jingsh ghstack dependencies: #156313	2025-06-23 02:57:19 +00:00
James Wu	10fb98a004	[Precompile] Hook up backend="inductor" (#155387 ) This PR adds the necessary things to register and record backend ids from BundledAOTAutogradCacheEntry. One TODO to point out; in this diff, if there are multiple backends that would have the same AOTAutogradCache key (traditional cache key, not backend_id), we just end up serializing the same BundledAOTAutogradCache entry multiple times. This is not ideal obviously, so we'll want to deduplicate these and just track the different keys that one BundledAOTAutogradCacheEntry is associated with instead. This shouldn't be super hard to do, though, as we just need to run a deduplication step on call to `serialize()`, I think. Pull Request resolved: https://github.com/pytorch/pytorch/pull/155387 Approved by: https://github.com/oulgen	2025-06-22 15:05:08 +00:00
PyTorch MergeBot	5b427c92a8	Revert "[BE][4/16] fix typos in torch/ (torch/_dynamo/) (#156314 )" This reverts commit ead741c5fb0036e0fc95b79d4fe1af3a426e1306. Reverted https://github.com/pytorch/pytorch/pull/156314 on behalf of https://github.com/atalman due to export/test_torchbind.py::TestCompileTorchbind::test_compile_error_on_input_aliasing_contents_backend_aot_eager [GH job link](https://github.com/pytorch/pytorch/actions/runs/15804799771/job/44548489912) [HUD commit link](`c95f7fa874`) ([comment](https://github.com/pytorch/pytorch/pull/156313#issuecomment-2994171213))	2025-06-22 12:31:57 +00:00
Xuehai Pan	ead741c5fb	[BE][4/16] fix typos in torch/ (torch/_dynamo/) (#156314 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/156314 Approved by: https://github.com/jingsh ghstack dependencies: #156313	2025-06-22 08:43:18 +00:00
PyTorch MergeBot	edd45f3a02	Revert "[Precompile] Hook up backend="inductor" (#155387 )" This reverts commit 2c68c3e8d5e9a235f5861be6486de4959f80c840. Reverted https://github.com/pytorch/pytorch/pull/155387 on behalf of https://github.com/atalman due to dynamo/test_precompile_context.py::PrecompileContextTests::test_basic [GH job link](https://github.com/pytorch/pytorch/actions/runs/15772892021/job/44464141039) [HUD commit link](`2c68c3e8d5`) ([comment](https://github.com/pytorch/pytorch/pull/155387#issuecomment-2992044073))	2025-06-20 15:30:04 +00:00
James Wu	2c68c3e8d5	[Precompile] Hook up backend="inductor" (#155387 ) This PR adds the necessary things to register and record backend ids from BundledAOTAutogradCacheEntry. One TODO to point out; in this diff, if there are multiple backends that would have the same AOTAutogradCache key (traditional cache key, not backend_id), we just end up serializing the same BundledAOTAutogradCache entry multiple times. This is not ideal obviously, so we'll want to deduplicate these and just track the different keys that one BundledAOTAutogradCacheEntry is associated with instead. This shouldn't be super hard to do, though, as we just need to run a deduplication step on call to `serialize()`, I think. Pull Request resolved: https://github.com/pytorch/pytorch/pull/155387 Approved by: https://github.com/oulgen	2025-06-20 06:38:29 +00:00
James Wu	3819584f12	[precompile] Implement PrecompileContext for recording precompile artifacts, integrate with CompilePackage (#154415 ) This PR implements a basic interface and test for PrecompileContext, a special CacheArtifactManager specifically designed for precompile. The job of a PrecompileContext is to record things precompile needs as torch is compiling, dump it all into bytes, and then stitch it back together into a cache of callables. ## Why use CacheArtifactManager? Precompile needs a way to record various serializable data as torch is compiling. CacheArtifactManager already does this today pretty well, handling a lot of serialization and cache information. So we're reusing a bunch of that infrastructure directly. ## How is it different from CacheArtifactManager? Unlike regular CacheArtifactManager, PrecompileContext needs to be able to take the recorded artifacts and stitch them together after deserialization, to create a single working callable. Since PrecompileContext doesn't need the cache keys, the "key" field of PrecompileArtifacts can be used for metadata relating to how to stitch the individual functions being compiled together into a full callable. For example, on a given dynamo compile, if there are multiple functions (via graph breaks or recompiles) being compiled, MegaCache would organize it like so: ![image](https://github.com/user-attachments/assets/49a0a75b-1e7f-4d96-8d81-6769fe5a53ca) Whereas we'd visualize PrecompileContext's result like so: ![image](https://github.com/user-attachments/assets/fcc0dd4e-dfbf-4b13-9c08-2e99b373180b) For now, we just handle eager mode; in the diff above, I'll hook up the other backend artifacts from PrecompileContext. 
After this PR, precompile consists of three main interfaces: ### CompilePackage - Everything needed to run one torch.compile'd function (including graph breaks) - `__init__(fn, cache_entry)` Initializes with a DynamoCacheEntry - `install(backends)` load precompile artifacts into function's dynamo state with a dictionary of backends - `cache_entry()` return a serializable cache entry to save ### DynamoStore - Responsible for tracking CompilePackages on disk (and/or in memory) - `load_package(path)`: load a package given a torch compiled function and a path to the cache artifact - `save_package(package, path)`: Save a CompilePackage to a path. Calls PrecompileContext to grab backend data - `record_package(package)`: Record a package to PrecompileContext (for global serialization/deserialization) ### PrecompileContext - Overarching context for serializing and deserializing precompile artifacts. Supports global and local setups. - `serialize()`: (Global) serializes all artifacts in PrecompileContext into bytes - `populate_caches(bytes)`: (Global) takes serialized bytes and puts them into DynamoStore (TODO) - `serialize_artifact_by_key(key)`: (Local) serialize a single artifact by its cache key <img width="1455" alt="image" src="https://github.com/user-attachments/assets/99b61330-7607-4763-bdbc-85b366e82cdd" /> Pull Request resolved: https://github.com/pytorch/pytorch/pull/154415 Approved by: https://github.com/zhxchen17 ghstack dependencies: #155118	2025-06-13 14:11:24 +00:00
James Wu	b2fc9cfea1	[precompile] Add CompilePackage to serialize dynamo states. (#155118 ) Adds a per-torch.compile() object, CompilePackage, which tracks dynamo artifacts. CompilePackage is considered a low level component and should not be directly exposed to end users. It has the following interface: 1. `CompilePackage.__init__()` which optionally takes previously serialized dynamo states. a. when the `dynamo` argument is None, it will construct a brand new CompilePackage object. b. when the `dynamo` argument is not None, it will load a pre-compiled dynamo state. 2. `package.save()` which dumps the dynamo states into _DynamoCacheEntry. 3. `package.install(backends)` which will handle all the side-effectful global scope updates with compiled functions and resume functions. This diff focuses on building the low level mechanism for precompile. It is left to upper level interfaces to use these APIs to build a more user-facing frontend. Differential Revision: [D75956538](https://our.internmc.facebook.com/intern/diff/D75956538/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/155118 Approved by: https://github.com/jamesjwu Co-authored-by: James Wu <jjwu@meta.com>	2025-06-13 13:54:10 +00:00
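
Note on #156432 (source change detection): the commit message describes computing a checksum for each dynamo-traced function and refusing to load a precompiled entry when the source has changed. A minimal sketch of that idea, using only the Python standard library, might look like the following. `SavedEntry`, `save_entry`, and `load_entry` are hypothetical names for illustration and are not PyTorch APIs; the use of SHA-256 is also an assumption, since the log does not say which checksum is used.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class SavedEntry:
    """Hypothetical stand-in for a saved precompile entry."""
    fn_name: str
    checksum: str   # digest of the source text at save time
    payload: bytes  # opaque compiled artifact


def checksum(source: str) -> str:
    # Assumption: SHA-256 over the UTF-8 source text.
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def save_entry(fn_name: str, source: str, payload: bytes) -> SavedEntry:
    # Record the checksum alongside the artifact when saving.
    return SavedEntry(fn_name, checksum(source), payload)


def load_entry(entry: SavedEntry, current_source: str) -> bytes:
    # Refuse to load if the source differs from what was traced.
    if checksum(current_source) != entry.checksum:
        raise RuntimeError(f"source of {entry.fn_name!r} changed since save")
    return entry.payload
```

In this sketch the caller passes the source text explicitly; a real implementation would need some way to obtain the source of every traced function, which is outside the scope of the example.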

39 Commits
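
Note on #162740 and #162886 (PrecompileContext refactor): those messages describe a context that stores dynamo cache entries and backend cache entries separately, stitches them together on request, and skips a dynamo entry whose backends are not all ready. The toy model below sketches only that bookkeeping. `ToyDynamoEntry` and `ToyContext` are hypothetical names, the method name `create_cache_entries` is borrowed from the commit message but its signature and return type here are assumptions, and none of this is the actual PyTorch implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyDynamoEntry:
    """Hypothetical dynamo-side entry: names the backends it needs."""
    fn_name: str
    backend_ids: List[str]


@dataclass
class ToyContext:
    """Hypothetical context holding the two kinds of entries."""
    dynamo_entries: Dict[str, ToyDynamoEntry] = field(default_factory=dict)
    backend_entries: Dict[str, bytes] = field(default_factory=dict)

    def create_cache_entries(
        self,
    ) -> Dict[str, Tuple[ToyDynamoEntry, Dict[str, bytes]]]:
        stitched = {}
        for key, entry in self.dynamo_entries.items():
            # Skip entries whose backends have not all been recorded yet.
            if any(b not in self.backend_entries for b in entry.backend_ids):
                continue
            backends = {b: self.backend_entries[b] for b in entry.backend_ids}
            stitched[key] = (entry, backends)
        return stitched
```

The later commit #163521 mentions supporting partial entries when not all backends are available; this sketch shows only the simpler skip behaviour described in #162740.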