166 Commits

fdab48a7c1 Enable all PIE rules on ruff (#165814)
This PR enables all PIE rules on ruff. Some rules from this family were already enabled; the newly added rules are
```
PIE796  Enum contains duplicate value: {value}
PIE808  Unnecessary start argument in range
```
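For illustration, a minimal sketch of the patterns these two rules flag (the names here are made up):
```
import enum

class Color(enum.Enum):
    RED = 1
    CRIMSON = 1  # PIE796: enum contains duplicate value `1`

for i in range(0, 10):  # PIE808: `0` is the default start; write `range(10)`
    print(i)
```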

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165814
Approved by: https://github.com/ezyang
2025-10-18 07:36:18 +00:00
24520b8386 Revert "Enable all PIE rules on ruff (#165814)"
This reverts commit c79dfdc6550e872783aa5cb5fc9e86589bf18872.

Reverted https://github.com/pytorch/pytorch/pull/165814 on behalf of https://github.com/cyyever due to Need to cover more files ([comment](https://github.com/pytorch/pytorch/pull/165814#issuecomment-3417931863))
2025-10-18 07:21:08 +00:00
c79dfdc655 Enable all PIE rules on ruff (#165814)
This PR enables all PIE rules on ruff. Some rules from this family were already enabled; the newly added rules are
```
PIE796  Enum contains duplicate value: {value}
PIE808  Unnecessary start argument in range
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165814
Approved by: https://github.com/ezyang
2025-10-18 06:40:12 +00:00
b54e466fd0 Megacache integration (#163533)
This diff adds megacache integration for DynamoCache.

Because DynamoCache requires lazy serialization (it can only be serialized once all relevant backends have been compiled and we're ready to save), the actual DynamoCache save happens only on a call to `torch.compiler.save_cache_artifacts`.
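A minimal usage sketch of that flow (illustrative, not from this PR):
```
import torch

@torch.compile
def f(x):
    return x.sin() + x.cos()

f(torch.randn(8))  # run so all relevant backends are compiled

# Only now can DynamoCache entries be serialized alongside the rest:
artifacts = torch.compiler.save_cache_artifacts()
if artifacts is not None:
    artifact_bytes, cache_info = artifacts
    # a later process can call torch.compiler.load_cache_artifacts(artifact_bytes)
```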

Differential Revision: [D82735763](https://our.internmc.facebook.com/intern/diff/D82735763/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163533
Approved by: https://github.com/oulgen, https://github.com/zhxchen17
2025-10-15 22:49:15 +00:00
159c2140f7 test: ensure editable cached wrapper is respected (#160943)
## Summary
- add a test verifying that editing the local cache wrapper is picked up after Dynamo reset

## Testing
- `lintrunner -a` *(fails: FLAKE8 failure, TEST_HAS_MAIN failure, CODESPELL failure, PYFMT failure)*
- `PYTHONPATH=. python test/inductor/test_codecache.py TestPyCodeCache.test_editable_cached_wrapper -v`

------
https://chatgpt.com/codex/tasks/task_e_68a3aa3fcc9883239b17d1f4250d1e89

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160943
Approved by: https://github.com/xmfan, https://github.com/albanD
2025-09-18 19:24:51 +00:00
4d4abec80f allow user to pass in custom partitioner function (#157580)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157580
Approved by: https://github.com/bdhirsh
2025-09-05 22:49:39 +00:00
0cd6c56bdf Revert "test: ensure editable cached wrapper is respected (#160943)"
This reverts commit bbedc71fd3267c639c38b4ec25eaa22f973d9c4d.

Reverted https://github.com/pytorch/pytorch/pull/160943 on behalf of https://github.com/jeanschmidt due to See [D81486248](https://www.internalfb.com/diff/D81486248) for details on broken test ([comment](https://github.com/pytorch/pytorch/pull/160943#issuecomment-3249565671))
2025-09-03 14:46:35 +00:00
bbedc71fd3 test: ensure editable cached wrapper is respected (#160943)
## Summary
- add a test verifying that editing the local cache wrapper is picked up after Dynamo reset

## Testing
- `lintrunner -a` *(fails: FLAKE8 failure, TEST_HAS_MAIN failure, CODESPELL failure, PYFMT failure)*
- `PYTHONPATH=. python test/inductor/test_codecache.py TestPyCodeCache.test_editable_cached_wrapper -v`

------
https://chatgpt.com/codex/tasks/task_e_68a3aa3fcc9883239b17d1f4250d1e89

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160943
Approved by: https://github.com/xmfan
2025-09-02 01:48:30 +00:00
0e3e377bd5 [inductor] fix CompiledArtifact.load path on Windows. (#160268)
fix CompiledArtifact.load path on Windows.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160268
Approved by: https://github.com/ezyang
2025-08-10 14:22:52 +00:00
af10f1f86c Fix requires_cuda to requires_cuda_and_triton (#160222)
Fixes #159399

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160222
Approved by: https://github.com/janeyx99
2025-08-10 07:05:52 +00:00
8047421fbb [Linter] Expanding the scope of detecting device-bias code. (#159949)
Currently, the device-bias linter only targets functions decorated with @requires_gpu. This PR adds support for two new detection scenarios:
1. Detect device-bias code in functions decorated with @requires_triton.
2. Detect device-bias code for entire test suites that are defined as shared across GPUs. For example:
```
if __name__ == "__main__":
    if HAS_GPU:
        run_tests()

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159949
Approved by: https://github.com/EikanWang, https://github.com/jansel
2025-08-09 09:41:16 +00:00
50f23ff6f8 rename-HAS_CUDA-to-HAS_CUDA_AND_TRITON (#159883)
Fixes #159399
"Modified torch.testing._internal.inductor_utils and test/inductor"

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159883
Approved by: https://github.com/janeyx99
2025-08-08 15:44:52 +00:00
19aa8eb4f5 [TF32][Flex Attention] Turn off TF32 for reference computation in test_flex_decoding (#158979)
This seems to avoid threshold (fudge factor) tuning games, since disabling TF32 causes the checks to go down the "very small ref error" path instead.
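The general pattern looks something like the sketch below (a simplification, not the exact test change):
```
import torch

def reference_attn(q, k, v):
    # Compute the reference in true fp32: disable TF32 so the ref error stays tiny.
    prev = torch.backends.cuda.matmul.allow_tf32
    torch.backends.cuda.matmul.allow_tf32 = False
    try:
        scores = (q @ k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
        return scores.softmax(dim=-1) @ v
    finally:
        torch.backends.cuda.matmul.allow_tf32 = prev
```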

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158979
Approved by: https://github.com/drisspg, https://github.com/BoyuanFeng, https://github.com/nWEIdia
2025-07-28 18:38:23 +00:00
d3d9bc1c31 [inductor] Allow backends to register their own custom config object (#158254)
An out-of-tree backend can have its own configuration options that the user can enable to control Inductor compilation. These config options need to be taken into account when calculating the key that determines cache misses/hits. This PR allows out-of-tree backends to specify a custom config module, of the same type as `torch._inductor.config`, that can be used to control codegen (in addition to the default config) and will be included in the cache key.
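A rough sketch of the shape of such a config module; `register_custom_config_module` below is a made-up name standing in for the hook this PR adds:
```
# mybackend/config.py -- shaped like torch._inductor.config
import sys

from torch.utils._config_module import install_config_module

enable_fancy_fusion: bool = False  # backend-specific codegen knob

# Gives this module the same type as torch._inductor.config.
install_config_module(sys.modules[__name__])

# At backend registration time (hypothetical API):
#   register_custom_config_module(mybackend.config)
# so enable_fancy_fusion participates in the cache key.
```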

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158254
Approved by: https://github.com/eellison
2025-07-23 15:56:06 +00:00
fb462cec8d Normalize placeholder names in AOTAutogradCache (#157916)
This PR adds a pass to sanitize_gm_for_cache that normalizes all placeholder names across input dynamo graphs to AOTAutogradCache. This is safe because nothing underneath AOTAutograd uses the node names on the original dynamo graph: AOTAutograd re-traces with its own nodes, and guards are expressed in terms of original sources rather than placeholder names.

Note that the dynamo output graphs shown by tlparse will not reflect this change, because they are logged before this sanitization step. The AOT autograd outputs also will not change, because AOTAutograd's own traced graphs don't use the original placeholders of the dynamo graph. Thus, this change is essentially a no-op from everyone's perspective except for cache key checks.
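A minimal sketch of the normalization idea (the real pass lives in `sanitize_gm_for_cache`; the `arg{i}` naming scheme here is illustrative):
```
import torch

def normalize_placeholders(gm: torch.fx.GraphModule) -> torch.fx.GraphModule:
    # Rename placeholders positionally so two graphs differing only in the
    # user's argument names produce the same cache key.
    placeholders = [n for n in gm.graph.nodes if n.op == "placeholder"]
    for i, node in enumerate(placeholders):
        node.name = f"arg{i}"
    gm.recompile()
    return gm
```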

Fixes #157792

Pull Request resolved: https://github.com/pytorch/pytorch/pull/157916
Approved by: https://github.com/zou3519
2025-07-14 17:45:11 +00:00
5bd7804be2 Support caching if joint_custom_pre_pass/joint_custom_post_pass implement the proper interface (#157990)
Summary: Essentially, treat joint_custom_pre_pass/joint_custom_post_pass the same as post_grad_custom_post_pass/post_grad_custom_pre_pass.
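A hedged sketch of the interface in question (mirroring how the post-grad custom passes are made cacheable):
```
from typing import Any, Optional

import torch
from torch._inductor.custom_graph_pass import CustomGraphPass

class MyJointPass(CustomGraphPass):
    def __call__(self, graph: torch.fx.Graph) -> None:
        pass  # rewrite the joint graph here

    def uuid(self) -> Optional[Any]:
        # Stable identifier folded into the cache key; bump it whenever the
        # pass's behavior changes.
        return "my-joint-pass-v1"

# torch._inductor.config.joint_custom_pre_pass = MyJointPass()
```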

Test Plan: More unit tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/157990
Approved by: https://github.com/oulgen
2025-07-10 19:17:11 +00:00
a9537b626c [standalone_compile] Fix single Tensor outputs from split_module (#157803)
We assumed that the output in an FX graph would always just be a
list[Tensor], even in the single tensor return case.
It is possible for the output to be a single Tensor. This can happen
by calling torch.fx.split_module on the module.
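A rough repro sketch of how that shape arises (illustrative):
```
import torch
from torch.fx.passes.split_module import split_module

class M(torch.nn.Module):
    def forward(self, x):
        return x.sin().cos()

m = M()
gm = torch.fx.symbolic_trace(m)
# Split into two partitions; each submodule then returns a bare Tensor,
# not a list[Tensor].
parts = split_module(gm, m, lambda node: 0 if node.name == "sin" else 1)
print(parts.submod_0.graph)
```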

Test Plan:
- new test

Pull Request resolved: https://github.com/pytorch/pytorch/pull/157803
Approved by: https://github.com/oulgen
2025-07-10 12:49:03 +00:00
17687eb792 [BE][4/6] fix typos in test/ (test/inductor/) (#157638)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157638
Approved by: https://github.com/yewentao256, https://github.com/jansel
2025-07-06 06:34:25 +00:00
f5e6e52f25 [BE][PYFMT] migrate PYFMT for test/inductor/ to ruff format (#148186)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148186
Approved by: https://github.com/jansel
2025-06-24 11:12:11 +00:00
a2a75be0f8 Rename inductor cache (#156128)
Requested by Simon on a different PR

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156128
Approved by: https://github.com/xmfan
2025-06-17 03:57:18 +00:00
ce79056471 Custom FX pass for inductor's backend registration (#154841)
This PR is related to RFC #153532. It extends Inductor's backend registration interface to allow a backend to register its own custom FX passes.
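Schematically, the extension point looks like the sketch below; the `custom_pass` kwarg name is illustrative, not necessarily the exact signature added here:
```
from torch._inductor.codegen.common import register_backend_for_device

# An out-of-tree backend already registers its codegen classes for its
# device; this PR lets it hand Inductor custom FX passes at the same point,
# e.g. (hypothetical kwarg name):
#   register_backend_for_device(
#       "mydevice", MyScheduling, MyWrapperCodegen,
#       custom_pass=my_fx_pass,
#   )
```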

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154841
Approved by: https://github.com/jansel

Co-authored-by: Jason Ansel <jansel@jansel.net>
2025-06-14 17:29:54 +00:00
a2b0b2698d inductor codecache: include private inductor configs in cache key (#153672)
Fixes https://github.com/pytorch/torchtitan/issues/1185

It looks like inductor's logic to include inductor configs in the cache key skips configs with a leading underscore by default. This came up in torchtitan - there's an asyncTP pipelining pass in inductor gated by a private config, and by not caching on the config we were attempting to use asyncTP when we shouldn't be.

I'm not sure how worried we should be on the blast radius of this change. On the one hand:

(1) it technically fixes any silent correctness issues in the cache around any other private inductor configs (it looks like there are a few)

(2) there is some risk that there are some "harmless" configs that we are now including in the key, which may increase false negatives. I do see that there is an explicit list for "configs we want to ignore for caching" (`_save_config_ignore`), so my hope is that all harmless configs are already encapsulated there.
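Schematically, the change is from an underscore filter to the explicit ignore list (illustrative, not the actual code; the asyncTP config name is assumed):
```
configs = {"max_autotune": True, "_micro_pipeline_tp": True}  # asyncTP gate (name assumed)
_save_config_ignore = set()  # configs deliberately excluded from caching

# Before: private configs silently dropped from the cache key.
old_key = {k: v for k, v in configs.items() if not k.startswith("_")}

# After: only the explicit ignore list is excluded.
new_key = {k: v for k, v in configs.items() if k not in _save_config_ignore}
```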

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153672
Approved by: https://github.com/oulgen
2025-06-11 01:33:24 +00:00
79bdafe5b6 Revert "Custom FX pass for inductor's backend registration (#154841)"
This reverts commit e694280d1215caf70f41575f2611bfa26c69ebdb.

Reverted https://github.com/pytorch/pytorch/pull/154841 on behalf of https://github.com/clee2000 due to failing some tests internally D76135706 ([comment](https://github.com/pytorch/pytorch/pull/154841#issuecomment-2956357711))
2025-06-09 16:56:45 +00:00
e694280d12 Custom FX pass for inductor's backend registration (#154841)
This PR is related to RFC #153532. It extends Inductor's backend registration interface to allow a backend to register its own custom FX passes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154841
Approved by: https://github.com/jansel

Co-authored-by: Jason Ansel <jansel@jansel.net>
2025-06-06 06:49:44 +00:00
fa63de0866 Handle empty linemaps in PyCodeCache (#155064)
Some functions have empty linemaps, and if you call `PyCodeCache.stack_frames_for_code` on code in the wrong order, you'll trigger a "too many values to unpack" error: https://github.com/pytorch/pytorch/issues/154536

Specifically, if you populate PyCodeCache's linemap via caching and then request the stack frames of an inductor-generated output file that has an empty linemap, this function will try to unpack too many arguments.

Test plan:
```
import os

os.environ["TORCHINDUCTOR_FX_GRAPH_CACHE"] = "1"
os.environ["TORCHINDUCTOR_AUTOGRAD_CACHE"] = "1"

import torch

@torch.compile
def fn(x: torch.Tensor):
    (x_grad,) = torch.autograd.grad(x.sum(), x)
    return x_grad

x = torch.randn(10, 10, requires_grad=True)
result = fn(x)
```

Run this twice and see that everything works as expected.

It's hard to exactly pinpoint a good unit test for this: it requires a whole lot of moving parts to get the issue to trigger because:

- The callsite in question in dynamo, without caching, will always run before generating the code, so cls.linemaps[path] will be None most of the time
- The inductor generated output needs to call *back* into dynamo via `assert_size_stride`
- In our test case, the CompiledBackward needs to not have linemaps, and it must be called in the middle of a graph break while compiling a different cached function. Caching switches the order in which PyCodeCache.linemap is populated (i.e. either before or after the graph break is evaluated), which causes the issue.

All these things need to interact together to create the bug, so it's a bit difficult to write a simple unit test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155064
Approved by: https://github.com/bdhirsh
2025-06-05 03:54:35 +00:00
a4da1d4a47 [Graph Partition] support standalone_compile (#154698)
With graph partition, `write_get_raw_stream_header_once` emits the header only once, so the autotune code may not have it. This PR additionally calls `write_get_raw_stream_header` in `codegen_device_guard_enter` before `get_raw_stream` is used.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154698
Approved by: https://github.com/oulgen
2025-06-03 07:40:42 +00:00
6cb6da6ea2 [triton pin][test] relax codecache test checks for number of triton artifacts (#154879)
Triton has added another artifact that gets generated (triton-lang/triton#6992), so `test_cache_load_function` started failing as there are now 8 (instead of 7) artifacts.

Instead of figuring out exactly which set of artifacts will be generated, I modified the test to just check that there are _at least_ 6 artifacts, to account for different platforms (intel/amd/nvidia) and different triton versions (which may or may not have a `.source` artifact)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154879
Approved by: https://github.com/oulgen, https://github.com/masnesral
2025-06-03 00:52:54 +00:00
31f95b5d2e Revert "inductor codecache: include private inductor configs in cache key (#153672)"
This reverts commit 2c1cb38d9516e10474b4f12a2e839046648a71a8.

Reverted https://github.com/pytorch/pytorch/pull/153672 on behalf of https://github.com/malfet due to Looks like it regressed pr_time_benchmarks, see ba3f91af97/1 ([comment](https://github.com/pytorch/pytorch/pull/153672#issuecomment-2922759739))
2025-05-30 15:54:14 +00:00
2c1cb38d95 inductor codecache: include private inductor configs in cache key (#153672)
Fixes https://github.com/pytorch/torchtitan/issues/1185

It looks like inductor's logic to include inductor configs in the cache key skips configs with a leading underscore by default. This came up in torchtitan - there's an asyncTP pipelining pass in inductor gated by a private config, and by not caching on the config we were attempting to use asyncTP when we shouldn't be.

I'm not sure how worried we should be on the blast radius of this change. On the one hand:

(1) it technically fixes any silent correctness issues in the cache around any other private inductor configs (it looks like there are a few)

(2) there is some risk that there are some "harmless" configs that we are now including in the key, which may increase false negatives. I do see that there is an explicit list for "configs we want to ignore for caching" (`_save_config_ignore`), so my hope is that all harmless configs are already encapsulated there.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153672
Approved by: https://github.com/oulgen
ghstack dependencies: #153766
2025-05-30 00:24:29 +00:00
254293b777 Add flag _metrics_log_runtime to disable runtime metric logging by default (#153506)
https://github.com/pytorch/pytorch/pull/152708 expanded support of `get_estimated_runtime` to many more types of `SchedulerNodes`. This caused an increase in compile time because we're always calling `get_estimated_runtime` to populate the metrics table. This PR adds a flag for this logging, which reduces the instruction count by 8%. Long term, we should probably merge metrics.py with TORCH_LOGS/tlparse (suggestion from @xmfan).

Update: added support for TORCH_LOGS for the metrics logging.
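A hedged sketch of opting back in (the flag name comes from this PR's title; the exact TORCH_LOGS target is not shown):
```
import torch._inductor.config as inductor_config

# Runtime-estimate metrics logging is now off by default; opt back in:
inductor_config._metrics_log_runtime = True
```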

Test Plan:
mm_loop.py and many existing tests cover.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153506
Approved by: https://github.com/eellison
2025-05-22 01:02:11 +00:00
b184e3da9c [easy] Fix internal only test (#154035)
Internally, the static cuda launcher isn't enabled, so we need to always enable it in this test.

Differential Revision: [D75146584](https://our.internmc.facebook.com/intern/diff/D75146584/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154035
Approved by: https://github.com/Skylion007
ghstack dependencies: #153565
2025-05-21 19:00:55 +00:00
bb7e30c165 [MegaCache] Make MegaCache generic to allow external plugins registration (#152977)
Implements #152976

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152977
Approved by: https://github.com/oulgen
2025-05-21 18:18:47 +00:00
4b759d98f8 Recheck autotune cache on static cuda launcher load (#153565)
When loading statically launchable triton kernels from FxGraphCache, since we don't instantiate a CachingAutotuner like we do normally, we need to recheck the autotune cache based on the existing compile results. If we get a hit, we take the compile result whose config matches the best config.

Sometimes, the best config will have come from coordinate descent tuning. In that case, FxGraphCache today does not cache the resulting triton kernel, with or without the static cuda launcher, because coordinate descent tuning happens at runtime and the best config may not be one of the precompiled configs.

Test Plan:
New unit test that failed before

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153565
Approved by: https://github.com/aorenste
2025-05-20 14:00:43 +00:00
b0e5402377 Revert "Recheck autotune cache on static cuda launcher load (#153565)"
This reverts commit 02af4e88e4e76309672dbc9b5970ae630df525c7.

Reverted https://github.com/pytorch/pytorch/pull/153565 on behalf of https://github.com/malfet due to Looks like it broke ROCM, see ee72c53c88/1 ([comment](https://github.com/pytorch/pytorch/pull/153565#issuecomment-2891673913))
2025-05-19 16:52:48 +00:00
02af4e88e4 Recheck autotune cache on static cuda launcher load (#153565)
When loading statically launchable triton kernels from FxGraphCache, since we don't instantiate a CachingAutotuner like we do normally, we need to recheck the autotune cache based on the existing compile results. If we get a hit, we take the compile result whose config matches the best config.

Sometimes, the best config will have come from coordinate descent tuning. In that case, FxGraphCache today does not cache the resulting triton kernel, with or without the static cuda launcher, because coordinate descent tuning happens at runtime and the best config may not be one of the precompiled configs.

Test Plan:
New unit test that failed before

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153565
Approved by: https://github.com/aorenste
2025-05-19 12:50:22 +00:00
dd7d231ed3 [cutlass backend][test] re-enable test_cuda_compile_command for fbcode (#153001)
Differential Revision: [D74284047](https://our.internmc.facebook.com/intern/diff/D74284047/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153001
Approved by: https://github.com/ColinPeppler
2025-05-07 19:06:24 +00:00
c461ba6522 [aot] mark dynamic activations as maybe dynamic (#149707)
Today, we mark graph outputs as maybe dynamic; this lets a compilation communicate to future compilations whether certain graph inputs are dynamic. Similarly, we can do this for saved activations, which may be used in future compilations as well. This is especially prevalent in compiled autograd, where tensor activations will always become graph inputs.

Changes to the tests were mainly cosmetic, with the exception of tests that relied on duck shaping. By annotating tensor dims, we prevent them from reusing pre-existing symbols, so this change will make graphs use duck shapes less than before, which affects some of the caching tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149707
Approved by: https://github.com/bdhirsh
2025-05-01 21:59:36 +00:00
22ecaeb145 [standalone_compile] fix dynamic shapes with config_patches (#152462)
compile_fx with config_patches goes down another path where we need to
propagate the kwarg...

Test Plan:
- updated test

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152462
Approved by: https://github.com/oulgen
2025-04-30 21:02:14 +00:00
f9bdfe90ae [MegaCache] Return None on no compilation (#151921)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151921
Approved by: https://github.com/jamesjwu
2025-04-23 04:32:06 +00:00
596296fb0b [standalone_compile] Dynamic shape handling (#151788)
standalone_compile needs to get dynamic shape information from
somewhere. We add a new `dynamic_shapes` argument with three options:

1. from the passed-in graph (dynamic="from_graph"). This is the default.
2. from the example inputs, thereby specializing on them. (dynamic="from_example_inputs")
3. from the current tracing context (dynamic="from_tracing_context")

1 and 3 are not exactly the same. 2 can also be used for more advanced
things... (specialize on one input but not the other).

Most of this PR is tests.
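A hedged usage sketch of the new argument (assuming `gm` and `example_inputs` from a prior capture):
```
from torch._inductor import standalone_compile

# Default: take dynamic-shape info from the passed-in graph.
compiled = standalone_compile(gm, example_inputs, dynamic_shapes="from_graph")

# Or specialize on the example inputs instead.
specialized = standalone_compile(gm, example_inputs,
                                 dynamic_shapes="from_example_inputs")
```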

Test Plan:
- a lot of new tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151788
Approved by: https://github.com/oulgen
2025-04-22 20:17:24 +00:00
67c2869a38 Unpack the output code in the standalone_compile (#151609)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151609
Approved by: https://github.com/zou3519
ghstack dependencies: #151768
2025-04-21 17:37:38 +00:00
287998b87f Run standalone compile tests on cpu/gpu (#151768)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151768
Approved by: https://github.com/zou3519
2025-04-21 17:37:29 +00:00
29317f8585 [standalone_compile] Some misc fixes (#151502)
This PR fixes two things.

The first problem is that, in the vLLM style, standalone_compile is called from within a custom torch.compile backend. If there already is a FakeTensorMode (which there is), we shouldn't create a new FakeTensorMode with the same shape_env; instead we should just reuse the same FakeTensorMode.

The second thing is that compile_fx can mutate the passed-in gm, so we deepcopy it (since standalone_compile should be standalone).
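A simplified sketch of both fixes (not the exact code):
```
import copy

from torch._guards import detect_fake_mode
from torch._subclasses import FakeTensorMode

def prepare(gm, example_inputs):
    # Fix 1: reuse an ambient FakeTensorMode (e.g. inside a custom backend)
    # instead of constructing a second one over the same shape_env.
    fake_mode = detect_fake_mode(example_inputs) or FakeTensorMode()
    # Fix 2: compile_fx may mutate the graph module; copy to stay standalone.
    return copy.deepcopy(gm), fake_mode
```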

Test Plan:
- new test
- updated old tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151502
Approved by: https://github.com/oulgen
ghstack dependencies: #151501, #151551
2025-04-18 12:34:13 +00:00
58310a0043 [standalone_compile] support multiple returns (#151551)
We were only returning the first one. There's an edge case on what to do if the original function returns a single Tensor: capture(f) returns a function that returns a tuple of one Tensor in this case, and we were originally converting this back to a single Tensor. I think it's fine to return a tuple of one Tensor (that is what the graph passed to standalone_compile asked for!) but we can revisit.

Test Plan:
- modified one test to used multiple outputs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151551
Approved by: https://github.com/Skylion007, https://github.com/oulgen
ghstack dependencies: #151501
2025-04-18 12:34:13 +00:00
ac715e96b4 [standalone_compile] Don't check if path is directory if it doesn't exist (#151501)
os.path.isdir(path) will return False if the path doesn't exist.
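The stdlib behavior the fix relies on, in one line:
```
import os

assert os.path.isdir("/no/such/path") is False  # missing path: False, no exception
```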

Test Plan:
- new test

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151501
Approved by: https://github.com/Skylion007, https://github.com/oulgen
2025-04-18 12:34:13 +00:00
8404c09b15 [MegaCache] Rename the PGO artifact when used between different jobs (#151482)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151482
Approved by: https://github.com/bobrenjc93, https://github.com/jamesjwu
2025-04-17 17:09:29 +00:00
3cf0e2d8ec Add inductor standalone_compile API (#150670)
This PR adds a standalone_compile API that does precompilation via caching, to support the vLLM use case in the short term while we work on the longer-term precompilation solution.

```
standalone_compile(gm, example_inputs, options) -> CompiledArtifact
CompiledArtifact.save(path, format: binary|unpacked = binary)
CompiledArtifact.load(path, format: binary|unpacked = binary)
```
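A hedged end-to-end sketch of the API above (the capture step and exact calling convention are assumptions):
```
import torch
from torch._inductor import standalone_compile

class M(torch.nn.Module):
    def forward(self, x):
        return x.relu() + 1

x = torch.randn(4)
gm = torch.fx.symbolic_trace(M())

artifact = standalone_compile(gm, [x])
artifact.save(path="/tmp/compiled_artifact", format="binary")
# Elsewhere, per the signatures above:
#   CompiledArtifact.load(path="/tmp/compiled_artifact", format="binary")
```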

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150670
Approved by: https://github.com/jamesjwu, https://github.com/zou3519
2025-04-15 23:38:15 +00:00
74f6bc28a7 Revert "Add inductor standalone_compile API (#150670)"
This reverts commit c9aef508984a31f03821eaad381468673ef29c0a.

Reverted https://github.com/pytorch/pytorch/pull/150670 on behalf of https://github.com/Camyll due to breaking internal builds with torch module not found error ([comment](https://github.com/pytorch/pytorch/pull/150670#issuecomment-2806975267))
2025-04-15 17:35:59 +00:00
c9aef50898 Add inductor standalone_compile API (#150670)
This PR adds a standalone_compile API that does precompilation via caching, to support the vLLM use case in the short term while we work on the longer-term precompilation solution.

```
standalone_compile(gm, example_inputs, options) -> CompiledArtifact
CompiledArtifact.save(path, format: binary|unpacked = binary)
CompiledArtifact.load(path, format: binary|unpacked = binary)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150670
Approved by: https://github.com/jamesjwu, https://github.com/zou3519
2025-04-14 22:00:09 +00:00
070f389745 Mark auto_functionalized HOPs as cacheable (#151194)
Fixes #151188

Test Plan:
- new tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151194
Approved by: https://github.com/oulgen, https://github.com/anijain2305
ghstack dependencies: #151193
2025-04-14 20:05:32 +00:00