Fix fake tensor caching when output has unbacked (#153034)
We handle fake tensor caching in two ways:

1. If the inputs have no symbols (SymInt, etc.) then we cache on the FakeTensorMode.
2. If the inputs have symbols then we cache on the ShapeEnv.

This way the symbols in the inputs and outputs are associated with the guards in place at the time of the call.

However, it's possible to have an op where there are no symbols in the inputs but there is an unbacked symbol in the output. In that case we shouldn't cache at all, because what would that really mean? So this PR changes the caching behavior: if there's a symbol in the output which doesn't come in some way from the input, then we refuse to cache that op. Added a test which checks for this case.

While in there I also made a couple of other related changes:

1. Added negative caching - if we see that an (op, args) pair failed to cache previously, we don't even bother trying to cache it again.
2. Reworked the inner behavior of _cached_dispatch_impl a little to make it clearer which bits we expect to be able to throw _BypassDispatchCache, and added some comments.

The latest version of this also:

1. Addresses the problem that caused #153891. The issue was that with caching, ops are required to support `__eq__`. Unfortunately _RecordFunction is minimalistic and doesn't support that - so in the off chance that two keys hashed to the same value, the `__eq__` check would raise an exception. Apparently this was much more common on macOS, where memory patterns end up with more reuse (so the object IDs are the same and give you the same hash value for objects that use pointer hashing). Tested locally on macOS, where running

   ```
   python test/inductor/test_torchinductor.py GPUTests
   ```

   was pretty much guaranteed to fail (at least for me) somewhere around test 100-200, and passed all 800 tests after this change. Another way to test this is to run the inductor tests with `torch._subclasses.fake_tensor._DispatchCacheKey.__hash__` monkey-patched to return a constant (causing all values to hash-collide), but this can't really be checked in, since it turns the cache lookup into an O(n) scan which takes a crazy long time to run through all the tests...

2. Folds in #153780 to ensure that exceptions raised from the op don't include the context from the cache key bypass.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153034
Approved by: https://github.com/masnesral, https://github.com/tugsbayasgalan
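To make the dispatch policy above concrete, here is a self-contained toy sketch of the two cache tiers, the negative cache, and the new bypass. Every name in it (`SymIntLike`, `FakeModeSketch`, the cache attributes, the toy key) is an illustrative stand-in, not the real `torch._subclasses.fake_tensor` implementation.

```python
# Toy model of the caching policy described in the message; all names here
# are hypothetical stand-ins for the real FakeTensorMode internals.

class SymIntLike:
    """Stand-in for a symbolic value (SymInt, etc.)."""
    def __init__(self, from_input: bool = True):
        # from_input=False models an unbacked symbol with no source in the
        # op's inputs.
        self.from_input = from_input


class BypassDispatchCache(Exception):
    """Stand-in for _BypassDispatchCache: this (op, args) must not be cached."""


class FakeModeSketch:
    def __init__(self):
        self.mode_cache = {}         # tier 1: no symbols in the inputs
        self.shape_env_cache = {}    # tier 2: symbolic inputs; in the real code
                                     # this lives on the ShapeEnv, so entries are
                                     # tied to the guards in place at call time
        self.negative_cache = set()  # (op, args) that failed to cache before

    def cached_dispatch(self, op, *args):
        key = (op.__name__, tuple(map(id, args)))  # toy key, not the real one
        if key in self.negative_cache:
            return op(*args)  # negative caching: don't even try again
        symbolic = any(isinstance(a, SymIntLike) for a in args)
        cache = self.shape_env_cache if symbolic else self.mode_cache
        if key in cache:
            return cache[key]
        out = op(*args)
        try:
            if isinstance(out, SymIntLike) and not out.from_input:
                # A symbol in the output that doesn't come from the input:
                # a cache hit could never reproduce it, so refuse to cache.
                raise BypassDispatchCache("unrepresented symbol in output")
            cache[key] = out
        except BypassDispatchCache:
            self.negative_cache.add(key)
        return out
```

A data-dependent op like `repeat_interleave` is the motivating case: plain tensor inputs, but an output whose size is a fresh unbacked symbol, which is exactly what the new test below exercises.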
committed by PyTorch MergeBot
parent 866142ff16
commit 4b7abce6a4
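Before the diff: the collision bug in item 1 above can be reproduced locally with the monkey-patch the message describes. A sketch, for a throwaway experiment only (as noted, it degrades cache lookups to O(n)):

```python
# Sketch of the stress test described in the commit message: force every
# _DispatchCacheKey to hash to the same value, so dict lookups must resolve
# collisions via __eq__ - the call that used to blow up on keys holding
# _RecordFunction. Not something to check in.
import torch._subclasses.fake_tensor as fake_tensor

fake_tensor._DispatchCacheKey.__hash__ = lambda self: 0  # all keys collide

# then run the inductor suite, e.g.:
#   python test/inductor/test_torchinductor.py GPUTests
```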
@@ -2265,13 +2265,10 @@ class FakeTensorDispatchCache(TestCase):
        gc.collect()
        self.assertTrue(count_invoke_subgraph_keys() == 0)

    @skipIfTorchDynamo("cache hit/miss changes with invoke_subgraph caching")
    def test_invoke_subgraph_cacheable_inplace(self):
        invoke_subgraph = torch._higher_order_ops.invoke_subgraph

        def fn(x, y):
            # aten ops are used so that eager backend graph is suitable for fake
            # tensor testing
@@ -2317,5 +2314,32 @@ class FakeTensorDispatchCache(TestCase):
            extract_tensor_metadata(b),
        )

    @skipIfTorchDynamo("cache hit/miss changes with invoke_subgraph caching")
    def test_unbacked_output(self):
        # The point of this test is to have an op which has no symbols as input
        # but a symbol as an output and make sure that we skip caching it.
        class LengthsGather(torch.nn.Module):
            def forward(
                self,
                input: torch.Tensor,
                lengths: torch.Tensor,
                indices: torch.Tensor,
                offsets: torch.Tensor,
            ) -> torch.Tensor:
                bias = torch.gather(offsets, 0, indices)
                lengths_selected = torch.gather(lengths, 0, indices)
                index = torch.repeat_interleave(bias, lengths_selected, dim=0)
                return index

        input = torch.tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
        lengths = torch.tensor([0, 2, 3, 1, 4])
        indices = torch.tensor([2, 3, 4, 6, 7, 8, 9])
        offsets = torch.cumsum(lengths, 0)
        ep = torch.export.export(LengthsGather(), (input, lengths, indices, offsets), strict=False)

        FakeTensorMode.cache_clear()
        ep.run_decompositions({})
        self.assertBypasses("unrepresented symbol in output", 2)


if __name__ == "__main__":
    run_tests()
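On the folded-in #153780: the hazard is Python's implicit exception chaining when the fallback op call happens inside the bypass handler. A minimal sketch of one way to keep op errors clean, with `make_key` as an assumed stand-in for cache-key construction (not the actual fix as landed):

```python
class BypassDispatchCache(Exception):
    """Toy stand-in for torch's _BypassDispatchCache."""

def make_key(op, args):
    # Assumed helper standing in for cache-key construction, which may
    # reject the inputs as uncacheable.
    raise BypassDispatchCache("uncacheable inputs")

def dispatch(op, *args):
    try:
        key = make_key(op, args)
    except BypassDispatchCache:
        key = None  # note the bypass, then *leave* the handler
    if key is None:
        # The op runs outside the except block, so an error it raises is not
        # chained to the bypass (no "During handling of the above exception,
        # another exception occurred" noise for the user).
        return op(*args)
    ...  # normal cached path would go here
```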