[ROCm][tunableop] UT tolerance increase for matmul_small_brute_force_tunableop at FP16 (#158788)

TunableOp will sometimes find a less precise solution due to the small input vectors used in this UT. Bumping op tolerance to eliminate flakiness.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158788
Approved by: https://github.com/jeffdaily
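
As a rough illustration (not the actual PyTorch test harness), `precisionOverride` effectively widens the absolute tolerance the test uses when comparing the op's result against a reference. A minimal sketch, with hypothetical values and a simplified absolute-only check:

```python
def within_tolerance(result: float, reference: float, atol: float) -> bool:
    # Simplified stand-in for the test's closeness check: accept the result
    # when it is within `atol` of the reference value.
    return abs(result - reference) <= atol

# Hypothetical numbers: a TunableOp-selected kernel returns a slightly less
# precise FP16 result than the reference.
reference = 1.0
tunableop_result = 1.05

# Under a tight tolerance the test flakes; under the bumped 1e-1 tolerance
# from this commit, the same result passes.
print(within_tolerance(tunableop_result, reference, 1e-3))  # False
print(within_tolerance(tunableop_result, reference, 1e-1))  # True
```

The real comparison in PyTorch also applies a relative tolerance per dtype; the override only changes the per-test precision used for `torch.float16`.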
Author: Nichols A. Romero
Date: 2025-07-22 19:45:35 +00:00
Committed by: PyTorch MergeBot
Parent: 659bfbf443
Commit: c917c63282


@@ -4762,6 +4762,7 @@ class TestLinalg(TestCase):
     @onlyCUDA
     @skipCUDAIfNotRocm # Skipping due to SM89 OOM in CI, UT doesn't do much on NV anyways
     @dtypes(*floating_types_and(torch.half))
+    @precisionOverride({torch.float16: 1e-1}) # TunableOp may occasionally find less precise solution
     def test_matmul_small_brute_force_tunableop(self, device, dtype):
         # disable tunableop buffer rotation for all tests everywhere, it can be slow
         # We set the TunableOp numerical check environment variable here because it is