[ROCm][tunableop] UT tolerance increase for matmul_small_brute_force_tunableop at FP16 (#158788)

TunableOp will sometimes find a less precise solution due to the small input vectors used in this UT. Bumping op tolerance to eliminate flakiness.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158788
Approved by: https://github.com/jeffdaily
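
As a rough illustration (not the actual PyTorch test harness), `precisionOverride` effectively widens the absolute tolerance the test uses when comparing the op's result against a reference. A minimal sketch, with hypothetical values and a simplified absolute-only check:

```python
def within_tolerance(result: float, reference: float, atol: float) -> bool:
    # Simplified stand-in for the test's closeness check: accept the result
    # when it is within `atol` of the reference value.
    return abs(result - reference) <= atol

# Hypothetical numbers: a TunableOp-selected kernel returns a slightly less
# precise FP16 result than the reference.
reference = 1.0
tunableop_result = 1.05

# Under a tight tolerance the test flakes; under the bumped 1e-1 tolerance
# from this commit, the same result passes.
print(within_tolerance(tunableop_result, reference, 1e-3))  # False
print(within_tolerance(tunableop_result, reference, 1e-1))  # True
```

The real comparison in PyTorch also applies a relative tolerance per dtype; the override only changes the per-test precision used for `torch.float16`.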
Author: Nichols A. Romero
Date: 2025-07-22 19:45:35 +00:00
Committed by: PyTorch MergeBot
Parent: 659bfbf443
Commit: c917c63282


@@ -4762,6 +4762,7 @@ class TestLinalg(TestCase):
     @onlyCUDA
     @skipCUDAIfNotRocm # Skipping due to SM89 OOM in CI, UT doesn't do much on NV anyways
     @dtypes(*floating_types_and(torch.half))
+    @precisionOverride({torch.float16: 1e-1}) # TunableOp may occasionally find less precise solution
     def test_matmul_small_brute_force_tunableop(self, device, dtype):
         # disable tunableop buffer rotation for all tests everywhere, it can be slow
         # We set the TunableOp numerical check environment variable here because it is