Add compiler bisector (#131936)

This is a utility to aid torch.compile debugging. You provide a function that returns True on success and False on failure, or you do something out of process and run `bisect_helper good|bad` after each attempt.
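
For the in-process flow, a minimal sketch of wiring up such a check function. The `do_bisect` entry point, the toy model, and the tolerance are assumptions for illustration, not an excerpt from this PR:

```
import torch
import torch._dynamo
from torch._inductor.bisect_helper import BisectionManager

def check() -> bool:
    # True on success: compiled output matches eager within tolerance.
    torch._dynamo.reset()
    model = torch.nn.Linear(8, 8)
    x = torch.randn(4, 8)
    expected = model(x)
    actual = torch.compile(model)(x)
    return torch.allclose(expected, actual, atol=1e-4)

# Assumed entry point: runs `check` repeatedly while toggling backends and
# subsystems, then reports the culprit it converges on.
print(BisectionManager.do_bisect(check))
```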

The bisector first goes through the backends `eager`, `aot_eager`, `aot_eager_decomp_partition`, and `inductor` to find the first failing backend. It then goes through the subsystems within that backend (currently a limited set, but one that could be expanded) and tries to find the first subsystem for which disabling it fixes the problem. Once it has found the failing subsystem, it counts the number of times the subsystem is applied and then bisects over those applications, as sketched below.
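
The search strategy itself is a linear scan over backends and subsystems followed by a binary search over application counts. A generic sketch of that strategy (hypothetical helper names, not the PR's actual code):

```
def first_failing(candidates, works):
    # Walk an ordered list and return the first element that fails the check.
    for candidate in candidates:
        if not works(candidate):
            return candidate
    return None

def bisect_count(max_applications, works_with_limit):
    # Binary-search the largest number of subsystem applications that still
    # works; the culprit is then the (answer + 1)-th application.
    lo, hi = 0, max_applications
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if works_with_limit(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo + 1
```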

An example of how to hook it up for the `aot_eager_decomp_partition` backend and the `decomposition` subsystem:

```
from torch._inductor.bisect_helper import BisectionManager

if op in CURRENT_DECOMPOSITION_TABLE:
    # The lambda lazily produces the debug string printed when this
    # application is identified as the culprit.
    if BisectionManager.disable_subsystem(
        "aot_eager_decomp_partition", "decomposition", lambda: repr(op)
    ):
        return NotImplemented
```

Once it has discovered the problematic change, it prints out the associated debug info, and you can pin the same limits with the environment variables `TORCH_BISECT_BACKEND`, `TORCH_BISECT_SUBSYSTEM`, and `TORCH_BISECT_MAX`.
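
For example, to reproduce a reported bisection point in a later run, the variables can be set before the repro executes (the values here are illustrative):

```
import os

# Pin the bisection point reported by a previous run (example values):
os.environ["TORCH_BISECT_BACKEND"] = "aot_eager_decomp_partition"
os.environ["TORCH_BISECT_SUBSYSTEM"] = "decomposition"
os.environ["TORCH_BISECT_MAX"] = "12"
```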

We could add further options as an automated way of going through a checklist for diagnosing divergence, e.g., a mode that emulates amp casts.

Fix for https://github.com/pytorch/pytorch/issues/126546

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131936
Approved by: https://github.com/ezyang