Add compiler bisector (#131936)
This is a utility to aid torch.compile debugging. You provide a function that returns True on success and False on failure, or do something out of process and run `bisect_helper good | bad`. The bisector first goes through the backends `eager`, `aot_eager`, `aot_eager_decomp_partition`, and `inductor` to find the first failing backend. Then it goes through the subsystems within that backend (currently limited, but this could be expanded) and tries to find the first subsystem for which disabling it fixes the problem. Once it has found the failing subsystem, it finds the number of times the subsystem is applied and then bisects through it.

An example of how to hook it up for the `aot_eager_decomp_partition` backend and the decomposition subsystem:

```
from torch._inductor.bisect_helper import BisectionManager

if op in CURRENT_DECOMPOSITION_TABLE:
    if BisectionManager.disable_subsystem(
        "aot_eager_decomp_partition", "decomposition", lambda: repr(op)
    ):
        return NotImplemented
```

Once it has discovered the problematic change, it prints out the associated debug info, and you can set the same limits with the `TORCH_BISECT_BACKEND`, `TORCH_BISECT_SUBSYSTEM`, and `TORCH_BISECT_MAX` environment variables. We could add further options as an automated way of going through a checklist for checking divergence, e.g., a mode to emulate amp casts.

Fix for https://github.com/pytorch/pytorch/issues/126546

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131936
Approved by: https://github.com/ezyang
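For illustration, here is a minimal sketch of driving a bisection in-process. The `do_bisect` entry point, the `check` function, and the toy model are assumptions for this sketch; only the `TORCH_BISECT_*` environment variable names are confirmed by the commit message above.

```
# Minimal sketch, not the verbatim API: assumes BisectionManager exposes a
# do_bisect(fn) entry point that repeatedly invokes fn under different
# backend/subsystem configurations and narrows down the first failure.
import torch
from torch._inductor.bisect_helper import BisectionManager


def check() -> bool:
    # Hypothetical repro: return True on success, False on failure.
    torch._dynamo.reset()  # force recompilation under the current config

    def f(x):
        return torch.nn.functional.gelu(x).sum()

    x = torch.randn(16, 16)
    eager = f(x)
    compiled = torch.compile(f)(x)
    return torch.allclose(eager, compiled, atol=1e-6)


result = BisectionManager.do_bisect(check)
print(result)

# Alternatively, pin the state the bisector reported via the documented
# environment variables (values here are placeholders):
#   TORCH_BISECT_BACKEND=aot_eager_decomp_partition \
#   TORCH_BISECT_SUBSYSTEM=decomposition \
#   TORCH_BISECT_MAX=17 python repro.py
```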
Committed by: PyTorch MergeBot
Parent: cfe970260a
Commit: 47af7cc962
```
@@ -220,6 +220,7 @@ RUN_PARALLEL_BLOCKLIST = [
     "test_cuda_nvml_based_avail",
     # temporarily sets a global config
     "test_autograd_fallback",
+    "inductor/test_compiler_bisector",
 ] + FSDP_TEST


 # Test files that should always be run serially with other test files,
```