DebugPlane: add dump_traceback handler (#128904)

This adds a `dump_traceback` handler to the debug control plane so you can inspect the stack traces of all running threads for a job. The handler calls `faulthandler.dump_traceback` with a temporary file as its output buffer, and the GIL must be held while the dump runs.
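
For illustration, the buffering approach described above can be sketched in pure Python. This is a minimal sketch, not the handler's actual code: the helper name `dump_all_thread_tracebacks` is hypothetical. It relies on `faulthandler.dump_traceback` writing straight to a file descriptor, which is why a real temporary file is used rather than an in-memory `io.StringIO`.

```python
import faulthandler
import tempfile


def dump_all_thread_tracebacks() -> str:
    """Return the tracebacks of all Python threads as a string.

    Hypothetical helper for illustration only; it is not the handler
    added by this commit.
    """
    # faulthandler writes to the underlying file descriptor, so the
    # buffer must be backed by a real file (it needs fileno()).
    with tempfile.TemporaryFile() as buf:
        faulthandler.dump_traceback(file=buf, all_threads=True)
        # Rewind and read back what faulthandler wrote.
        buf.seek(0)
        return buf.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(dump_all_thread_tracebacks())
```

When called from ordinary Python code the calling thread already holds the GIL, so no extra locking appears in the sketch.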

Test plan:

```
python test/distributed/elastic/test_control_plane.py -v -k traceback
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128904
Approved by: https://github.com/c-p-i-o
Tristan Rice
2024-06-18 03:40:14 +00:00
committed by PyTorch MergeBot
parent 17abbafdfc
commit 59b4983dc0
3 changed files with 51 additions and 0 deletions

@@ -927,6 +927,7 @@ libtorch_python_distributed_sources = libtorch_python_distributed_core_sources +
"torch/csrc/distributed/rpc/unpickled_python_call.cpp",
"torch/csrc/distributed/rpc/unpickled_python_remote_call.cpp",
"torch/csrc/jit/runtime/register_distributed_ops.cpp",
"torch/csrc/distributed/c10d/control_plane/PythonHandlers.cpp",
]
def glob_libtorch_python_sources(gencode_pattern = ":generate-code[{}]"):