mirror of
https://github.com/pytorch/pytorch.git
synced 2025-10-20 21:14:14 +08:00
[doc] Add AOTInductor intermediate debug printer OSS user manual (#163794)
Summary: Add an OSS user manual for the AOTI intermediate debug printer so we can link it in the PyTorch Conference poster.

Test Plan: N/A

Differential Revision: D83171374

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163794
Approved by: https://github.com/yushangdi
This commit is contained in:
committed by
PyTorch MergeBot
parent
55840fb4bb
commit
0b0ed6fd33
BIN
docs/source/_static/img/aoti_debug_printer/after_launch.png
Normal file
Binary file not shown.
Size: 194 KiB
BIN
docs/source/_static/img/aoti_debug_printer/before_launch.png
Normal file
Binary file not shown.
Size: 197 KiB
83
docs/source/torch.intermediate_debug_printer.md
Normal file
@@ -0,0 +1,83 @@
```{eval-rst}
:orphan:
```

# AOTInductor Intermediate Value Debug Printer

This manual describes the AOTInductor Intermediate Value Debug Printer, a utility that helps pinpoint the kernels responsible for CUDA illegal memory accesses (IMA) or numerical discrepancies when a PyTorch model is compiled with AOTInductor.

The tool automatically prints or saves the value information of all intermediate tensor arguments before and after each kernel launch in AOTInductor.

## How to use

The debug printer is configured through environment variables. The flags below work both with internal fbcode buck commands and in OSS.

All configuration options are defined in [torch/_inductor/config.py](https://github.com/pytorch/pytorch/blob/768361e67f0eb36491d7b763ef38d7c928ebefe6/torch/_inductor/config.py#L1493-L1505):

```
# options for debug printing/saving for intermediate tensor values for aot inductor

0: disable debug dumping
1: enable saving intermediate tensor values
2: enable printing intermediate tensor values
3: enable printing kernel names only (useful for pinpointing troublesome kernels)
```
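
As a minimal sketch of how these levels map onto the environment variables described in this manual, the helper below builds an environment mapping for a child process. The names `PrinterMode` and `debug_printer_env` are hypothetical and exist only for this illustration, and the comma-separated encoding of the kernel filter is an assumption based on the `{kernel_name_1, kernel_name_2,...}` notation used in this manual.

```python
import os
from enum import IntEnum


class PrinterMode(IntEnum):
    """Hypothetical labels for the documented levels 0-3."""

    OFF = 0
    SAVE_ONLY = 1
    PRINT_ONLY = 2
    PRINT_KERNEL_NAME_ONLY = 3


def debug_printer_env(mode, kernels=(), base_env=None):
    """Return a copy of the environment with the debug printer flags set.

    `base_env` defaults to the current process environment; the original
    mapping is never modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["AOT_INDUCTOR_DEBUG_INTERMEDIATE_VALUE_PRINTER"] = str(int(mode))
    if kernels:
        # Assumption: kernel names are passed as one comma-separated string.
        env["AOT_INDUCTOR_FILTERED_KERNELS_TO_PRINT"] = ",".join(kernels)
    else:
        # No filter requested: drop any filter inherited from the base env.
        env.pop("AOT_INDUCTOR_FILTERED_KERNELS_TO_PRINT", None)
    return env


if __name__ == "__main__":
    env = debug_printer_env(PrinterMode.PRINT_ONLY, ["aoti_torch_cuda_addmm_out"], base_env={})
    for key, value in sorted(env.items()):
        print(f"{key}={value}")
```

The returned mapping can be passed as `env=` to `subprocess.run` when launching the script that compiles and runs the model. Setting the variables before that process starts avoids depending on when the configuration is read.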

1. To enable **default** mode debug printing:

   - Set `AOT_INDUCTOR_DEBUG_INTERMEDIATE_VALUE_PRINTER=2` (PRINT_ONLY mode) to print the values of all supported kernel tensor arguments.
   - Set `AOT_INDUCTOR_FILTERED_KERNELS_TO_PRINT={kernel_name_1, kernel_name_2,...}` to print only the tensor values associated with the specified kernels. We suggest doing one run that generates the full printing logs first.

   Sample command:

   ```
   AOT_INDUCTOR_FILTERED_KERNELS_TO_PRINT="aoti_torch_cuda_addmm_out" AOT_INDUCTOR_DEBUG_INTERMEDIATE_VALUE_PRINTER=2 TORCH_LOGS="+inductor, output_code" python test/inductor/test_aot_inductor.py -k test_addmm_cuda
   ```

2. To **pinpoint** only the name of the problematic kernel (especially useful for CUDA IMA debugging):

   - Set `AOT_INDUCTOR_DEBUG_INTERMEDIATE_VALUE_PRINTER=3` (PRINT_KERNEL_NAME_ONLY mode). No tensor numerical values are dumped in this mode.

   Sample command:

   ```
   AOT_INDUCTOR_DEBUG_INTERMEDIATE_VALUE_PRINTER=3 TORCH_LOGS="+inductor, output_code" python test/inductor/test_aot_inductor.py -k test_addmm_cuda
   ```

3. To **save** the intermediate tensor values:

   - This mode is useful when you want to reproduce the error in a standalone kernel-debugging script. The saved intermediate tensor values can be used as debugging inputs to the problematic kernel.
   - Set `AOT_INDUCTOR_DEBUG_INTERMEDIATE_VALUE_PRINTER=1` (SAVE_ONLY mode) to save the values of all supported kernel tensor arguments as `.pt` files in a tmp folder.
   - Similarly, set `AOT_INDUCTOR_FILTERED_KERNELS_TO_PRINT={kernel_name_1, kernel_name_2,...}` to save only the tensor values associated with the specified kernels.

   Sample command:

   ```
   AOT_INDUCTOR_FILTERED_KERNELS_TO_PRINT="triton_poi_fused_0" AOT_INDUCTOR_DEBUG_INTERMEDIATE_VALUE_PRINTER=1 TORCH_LOGS="+inductor, output_code" python test/inductor/test_aot_inductor.py -k test_addmm_cuda
   ```

Saved tensors are named using the format `<before/after_launch>_<kernel_name>_<arg_name>_<device>.pt`.

The dumped `.pt` tensors can then be loaded and inspected like this:

```
import torch

def _load_tensor(path):
    return torch.load(path, weights_only=True)

tensor = _load_tensor("../tmp/aoti_torch/before_launch_aoti_torch_cuda_addmm_out_buf1_cuda:0.pt")

# Simply print tensor to view the full value
print(tensor)
```
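
When many tensors are dumped, it can help to pair each before-launch file with its after-launch counterpart. The sketch below is one possible way to do that using only the documented naming scheme. `parse_dump_name` and `pair_dumps` are hypothetical helpers, not part of PyTorch. They assume the device component (for example `cuda:0`) contains no underscore, and they take the kernel name as an input because kernel and argument names can themselves contain underscores.

```python
from pathlib import Path

PHASES = ("before_launch", "after_launch")


def parse_dump_name(filename, kernel_name):
    """Split a dump file name into (phase, arg_name, device).

    Assumes the documented layout
    <before/after_launch>_<kernel_name>_<arg_name>_<device>.pt.
    Returns None when the name does not match that layout.
    """
    if not filename.endswith(".pt"):
        return None
    stem = filename[: -len(".pt")]
    for phase in PHASES:
        prefix = f"{phase}_{kernel_name}_"
        if stem.startswith(prefix):
            # Everything after the last underscore is treated as the device.
            arg_name, sep, device = stem[len(prefix):].rpartition("_")
            if sep and arg_name and device:
                return phase, arg_name, device
    return None


def pair_dumps(dump_dir, kernel_name):
    """Map (arg_name, device) -> {phase: path} for one kernel's dumps."""
    pairs = {}
    for path in sorted(Path(dump_dir).glob("*.pt")):
        parsed = parse_dump_name(path.name, kernel_name)
        if parsed is None:
            continue
        phase, arg_name, device = parsed
        pairs.setdefault((arg_name, device), {})[phase] = path
    return pairs


if __name__ == "__main__":
    name = "before_launch_aoti_torch_cuda_addmm_out_buf1_cuda:0.pt"
    print(parse_dump_name(name, "aoti_torch_cuda_addmm_out"))
```

Each paired path can then be loaded with `torch.load(path, weights_only=True)` and the two tensors compared to see which arguments a kernel modified. If one kernel name is a prefix of another, the split between kernel and argument name is ambiguous, so the results are worth a manual check.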
## Example Outputs

Before launch tensor stats:

![Before launch tensor stats](_static/img/aoti_debug_printer/before_launch.png)

After launch tensor stats:

![After launch tensor stats](_static/img/aoti_debug_printer/after_launch.png)