mirror of
https://github.com/pytorch/pytorch.git
synced 2025-10-20 21:14:14 +08:00
[Docs] Update PT2 Profiler Torch-Compiled Region Image (#158066)
Summary: In PyTorch 2.5 we added source code attribution to PT2 traces. Each Torch-Compiled Region will now have its frame id and frame compile id associated with it. Update the image in the doc and add a description of this in the doc itself.
Test Plan: {F1980179183}
Rollback Plan:
Differential Revision: D78118228
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158066
Approved by: https://github.com/aaronenyeshi
Committed by: PyTorch MergeBot
Parent: cd80f9a4c3
Commit: 11d6ad8b2e
Binary file not shown.
Before: Size 69 KiB | After: Size 72 KiB
@@ -134,7 +134,7 @@ Note a few things:
Although there are logging tools for identifying graph breaks, the profiler provides a quick visual method of identifying :ref:`graph breaks <torch.compiler_graph_breaks>`. There are two profiler events to look for: **Torch-Compiled Region** and **CompiledFunction**.
-**Torch-Compiled Region** - which was introduced in PyTorch 2.2 - is a profiler event that covers the entire compiled region. Graph breaks almost always look the same: nested “Torch-Compiled Region” events.
+**Torch-Compiled Region** - which was introduced in PyTorch 2.2 - is a profiler event that covers the entire compiled region. Graph breaks almost always look the same: nested “Torch-Compiled Region” events. Starting in PyTorch 2.5, the profiler event will also contain the frame ID and the frame compile ID. The frame ID is a unique identifier for the frame, and the frame compile ID denotes how many times the frame has been compiled.
If you run two separate functions with torch.compile() applied independently on each of them, you should generally expect to see two adjacent (i.e., NOT stacked/nested) “Torch-Compiled Region” events. Meanwhile, if you encounter graph breaks (or disable()'ed/skipped regions), expect nested “Torch-Compiled Region” events.
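The adjacent-versus-nested distinction can also be checked programmatically on an exported trace. The following is a minimal sketch, not part of the PyTorch docs: it assumes events in Chrome trace format (complete events with `ph`, `name`, `ts`, `dur`, `pid`, `tid`), and it assumes that compiled-region event names begin with `Torch-Compiled Region`. The helper name `find_nested_regions` and the sample events are hypothetical.

```python
REGION_PREFIX = "Torch-Compiled Region"  # assumed event-name prefix


def find_nested_regions(events):
    """Return (outer_name, inner_name) pairs where one compiled region
    lies entirely inside another on the same process/thread."""
    regions = [
        e for e in events
        if e.get("ph") == "X" and str(e.get("name", "")).startswith(REGION_PREFIX)
    ]
    # Earliest start first; for equal starts, the longer (outer) event first.
    regions.sort(key=lambda e: (e["ts"], -e["dur"]))

    nested = []
    for i, outer in enumerate(regions):
        outer_end = outer["ts"] + outer["dur"]
        for inner in regions[i + 1:]:
            if inner["ts"] >= outer_end:
                break  # later events start after the outer region ends
            same_thread = (inner.get("pid"), inner.get("tid")) == (
                outer.get("pid"), outer.get("tid"))
            if same_thread and inner["ts"] + inner["dur"] <= outer_end:
                nested.append((outer["name"], inner["name"]))
    return nested


# Hypothetical sample: region 1/0 is nested in 0/0 (a likely graph break),
# while region 2/0 is merely adjacent.
sample_events = [
    {"ph": "X", "name": "Torch-Compiled Region: 0/0", "ts": 0, "dur": 100, "pid": 1, "tid": 1},
    {"ph": "X", "name": "aten::mm", "ts": 10, "dur": 5, "pid": 1, "tid": 1},
    {"ph": "X", "name": "Torch-Compiled Region: 1/0", "ts": 60, "dur": 30, "pid": 1, "tid": 1},
    {"ph": "X", "name": "Torch-Compiled Region: 2/0", "ts": 200, "dur": 50, "pid": 1, "tid": 1},
]

print(find_nested_regions(sample_events))
```

On a real trace, the event list would typically come from the `traceEvents` array of the JSON file written by the profiler's `export_chrome_trace`; an empty result suggests the compiled regions are adjacent rather than nested.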
@@ -249,4 +249,4 @@ One common issue is bad GPU utilization. A quick way to identify this is if ther
This is often the result of CPU overhead, e.g. if the amount of time spent on the CPU between kernel launches is larger than the amount of time spent by the GPU to process the kernels. The issue is more common for small batch sizes.
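One rough way to quantify this is to measure how much of the traced span the GPU spends idle between kernels. The sketch below is illustrative only: it assumes kernel start times and durations (in microseconds) have already been extracted from a trace for a single stream, and the helper name `gpu_idle_fraction` and the sample numbers are hypothetical.

```python
def gpu_idle_fraction(kernels):
    """Fraction of the span between the first kernel start and the last
    kernel end during which no kernel is running.

    kernels: iterable of (start_us, duration_us) pairs.
    """
    intervals = sorted((start, start + dur) for start, dur in kernels)
    if not intervals:
        return 0.0

    span_start = intervals[0][0]
    span_end = max(end for _, end in intervals)

    # Merge overlapping intervals and accumulate the busy time.
    busy = 0
    cur_start, cur_end = intervals[0]
    for start, end in intervals[1:]:
        if start > cur_end:
            busy += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    busy += cur_end - cur_start

    total = span_end - span_start
    return 1.0 - busy / total if total else 0.0


# Hypothetical kernels: 40 us of work spread over a 100 us span.
sample_kernels = [(0, 10), (40, 10), (80, 20)]
print(f"GPU idle fraction: {gpu_idle_fraction(sample_kernels):.2f}")
```

A high idle fraction is consistent with the CPU-overhead pattern described above, though gaps can have other causes (for example, synchronization or data loading), so the number is a starting point rather than a diagnosis.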
-When using inductor, enabling CUDA graphs can often help improve performance when launch overhead is a concern.
+When using inductor, enabling CUDA graphs can often help improve performance when launch overhead is a concern.