[Docs] Update PT2 Profiler Torch-Compiled Region Image (#158066)

Summary: In Pytorch 2.5 we added source code attribution to PT2 traces. Each Torch-Compiled Region will now have its frame id and frame compile id associated with it. Update the image in the doc and add a description of this in the doc itself

Test Plan:
{F1980179183}

Rollback Plan:

Differential Revision: D78118228

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158066
Approved by: https://github.com/aaronenyeshi
This commit is contained in:
Shivam Raikundalia
2025-07-11 07:56:45 +00:00
committed by PyTorch MergeBot
parent cd80f9a4c3
commit 11d6ad8b2e
2 changed files with 2 additions and 2 deletions

Binary file not shown.

Before

Width:  |  Height:  |  Size: 69 KiB

After

Width:  |  Height:  |  Size: 72 KiB

View File

@ -134,7 +134,7 @@ Note a few things:
Although there are logging tools for identifying graph breaks, the profiler provides a quick visual method of identifying :ref:`graph breaks <torch.compiler_graph_breaks>`. There are two profiler events to look for: **Torch-Compiled Region** and **CompiledFunction**.
**Torch-Compiled Region** - which was introduced in PyTorch 2.2 - is a profiler event that covers the entire compiled region. Graph breaks almost always look the same: nested “Torch-Compiled Region” events.
**Torch-Compiled Region** - which was introduced in PyTorch 2.2 - is a profiler event that covers the entire compiled region. Graph breaks almost always look the same: nested “Torch-Compiled Region” events. Starting in PyTorch 2.5, the profiler event will also contain the frame ID and the frame compile ID. The frame ID is a unique identifier for the frame, and the frame compile ID denotes how many times the frame has been compiled.
If you run two separate functions with torch.compile() applied independently on each of them, you should generally expect to see two adjacent (i.e NOT stacked/nested) Torch-Compiled regions. Meanwhile, if you encounter graph breaks (or disable()'ed/skipped regions), expect nested “Torch-Compiled Region” events.
@ -249,4 +249,4 @@ One common issue is bad GPU utilization. A quick way to identify this is if ther
This is often the result of CPU overhead, e.g. if the amount of time spent on the CPU between kernel launches is larger than the amount of time spent by the GPU to process the kernels. The issue is more common for small batch sizes.
When using inductor, enabling CUDA graphs can often help improve performance when launch overhead is a concern.
When using inductor, enabling CUDA graphs can often help improve performance when launch overhead is a concern.