[Docs] Update PT2 Profiler Torch-Compiled Region Image (#158066)

Summary: In PyTorch 2.5 we added source code attribution to PT2 traces. Each Torch-Compiled Region now has its frame ID and frame compile ID associated with it. Update the image in the doc and add a description of this to the doc itself.

Test Plan: {F1980179183}

Rollback Plan:

Differential Revision: D78118228

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158066

Approved by: https://github.com/aaronenyeshi
2025-10-20 21:14:14 +08:00 · 2025-07-11 07:56:45 +00:00
parent cd80f9a4c3
commit 11d6ad8b2e
2 changed files with 2 additions and 2 deletions
--- a/docs/source/_static/img/profiling_torch_compile/graph_breaks_with_torch_compiled_region.png
+++ b/docs/source/_static/img/profiling_torch_compile/graph_breaks_with_torch_compiled_region.png
--- a/docs/source/torch.compiler_profiling_torch_compile.md
+++ b/docs/source/torch.compiler_profiling_torch_compile.md
@@ -134,7 +134,7 @@ Note a few things:

 Although there are logging tools for identifying graph breaks, the profiler provides a quick visual method of identifying :ref:`graph breaks <torch.compiler_graph_breaks>`. There are two profiler events to look for: **Torch-Compiled Region** and **CompiledFunction**.

-**Torch-Compiled Region** - which was introduced in PyTorch 2.2 - is a profiler event that covers the entire compiled region. Graph breaks almost always look the same: nested “Torch-Compiled Region” events.
+**Torch-Compiled Region** - which was introduced in PyTorch 2.2 - is a profiler event that covers the entire compiled region. Graph breaks almost always look the same: nested “Torch-Compiled Region” events. Starting in PyTorch 2.5, the profiler event will also contain the frame ID and the frame compile ID. The frame ID is a unique identifier for the frame, and the frame compile ID denotes how many times the frame has been compiled.

 If you run two separate functions with torch.compile() applied independently on each of them, you should generally expect to see two adjacent (i.e NOT stacked/nested) Torch-Compiled regions. Meanwhile, if you encounter graph breaks (or disable()'ed/skipped regions), expect nested “Torch-Compiled Region” events.

@@ -249,4 +249,4 @@ One common issue is bad GPU utilization. A quick way to identify this is if ther

 This is often the result of CPU overhead, e.g. if the amount of time spent on the CPU between kernel launches is larger than the amount of time spent by the GPU to process the kernels. The issue is more common for small batch sizes.

-When using inductor, enabling CUDA graphs can often help improve performance when launch overhead is a concern.
+When using inductor, enabling CUDA graphs can often help improve performance when launch overhead is a concern.
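
Illustration for the Torch-Compiled Region change in the first hunk: a minimal sketch of how the frame ID and frame compile ID could be read back out of event names copied from a trace. It assumes the label has the form `Torch-Compiled Region: <frame_id>/<frame_compile_id>`; that format, the helper names, and the sample event names are assumptions made for illustration and are not taken from this diff, so check them against a real trace before relying on them.

```python
import re
from collections import defaultdict

# Assumption: labels look like "Torch-Compiled Region: <frame_id>/<frame_compile_id>".
# Adjust the pattern if the trace you are reading uses a different layout.
_LABEL = re.compile(r"^Torch-Compiled Region: (\d+)/(\d+)$")


def parse_region_label(name):
    """Return (frame_id, frame_compile_id), or None if the name does not match."""
    match = _LABEL.match(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def compile_ids_by_frame(names):
    """Map each frame ID to the sorted frame compile IDs seen for it."""
    seen = defaultdict(set)
    for name in names:
        parsed = parse_region_label(name)
        if parsed is not None:
            frame_id, frame_compile_id = parsed
            seen[frame_id].add(frame_compile_id)
    return {frame_id: sorted(ids) for frame_id, ids in seen.items()}


# Hypothetical event names, as they might be copied out of a trace viewer.
example_names = [
    "Torch-Compiled Region: 0/0",
    "Torch-Compiled Region: 1/0",
    "CompiledFunction",
    "Torch-Compiled Region: 0/1",
]
summary = compile_ids_by_frame(example_names)
```

Under these assumptions, a frame ID that shows up with more than one frame compile ID (frame 0 in the sample names) would indicate that the frame was compiled more than once.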