Pull Request resolved: https://github.com/pytorch/pytorch/pull/78164
This PR finishes moving the Python tracer over to the unified event type. Things that changed:
1) The hacky after-the-fact splicing of Python events in profiler_kineto.cpp is gone; Python events now simply fold into the rest. (Yay!!!) This is a major BE win.
2) Added `ExtraFields<EventType::PyCall>` and `ExtraFields<EventType::PyCCall>` (see the sketch after this list).
3) The enter events (time + TraceKey) are now handled by RecordQueue for performance.
4) Python tracing now uses the TSC (time stamp counter) for lower overhead.
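As a rough, hypothetical illustration of the unified event type pattern in (2): per-event payloads selected by an `EventType` tag and stored in a single result type. The member names below are made up and do not match the actual profiler definitions.
```cpp
// Hypothetical sketch of the "unified event type" pattern; not the real
// EventType/ExtraFields definitions from the PyTorch profiler.
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

enum class EventType { TorchOp, PyCall, PyCCall };  // other types omitted

template <EventType E>
struct ExtraFields;  // primary template, specialized per event type

template <>
struct ExtraFields<EventType::PyCall> {
  std::string function_name_;  // hypothetical members
  std::string file_name_;
  int64_t line_no_{0};
};

template <>
struct ExtraFields<EventType::PyCCall> {
  std::string function_repr_;  // e.g. repr of a bound C function
};

// One result type holds any event's payload, so Python events land in the
// same stream as everything else instead of being spliced in after the fact.
struct Result {
  int64_t start_time_ns_{0};
  int64_t end_time_ns_{0};
  std::variant<ExtraFields<EventType::PyCall>, ExtraFields<EventType::PyCCall>>
      extra_fields_;
};

int main() {
  std::vector<Result> results;
  results.push_back(
      {0, 10, ExtraFields<EventType::PyCall>{"forward", "model.py", 42}});
  return 0;
}
```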
Simplifications in profiler_python.cpp with respect to (1):
1) Rather than ValueCache emitting an intermediate value_t that gets converted further downstream, the load methods can now emit ExtraFields<...> directly.
2) The complicated replay in profiler_python.cpp is replaced with a much simpler (and safer) pass that just pairs start and end times (sketched after this list).
3) During post processing we can now use `CallTypeHelper::map` to automatically pull in all events, instead of having to loop over the entries for each type manually. This will make it simpler to add new types of Python events later.
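A minimal sketch of the pairing pass in (2), assuming the simplest possible event shape; the real pass in profiler_python.cpp tracks more state, and the types here are illustrative only.
```cpp
// Hypothetical sketch of pairing enter/exit events into [start, end] spans.
// Each exit closes the most recent unmatched enter, so a stack suffices.
#include <cstdint>
#include <iostream>
#include <stack>
#include <vector>

struct RawTraceEvent {   // not the profiler's actual type; illustrative only
  bool is_enter;
  int64_t time_ns;
  int64_t key;           // identifies which function this event belongs to
};

struct Span {
  int64_t key;
  int64_t start_ns;
  int64_t end_ns;
};

std::vector<Span> pairEvents(const std::vector<RawTraceEvent>& events) {
  std::vector<Span> out;
  std::stack<RawTraceEvent> open;
  for (const auto& e : events) {
    if (e.is_enter) {
      open.push(e);
    } else if (!open.empty()) {
      out.push_back({open.top().key, open.top().time_ns, e.time_ns});
      open.pop();
    }
  }
  // Anything still open when profiling stopped could be closed at the final
  // timestamp; omitted here for brevity.
  return out;
}

int main() {
  std::vector<RawTraceEvent> events{
      {true, 0, 1}, {true, 5, 2}, {false, 8, 2}, {false, 20, 1}};
  for (const auto& s : pairEvents(events)) {
    std::cout << "key=" << s.key << " [" << s.start_ns << ", " << s.end_ns << "]\n";
  }
  return 0;
}
```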
Differential Revision: [D36515869](https://our.internmc.facebook.com/intern/diff/D36515869/)
Approved by: https://github.com/aaronenyeshi
Summary:
There were two issues with the original PR:
1) My assumption that bound C functions could be trusted to stay alive was not valid. I'm still not entirely sure what was dying, but I've added a cache so that the first time I see a function I collect its repr, just as I was already doing for Python functions.
2) `std::regex` is known to be badly broken and prone to segfaults. Because I'm only doing a very simple prefix prune, it's fine to do it manually; see `trimPrefix`. (Both fixes are sketched after this list.) Long term we should move all of PyTorch to `re2`, as the internal lint suggests, but CMake is hard and I couldn't get it to work.
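A rough sketch of both fixes, with illustrative names and signatures only; the real code in profiler_python.cpp differs in detail, and the `ReprCache` class below is hypothetical.
```cpp
// Illustrative only: a first-seen repr cache for bound C functions (fix 1)
// and a manual prefix prune in the spirit of trimPrefix (fix 2).
#include <string>
#include <unordered_map>
#include <vector>

// (1) Cache the repr the first time a function pointer is seen, so nothing
// relies on the callable staying alive after the call returns.
class ReprCache {
 public:
  template <typename ComputeRepr>
  const std::string& reprFor(const void* fn, ComputeRepr&& compute_repr) {
    auto it = cache_.find(fn);
    if (it == cache_.end()) {
      // The (somewhat expensive) repr is computed once, on first sighting.
      it = cache_.emplace(fn, compute_repr()).first;
    }
    return it->second;
  }

 private:
  std::unordered_map<const void*, std::string> cache_;
};

// (2) A simple prefix prune needs no regex: strip the first matching prefix.
std::string trimPrefix(std::string s, const std::vector<std::string>& prefixes) {
  for (const auto& p : prefixes) {
    if (s.size() >= p.size() && s.compare(0, p.size(), p) == 0) {
      return s.substr(p.size());
    }
  }
  return s;
}
```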
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68325
Reviewed By: chaekit
Differential Revision: D32432596
Pulled By: robieta
fbshipit-source-id: 06fb4bcdc6933a3e76f6021ca69dc77a467e4b2e
Summary:
This PR instruments the CPython interpreter and integrates the resulting trace into the PyTorch profiler.
The Python tracing logic works by enabling `PyEval_SetProfile` and then logging the minimal information needed to track every time Python calls or returns from a function. A great deal of care has gone into keeping this process very lightweight; the `RawEvent` struct is only two words and doesn't do anything fancy. When a Python function is called, we have to do extra work. If the call is to `nn.Module.__call__`, we simply incref to extend the life of the module. Otherwise we check whether we have seen the function before, and if not we go through the (somewhat expensive) task of saving the relevant strings, which we then cache.
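To make the mechanism concrete, here is a minimal, self-contained sketch of hooking the CPython profile callback and recording compact events. It is not the profiler's actual callback or `RawEvent` layout (the real code records into its own queue and uses TSC timestamps); it only shows the shape of the approach.
```cpp
// Illustrative-only sketch of registering a CPython profile callback.
// Build against Python's C API headers.
#include <Python.h>

#include <chrono>
#include <cstdint>
#include <vector>

namespace {

// A deliberately tiny event record, in the spirit of the two-word RawEvent
// described above (the real struct packs its fields more aggressively).
struct MiniEvent {
  int64_t time_ns;
  int32_t what;  // PyTrace_CALL, PyTrace_RETURN, PyTrace_C_CALL, ...
};

std::vector<MiniEvent> g_events;

int64_t now_ns() {
  return std::chrono::duration_cast<std::chrono::nanoseconds>(
             std::chrono::steady_clock::now().time_since_epoch())
      .count();
}

// Profile callback: the interpreter invokes this on Python calls/returns and
// C calls/returns. It must stay as cheap as possible.
int profileFn(PyObject* /*obj*/, PyFrameObject* /*frame*/, int what, PyObject* /*arg*/) {
  switch (what) {
    case PyTrace_CALL:
    case PyTrace_RETURN:
    case PyTrace_C_CALL:
    case PyTrace_C_RETURN:
    case PyTrace_C_EXCEPTION:
      g_events.push_back({now_ns(), what});
      break;
    default:
      break;
  }
  return 0;
}

}  // namespace

void startTracing() {
  // Must be called with the GIL held, on the thread to be traced.
  PyEval_SetProfile(profileFn, nullptr);
}

void stopTracing() {
  PyEval_SetProfile(nullptr, nullptr);
}
```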
To actually get a useful timeline, we have to replay the events to determine the state of the Python stack at any given point. A second round of stack replay is needed to figure out what the last Python function was for each torch op so that we can reconstruct the correct Python stack. All of this is done during post processing, so while we still want to be reasonably performant, it is no longer imperative to shave off every last bit of overhead.
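A hypothetical sketch of that replay step, with made-up types: walk the time-ordered Python call/return events while maintaining a stack, and snapshot the stack whenever a torch op's start time is reached.
```cpp
// Illustrative replay: reconstruct the Python stack at each op's start time
// by walking call/return events in time order. Types and names are made up.
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

struct PyEvent {
  int64_t time_ns;
  bool is_call;       // true = call, false = return
  std::string name;   // only meaningful for calls
};

struct TorchOp {
  int64_t start_ns;
  std::string name;
};

// For each op, return the Python stack (outermost first) active at its start.
std::vector<std::vector<std::string>> replayStacks(
    std::vector<PyEvent> py_events, std::vector<TorchOp> ops) {
  std::sort(py_events.begin(), py_events.end(),
            [](const auto& a, const auto& b) { return a.time_ns < b.time_ns; });
  std::sort(ops.begin(), ops.end(),
            [](const auto& a, const auto& b) { return a.start_ns < b.start_ns; });

  std::vector<std::vector<std::string>> out;
  std::vector<std::string> stack;
  size_t i = 0;
  for (const auto& op : ops) {
    // Apply every Python event that happened before this op started.
    while (i < py_events.size() && py_events[i].time_ns <= op.start_ns) {
      if (py_events[i].is_call) {
        stack.push_back(py_events[i].name);
      } else if (!stack.empty()) {
        stack.pop_back();
      }
      ++i;
    }
    out.push_back(stack);  // snapshot: the op's Python ancestry
  }
  return out;
}

int main() {
  std::vector<PyEvent> py{{0, true, "forward"}, {1, true, "linear"}, {9, false, ""}};
  std::vector<TorchOp> ops{{5, "aten::addmm"}};
  const auto stacks = replayStacks(py, ops);
  for (const auto& frame : stacks.front()) {
    std::cout << frame << " > ";
  }
  std::cout << "aten::addmm\n";  // prints: forward > linear > aten::addmm
  return 0;
}
```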
I still need to do a bit of refinement (particularly where the tracer interfaces with the profiler), but this should give a good sense of the general structure.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67407
Test Plan:
```
import torch


class MyModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(2, 2)
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        x = self.linear(x)
        return self.relu(x)


def call_module():
    m = MyModule()
    for _ in range(4):
        m(torch.ones((2, 2)))


def top_level_fn():
    with torch.profiler.profile(with_stack=True) as p:
        call_module()

    p.export_chrome_trace("test_trace.json")


top_level_fn()
```
<img width="1043" alt="Screen Shot 2021-10-27 at 6 43 18 PM" src="https://user-images.githubusercontent.com/13089297/139171803-f95e70f3-24aa-45e6-9d4b-6d437a3f108d.png">
PS: I've tried to comment liberally, particularly around some of the more magical parts. However, I do plan to do another linting and commenting pass. Hopefully it's not too bad right now.
Reviewed By: gdankel, chaekit
Differential Revision: D32178667
Pulled By: robieta
fbshipit-source-id: 118547104a7d887e830f17b94d3a29ee4f8c482f