[submodule kineto] Enable profiler connection to daemon during init for cpu only jobs (#118320)

Fixes #112389 and https://github.com/facebookincubator/dynolog/issues/208

This PR enables profiler initialization for CPU-only use cases. The main goal is to support on-demand profiling through a daemon (Dynolog) when running PyTorch in CPU-only mode.
* When CUDA is available, the profiler is initialized on the first CUDA stream creation (or lazily when the profiler is run).
* Since the CUDA stream creation callback does not exist in CPU-only PyTorch, the profiler is never initialized on its own.
* As a result, the job does not register with Dynolog, even when the `KINETO_USE_DAEMON` environment variable is set.
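The failure mode described in these bullets can be pictured with a toy model. This is a sketch under stated assumptions, not Kineto's code. `ToyProfiler`, `init`, `onFirstStreamCreated` and `registerWithDaemon` are hypothetical names. The model assumes that in GPU mode, daemon registration is deferred to a "first stream created" hook, while in CPU-only mode it happens at init. The lazy "when the profiler is run" path is left out.

```cpp
#include <functional>
#include <vector>

// Hypothetical toy model, NOT Kineto's real API: the profiler either registers
// with the daemon right away (cpuOnly) or defers registration to a hook that
// fires when the first GPU stream is created.
struct ToyProfiler {
  bool registeredWithDaemon = false;
  std::vector<std::function<void()>> firstStreamHooks;

  void init(bool cpuOnly) {
    if (cpuOnly) {
      // CPU-only: there is no GPU event to wait for, so register now.
      registerWithDaemon();
    } else {
      // GPU mode: wait until the runtime creates its first stream.
      firstStreamHooks.push_back([this] { registerWithDaemon(); });
    }
  }

  // Called by the (hypothetical) GPU runtime; never called in a CPU-only build.
  void onFirstStreamCreated() {
    for (auto& hook : firstStreamHooks) hook();
    firstStreamHooks.clear();
  }

  void registerWithDaemon() { registeredWithDaemon = true; }
};
```

In a CPU-only process, `init(false)` leaves `registeredWithDaemon` false forever because nothing ever calls `onFirstStreamCreated()`. `init(true)` registers immediately.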

Part of the fix lives in Kineto (https://github.com/pytorch/kineto/pull/861); this PR updates the Kineto submodule in PyTorch to point to it.
The PyTorch-side change sets the `cpuOnly` argument correctly.
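The `cpuOnly` value passed to `libkineto_init` is derived from backend availability. The sketch below restates that expression from the patch. Plain bools stand in for `at::hasCUDA()`, `at::hasXPU()` and `at::hasMTIA()` so it compiles without ATen. The function name `kinetoCpuOnly` is hypothetical.

```cpp
// Hypothetical helper mirroring the expression the patch passes as cpuOnly:
//   !(at::hasCUDA() || at::hasXPU() || at::hasMTIA())
// The process is treated as CPU-only only when no accelerator backend
// (CUDA, XPU, or MTIA) is available.
inline bool kinetoCpuOnly(bool hasCuda, bool hasXpu, bool hasMtia) {
  return !(hasCuda || hasXpu || hasMtia);
}
```

Before the patch the argument was hard-coded to `false`, which in this model corresponds to every process being treated as having an accelerator.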

## Test Plan

Build PyTorch from source with `USE_CUDA=0` to get a CPU-only build (git hash `a40951defd87b9a5e582cf9112bf7a8bd0930c79`).
See the build instructions in the PyTorch repo.

For setup, run the Dynolog daemon in another terminal:
```
buck2 run dynolog/src:dynolog  -- --enable_ipc_monitor &
```

Now run an example model in PyTorch (see [linear_model_example.py](https://github.com/facebookincubator/dynolog/blob/main/scripts/pytorch/linear_model_example.py)), setting the device inside the script to `'cpu'` instead of `'cuda'`:
```
export KINETO_USE_DAEMON=1
python linear_model_example.py
```
The output shows the profiler registering with Dynolog:
```
(pytorch) [bcoutinho@devgpu038.ftw6 ~/local/pytorch (main)]$ python linear_model_example.py
INFO:2024-01-25 11:08:53 1807792:1807792 init.cpp:122] Registering daemon config loader, cpuOnly =  1
INFO:2024-01-25 11:08:53 1807792:1807792 DaemonConfigLoader.cpp:63] Setting communication fabric enabled = 1
INFO:2024-01-25 11:08:53 1807792:1807792 IpcFabricConfigClient.cpp:93] Setting up IPC Fabric at endpoint: dynoconfigclient0dc36b8a-e14c-4260-958b-4b2e7d15e986 status = initialized
INFO:2024-01-25 11:08:53 1807792:1807792 DaemonConfigLoader.cpp:63] Setting communication fabric enabled = 1
INFO:2024-01-25 11:08:53 1807792:1807792 DaemonConfigLoader.cpp:63] Setting communication fabric enabled = 1
```

We can also collect a trace with:
```
[bcoutinho@devgpu038.ftw6 ~/fbsource/fbcode (3bc85f968)]$ buck2 run dynolog/cli:dyno -- gputrace --log-file /tmp/test.json
Kineto config =
ACTIVITIES_LOG_FILE=/tmp/test.json
PROFILE_START_TIME=0
ACTIVITIES_DURATION_MSECS=500
PROFILE_REPORT_INPUT_SHAPES=false
PROFILE_PROFILE_MEMORY=false
PROFILE_WITH_STACK=false
PROFILE_WITH_FLOPS=false
PROFILE_WITH_MODULES=false
response length = 147
response = {"activityProfilersBusy":0,"activityProfilersTriggered":[1807792],"eventProfilersBusy":0,"eventProfilersTriggered":[],"processesMatched":[1807792]}
Matched 1 processes
Trace output files will be written to:
    /tmp/test_1807792.json
```
The resulting trace file contains the trace as expected.
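The `dyno gputrace` output above shows two things. The Kineto config is sent as `KEY=VALUE` lines, and the requested `--log-file` `/tmp/test.json` becomes `/tmp/test_1807792.json` for the matched PID. The sketch below parses that config format and reproduces the file naming. It is an illustration inferred from this one output, not Dynolog's implementation. `parseKinetoConfig` and `perProcessTracePath` are hypothetical, and the no-extension case is an assumption.

```cpp
#include <map>
#include <sstream>
#include <string>

// Parse KEY=VALUE lines (as in the "Kineto config =" dump) into a map.
// Lines without '=' or with an empty value, such as the header, are skipped.
std::map<std::string, std::string> parseKinetoConfig(const std::string& text) {
  std::map<std::string, std::string> config;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    auto eq = line.find('=');
    if (eq == std::string::npos || eq + 1 == line.size()) continue;
    config[line.substr(0, eq)] = line.substr(eq + 1);
  }
  return config;
}

// Hypothetical helper matching the naming seen above:
// "/tmp/test.json" for pid 1807792 -> "/tmp/test_1807792.json".
// Assumption: with no extension, the PID suffix is simply appended.
std::string perProcessTracePath(const std::string& logFile, long pid) {
  auto slash = logFile.find_last_of('/');
  auto dot = logFile.find_last_of('.');
  // Only treat the dot as an extension separator if it is in the file name.
  if (dot == std::string::npos || (slash != std::string::npos && dot < slash)) {
    return logFile + "_" + std::to_string(pid);
  }
  return logFile.substr(0, dot) + "_" + std::to_string(pid) + logFile.substr(dot);
}
```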
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118320
Approved by: https://github.com/aaronenyeshi
briancoutinho
2024-02-03 01:40:51 +00:00
committed by PyTorch MergeBot
parent 494c2ec054
commit d91d21fd6f
2 changed files with 5 additions and 2 deletions


```diff
@@ -1,4 +1,5 @@
 #ifdef USE_KINETO
+#include <ATen/Context.h>
 #include <libkineto.h>
 #include <torch/csrc/autograd/profiler_kineto.h>
@@ -77,7 +78,9 @@ struct RegisterLibKinetoClient {
     static profiler::impl::LibKinetoClient client;
     if (std::getenv("KINETO_USE_DAEMON") != nullptr) {
-      libkineto_init(/*cpuOnly=*/false, /*logOnError=*/true);
+      libkineto_init(
+          /*cpuOnly=*/!(at::hasCUDA() || at::hasXPU() || at::hasMTIA()),
+          /*logOnError=*/true);
       libkineto::api().suppressLogMessages();
     }
```
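The gate in this hunk uses `std::getenv("KINETO_USE_DAEMON") != nullptr`. That test only checks whether the variable is defined, so `KINETO_USE_DAEMON=0` or an empty value still enables the daemon path. The sketch below isolates that condition. The function name `kinetoDaemonRequested` is hypothetical.

```cpp
#include <cstdlib>

// Same condition as the patched RegisterLibKinetoClient code: it checks only
// whether KINETO_USE_DAEMON is defined, not what value it holds.
bool kinetoDaemonRequested() {
  return std::getenv("KINETO_USE_DAEMON") != nullptr;
}
```

To opt out, unset the variable (for example `unset KINETO_USE_DAEMON` in a POSIX shell) rather than setting it to `0`.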