Files
verl/docs/ascend_tutorial/ascend_profiling_en.rst
Blue Space 545f899844 [BREAKING] [perf] refactor: Profiler api refactor (#2894)
### What does this PR do?

Refactor the profiler CI into a unified approach.

TODO:

- nsys use `save_path`
- nsys discrete tests are disabled
- torch profiler

cc: @davidmlw 

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (this
  will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
    `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
    `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
    `env`, `tool`, `ckpt`, `doc`, `data`
  - If this PR involves multiple modules, separate them with `,`, like
    `[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
  - If this PR breaks any API (CLI arguments, config, function signature,
    etc.), add `[BREAKING]` to the beginning of the title.
    - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that cannot be tested by CI (e.g., algorithm
> implementation, new model support), validate by experiment(s) and show
> results like training curve plots, evaluation results, etc.

### API and Usage Example

Global profiler config:

```yaml
global_profiler:
  _target_: verl.utils.profiler.ProfilerConfig
  tool: null
  steps: null
  profile_continuous_steps: false
  save_path: outputs/profile
  tool_config:
    nsys:
      _target_: verl.utils.profiler.config.NsightToolConfig
      discrete: false
    npu:
      _target_: verl.utils.profiler.config.NPUToolConfig
      discrete: false
      contents: []
      level: level1
      analysis: true
    torch:
      _target_: verl.utils.profiler.config.TorchProfilerToolConfig
      step_start: 0
      step_end: null
```

Local profiler config:

```yaml
profiler:

  # Required when using verl.utils.omega_conf_to_dataclass to instantiate dataclass configs
  _target_: verl.utils.profiler.ProfilerConfig

  # Profiling tool; defaults to global_profiler.tool in the global config
  # choices: nsys, npu, torch
  tool: ${oc.select:global_profiler.tool,null}

  # Whether to enable profiling for this role
  enable: False

  # Whether to profile all ranks.
  all_ranks: False

  # The ranks that will be profiled. [] or [0,1,...]
  ranks: []

  # Path where profiling results are saved
  save_path: ${oc.select:global_profiler.save_path,null}

  # specific tool config
  tool_config: ${oc.select:global_profiler.tool_config,null}
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
> specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
> otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-11 09:52:41 +08:00


Data collection based on FSDP (Fully Sharded Data Parallel) backend on Ascend devices (NPU)
===========================================================================================

Last updated: 07/24/2025.

This is a tutorial for data collection using the GRPO or DAPO algorithm
based on FSDP on Ascend devices.
Configuration
-------------

Two levels of configuration control data collection:

1. **Global profiler control**: Use parameters in ``ppo_trainer.yaml`` to control the collection mode and steps.
2. **Role profile control**: Use parameters in each role's ``profiler`` field to control the collection mode for that role.
Global collection control
~~~~~~~~~~~~~~~~~~~~~~~~~

Use parameters in ``ppo_trainer.yaml`` to control the collection mode and steps:

- profiler: Controls the ranks and mode of profiling.

  - tool: The profiling tool to use; options are nsys, npu, torch, and torch_memory.
  - steps: A list of the steps to collect, such as [2, 4], which means steps 2 and 4
    are collected. If set to null, no collection occurs.
  - save_path: The path where collected data is saved. Default is ``outputs/profile``.
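
A minimal sketch combining the global parameters above (illustrative values only; the field names follow the list above):

.. code:: yaml

    profiler:
      tool: npu                  # use the Ascend npu profiler
      steps: [2, 4]              # collect steps 2 and 4; null disables collection
      save_path: outputs/profile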
Use parameters in ``profiler.tool_config.npu`` to control the npu profiler behavior:

- level: Collection level; options are level_none, level0, level1, and level2.

  - level_none: Disables all level-based data collection (turns off profiler_level).
  - level0: Collects high-level application data, underlying NPU data, and operator
    execution details on the NPU.
  - level1: Extends level0 by adding CANN-layer AscendCL data and AI Core performance
    metrics on the NPU.
  - level2: Extends level1 by adding CANN-layer Runtime data and AI CPU metrics.

- contents: A list of options that control the collection content, such as
  npu, cpu, memory, shapes, module, and stack.

  - npu: Whether to collect device-side performance data.
  - cpu: Whether to collect host-side performance data.
  - memory: Whether to enable memory analysis.
  - shapes: Whether to record tensor shapes.
  - module: Whether to record framework-layer Python call stack information.
  - stack: Whether to record operator call stack information.

- analysis: Whether to enable automatic data parsing.
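
For example, the npu tool options above might be combined as follows (an illustrative fragment, not a recommended default):

.. code:: yaml

    profiler:
      tool_config:
        npu:
          level: level1          # adds CANN-layer AscendCL data and AI Core metrics
          contents: [npu, cpu]   # device- and host-side performance data
          analysis: true         # parse the collected data automatically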
Role collection control
~~~~~~~~~~~~~~~~~~~~~~~

Each role's ``profiler`` field controls the collection mode for that role:

- enable: Whether to enable profiling for this role.
- all_ranks: Whether to collect data from all ranks.
- ranks: A list of ranks to collect data from. If empty, no data is collected.
- tool_config: Configuration for the profiling tool used by this role.
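
Putting the role-level fields together, a role's profiler section might look like this (an illustrative sketch using the actor role):

.. code:: yaml

    actor_rollout_ref:
      actor:
        profiler:
          enable: True
          all_ranks: False
          ranks: [0, 1]          # profile only ranks 0 and 1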
Examples
--------

Disabling collection
~~~~~~~~~~~~~~~~~~~~

.. code:: yaml

    profiler:
      steps: null # disable profiling

End-to-End collection
~~~~~~~~~~~~~~~~~~~~~

.. code:: yaml

    profiler:
      steps: [1, 2, 5]
      discrete: False
    actor_rollout_ref:
      actor:
        profiler:
          enable: True
          all_ranks: True

Discrete Mode Collection
~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: yaml

    profiler:
      discrete: True
Visualization
-------------

Collected data is stored in the user-defined save_path and can be
visualized using the `MindStudio Insight <https://www.hiascend.com/document/detail/zh/mindstudio/80RC1/GUI_baseddevelopmenttool/msascendinsightug/Insight_userguide_0002.html>`_ tool.

If the analysis parameter is set to False, offline parsing is required
after data collection:

.. code:: python

    import torch_npu

    # Set profiler_path to the parent directory of the
    # "localhost.localdomain_<PID>_<timestamp>_ascend_pt" folder
    torch_npu.profiler.profiler.analyse(profiler_path=profiler_path)