Data collection based on FSDP (Fully Sharded Data Parallel) backend on Ascend devices (NPU)
===========================================================================================

Last updated: 07/24/2025.
This is a tutorial for data collection using the GRPO or DAPO algorithm based on FSDP on Ascend devices.
Configuration
-------------
Two levels of configuration control data collection:

1. **Global profiler control**: Use parameters in ``ppo_trainer.yaml`` to control the collection mode and steps.
2. **Role profiler control**: Use parameters in each role's ``profiler`` field to control the collection mode for that role.
Global collection control
~~~~~~~~~~~~~~~~~~~~~~~~~

Use parameters in ``ppo_trainer.yaml`` to control the collection mode and steps.
- profiler: Controls the ranks and mode of profiling.

  - tool: The profiling tool to use; options are nsys, npu, torch, and torch_memory.
  - steps: A list of the steps to collect, such as [2, 4], which collects steps 2 and 4. If set to null, no collection occurs.
  - save_path: The path where collected data is saved. Default is "outputs/profile".
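For example, a global configuration that collects the npu tool's data at steps 2 and 4 might look like the following (a sketch combining the parameters described above; adjust values to your setup):

.. code:: yaml

   profiler:
     tool: npu
     steps: [2, 4]
     save_path: outputs/profile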
Use parameters in ``profiler.tool_config.npu`` to control the npu profiler behavior:
- level: Collection level; options are level_none, level0, level1, and level2.

  - level_none: Disables all level-based data collection (turns off profiler_level).
  - level0: Collects high-level application data, underlying NPU data, and operator execution details on the NPU.
  - level1: Extends level0 by adding CANN-layer AscendCL data and AI Core performance metrics on the NPU.
  - level2: Extends level1 by adding CANN-layer Runtime data and AI CPU metrics.
- contents: A list of options that control the collection content, such as npu, cpu, memory, shapes, module, and stack.

  - npu: Whether to collect device-side performance data.
  - cpu: Whether to collect host-side performance data.
  - memory: Whether to enable memory analysis.
  - shapes: Whether to record tensor shapes.
  - module: Whether to record framework-layer Python call stack information.
  - stack: Whether to record operator call stack information.
- analysis: Whether to enable automatic parsing of the collected data.
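Putting these together, an npu tool configuration that collects level1 data plus memory analysis could be written as follows (a sketch; field names follow the list above):

.. code:: yaml

   profiler:
     tool_config:
       npu:
         level: level1
         contents: [npu, cpu, memory]
         analysis: True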
Role collection control
~~~~~~~~~~~~~~~~~~~~~~~

In each role's ``profiler`` field, you can control the collection mode for that role:
- enable: Whether to enable profiling for this role.
- all_ranks: Whether to collect data from all ranks.
- ranks: A list of ranks to collect data from. If empty, no data is collected.
- tool_config: Configuration for the profiling tool used by this role.
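For instance, to profile only ranks 0 and 1 of a role (shown here for the actor; other roles take the same field):

.. code:: yaml

   actor_rollout_ref:
     actor:
       profiler:
         enable: True
         all_ranks: False
         ranks: [0, 1]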
Examples
--------

Disabling collection
~~~~~~~~~~~~~~~~~~~~
.. code:: yaml

   profiler:
     steps: null # disable profiling
End-to-End collection
~~~~~~~~~~~~~~~~~~~~~

.. code:: yaml

   profiler:
     steps: [1, 2, 5]
     discrete: False
   actor_rollout_ref:
     actor:
       profiler:
         enable: True
         all_ranks: True
Discrete Mode Collection
~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: yaml

   profiler:
     discrete: True
Visualization
-------------

Collected data is stored in the user-defined ``save_path`` and can be visualized with the `MindStudio Insight <https://www.hiascend.com/document/detail/zh/mindstudio/80RC1/GUI_baseddevelopmenttool/msascendinsightug/Insight_userguide_0002.html>`_ tool.

If the ``analysis`` parameter is set to False, offline parsing is required after data collection:
.. code:: python

   import torch_npu

   # Set profiler_path to the parent directory of the
   # "localhost.localdomain_<PID>_<timestamp>_ascend_pt" folder
   torch_npu.profiler.profiler.analyse(profiler_path=profiler_path)
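If you need to locate that parent directory programmatically, a small helper along these lines can find it (a hypothetical convenience function, not part of verl or torch_npu):

```python
from pathlib import Path
from typing import Optional

def find_profiler_parent(root: str) -> Optional[str]:
    """Return the parent directory of the first '*_ascend_pt' folder under root.

    Hypothetical helper: searches recursively for the result folder that the
    NPU profiler writes, e.g. 'localhost.localdomain_<PID>_<timestamp>_ascend_pt'.
    """
    for p in sorted(Path(root).rglob("*_ascend_pt")):
        if p.is_dir():
            return str(p.parent)
    return None
```

The returned path can then be passed as ``profiler_path`` to the ``analyse`` call above.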