Mirror of https://github.com/volcengine/verl.git (synced 2025-10-20 13:43:50 +08:00)
[deployment, doc] feat: Add SkyPilot integration examples (#3333)
### What does this PR do?

Adds SkyPilot integration examples for running verl training jobs on Kubernetes/cloud platforms with GPUs. Includes configurations for PPO, GRPO, and multi-turn tool usage training.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: https://github.com/volcengine/verl/pulls?q=is%3Apr+skypilot
- [x] Format the PR title as `[{modules}] {type}: {description}`

### Test

Validated SkyPilot YAML configurations for Ray cluster initialization, dataset downloading, and distributed training setup with H100 GPUs.

### API and Usage Example

```bash
# Launch PPO training on 2 nodes
sky launch -c verl-ppo examples/skypilot/verl-ppo.yaml --secret WANDB_API_KEY -y

# Launch GRPO training
sky launch -c verl-grpo examples/skypilot/verl-grpo.yaml --secret WANDB_API_KEY -y

# Launch multi-turn tool usage training
sky launch -c verl-multiturn examples/skypilot/verl-multiturn-tools.yaml --secret WANDB_API_KEY --secret HF_TOKEN -y
```

### Design & Code Changes

- Added 3 SkyPilot YAML configurations for PPO, GRPO, and multi-turn training
- Added `examples/skypilot/README.md` with setup guide
- Added `docs/examples/skypilot_examples.rst` documentation
- Updated `docs/index.rst` and `docs/start/multinode.rst` with references

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
docs/examples/skypilot_examples.rst (new file, 144 lines)
@@ -0,0 +1,144 @@
SkyPilot Examples
=================

This guide provides examples of running verl reinforcement learning training on Kubernetes clusters or cloud platforms with GPU nodes using `SkyPilot <https://github.com/skypilot-org/skypilot>`_.

Installation and Configuration
-------------------------------

Step 1: Install SkyPilot
~~~~~~~~~~~~~~~~~~~~~~~~~

Choose the installation based on your target platform:

.. code-block:: bash

   # For Kubernetes only
   pip install "skypilot[kubernetes]"

   # For AWS
   pip install "skypilot[aws]"

   # For Google Cloud Platform
   pip install "skypilot[gcp]"

   # For Azure
   pip install "skypilot[azure]"

   # For multiple platforms
   pip install "skypilot[kubernetes,aws,gcp,azure]"

Step 2: Configure Your Platform
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

See https://docs.skypilot.co/en/latest/getting-started/installation.html
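
After configuring credentials, you can confirm that SkyPilot sees your platform with ``sky check``, which reports which infrastructures are enabled:

.. code-block:: bash

   # Verify that the configured platform(s) are enabled
   sky check
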
Step 3: Set Up Environment Variables
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Export necessary API keys for experiment tracking:

.. code-block:: bash

   # For Weights & Biases tracking
   export WANDB_API_KEY="your-wandb-api-key"

   # For HuggingFace gated models (if needed)
   export HF_TOKEN="your-huggingface-token"

Examples
--------

All example configurations are available in the `examples/skypilot/ <https://github.com/volcengine/verl/tree/main/examples/skypilot>`_ directory on GitHub. See the `README <https://github.com/volcengine/verl/blob/main/examples/skypilot/README.md>`_ for additional details.
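
The launch commands below reference the YAML files by relative path, so run them from ``examples/skypilot/`` in a local checkout of verl. A minimal sketch:

.. code-block:: bash

   # Clone the repository and enter the SkyPilot examples directory
   git clone https://github.com/volcengine/verl.git
   cd verl/examples/skypilot
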
PPO Training
~~~~~~~~~~~~

.. code-block:: bash

   sky launch -c verl-ppo verl-ppo.yaml --secret WANDB_API_KEY -y

Runs PPO training on the GSM8K dataset using the Qwen2.5-0.5B-Instruct model across 2 nodes with H100 GPUs. Based on examples in ``examples/ppo_trainer/``.

`View verl-ppo.yaml on GitHub <https://github.com/volcengine/verl/blob/main/examples/skypilot/verl-ppo.yaml>`_

GRPO Training
~~~~~~~~~~~~~

.. code-block:: bash

   sky launch -c verl-grpo verl-grpo.yaml --secret WANDB_API_KEY -y

Runs GRPO (Group Relative Policy Optimization) training on the MATH dataset using the Qwen2.5-7B-Instruct model, with a memory-optimized configuration for 2 nodes. Based on examples in ``examples/grpo_trainer/``.

`View verl-grpo.yaml on GitHub <https://github.com/volcengine/verl/blob/main/examples/skypilot/verl-grpo.yaml>`_

Multi-turn Tool Usage Training
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   sky launch -c verl-multiturn verl-multiturn-tools.yaml \
     --secret WANDB_API_KEY --secret HF_TOKEN -y

Single-node training with 8xH100 GPUs for multi-turn tool usage with Qwen2.5-3B-Instruct. Includes tool and interaction configurations for GSM8K. Based on examples in ``examples/sglang_multiturn/`` but uses vLLM instead of SGLang.

`View verl-multiturn-tools.yaml on GitHub <https://github.com/volcengine/verl/blob/main/examples/skypilot/verl-multiturn-tools.yaml>`_

Configuration
-------------

The example YAML files are pre-configured with:

- **Infrastructure**: Kubernetes clusters (``infra: k8s``) - can be changed to ``infra: aws``, ``infra: gcp``, etc. (see the example after this list)
- **Docker Image**: verl's official Docker image with CUDA 12.6 support
- **Setup**: Automatically clones and installs verl from source
- **Datasets**: Downloads required datasets during the setup phase
- **Ray Cluster**: Configures distributed training across nodes
- **Logging**: Supports Weights & Biases via ``--secret WANDB_API_KEY``
- **Models**: Supports gated HuggingFace models via ``--secret HF_TOKEN``
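
For example, to retarget the PPO recipe at AWS instead of Kubernetes, copy the config and change the ``infra`` field (the copied file name below is only an illustration):

.. code-block:: bash

   # Make a copy of the PPO config that targets AWS instead of Kubernetes
   cp verl-ppo.yaml verl-ppo-aws.yaml
   sed -i 's/infra: k8s/infra: aws/' verl-ppo-aws.yaml
   sky launch -c verl-ppo-aws verl-ppo-aws.yaml --secret WANDB_API_KEY -y
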
Launch Command Options
----------------------

- ``-c <name>``: Cluster name for managing the job
- ``--secret KEY``: Pass secrets for API keys (can be used multiple times)
- ``-y``: Skip confirmation prompt

Monitoring Your Jobs
--------------------

Check Cluster Status
~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   sky status

View Logs
~~~~~~~~~

.. code-block:: bash

   sky logs verl-ppo  # View logs for the PPO job

SSH into Head Node
~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   ssh verl-ppo

Access Ray Dashboard
~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   sky status --endpoint 8265 verl-ppo  # Get dashboard URL

Stop a Cluster
~~~~~~~~~~~~~~

.. code-block:: bash

   sky down verl-ppo
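
If you would rather not tear the cluster down immediately, SkyPilot can also stop it after a period of inactivity; whether stopping is supported depends on the backing infrastructure, so treat this as a sketch:

.. code-block:: bash

   # Automatically stop the cluster after 30 idle minutes
   sky autostop verl-ppo -i 30
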
docs/index.rst
@@ -62,6 +62,7 @@ verl is fast with:

   examples/ppo_code_architecture
   examples/gsm8k_example
   examples/multi_modal_example
   examples/skypilot_examples

.. toctree::
   :maxdepth: 1
docs/start/multinode.rst
@@ -69,6 +69,15 @@ Submit job to ray cluster

Option 2: Launch via SkyPilot on Kubernetes or clouds
------------------------------------------------------

.. note::
   Ready-to-use SkyPilot example configurations are available in the `examples/skypilot/ <https://github.com/volcengine/verl/tree/main/examples/skypilot>`_ directory:

   - ``verl-ppo.yaml`` - PPO training with the GSM8K dataset
   - ``verl-grpo.yaml`` - GRPO training with the MATH dataset
   - ``verl-multiturn-tools.yaml`` - Multi-turn tool usage training

See the `SkyPilot examples README <https://github.com/volcengine/verl/tree/main/examples/skypilot>`_ for detailed usage instructions.
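
For example, to launch the PPO recipe from the repository root:

.. code-block:: bash

   # Launch PPO training on 2 nodes
   sky launch -c verl-ppo examples/skypilot/verl-ppo.yaml --secret WANDB_API_KEY -y
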
Step 1: Setup SkyPilot
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

SkyPilot supports multiple clouds; here we use GCP as an example. See `install skypilot <https://docs.skypilot.co/en/latest/getting-started/installation.html>`_.
examples/skypilot/README.md (new file, 107 lines)
@@ -0,0 +1,107 @@
# verl with SkyPilot

Run verl reinforcement learning training jobs on Kubernetes clusters or cloud platforms with GPU nodes using [SkyPilot](https://github.com/skypilot-org/skypilot).

## Installation and Configuration

### Step 1: Install SkyPilot

Choose the installation based on your target platform:

```bash
# For Kubernetes only
pip install "skypilot[kubernetes]"

# For AWS
pip install "skypilot[aws]"

# For Google Cloud Platform
pip install "skypilot[gcp]"

# For Azure
pip install "skypilot[azure]"

# For multiple platforms
pip install "skypilot[kubernetes,aws,gcp,azure]"
```

### Step 2: Configure Your Platform

See https://docs.skypilot.co/en/latest/getting-started/installation.html

### Step 3: Set Up Environment Variables

Export necessary API keys for experiment tracking:

```bash
# For Weights & Biases tracking
export WANDB_API_KEY="your-wandb-api-key"

# For HuggingFace gated models (if needed)
export HF_TOKEN="your-huggingface-token"
```
## Examples

### PPO Training

```bash
sky launch -c verl-ppo verl-ppo.yaml --secret WANDB_API_KEY -y
```

Runs PPO training on the GSM8K dataset using the Qwen2.5-0.5B-Instruct model across 2 nodes with H100 GPUs. Based on examples in [`../ppo_trainer/`](../ppo_trainer/).

### GRPO Training

```bash
sky launch -c verl-grpo verl-grpo.yaml --secret WANDB_API_KEY -y
```

Runs GRPO (Group Relative Policy Optimization) training on the MATH dataset using the Qwen2.5-7B-Instruct model, with a memory-optimized configuration for 2 nodes. Based on examples in [`../grpo_trainer/`](../grpo_trainer/).

### Multi-turn Tool Usage Training

```bash
sky launch -c verl-multiturn verl-multiturn-tools.yaml --secret WANDB_API_KEY --secret HF_TOKEN -y
```

Single-node training with 8xH100 GPUs for multi-turn tool usage with Qwen2.5-3B-Instruct. Includes tool and interaction configurations for GSM8K. Based on examples in [`../sglang_multiturn/`](../sglang_multiturn/) but uses vLLM instead of SGLang.

## Configuration

The example YAML files are pre-configured with:

- **Infrastructure**: Kubernetes clusters (`infra: k8s`) - can be changed to `infra: aws`, `infra: gcp`, etc.
- **Docker Image**: verl's official Docker image with CUDA 12.6 support
- **Setup**: Automatically clones and installs verl from source
- **Datasets**: Downloads required datasets during the setup phase
- **Ray Cluster**: Configures distributed training across nodes
- **Logging**: Supports Weights & Biases via `--secret WANDB_API_KEY`
- **Models**: Supports gated HuggingFace models via `--secret HF_TOKEN`
## Launch Command Options

- `-c <name>`: Cluster name for managing the job
- `--secret KEY`: Pass secrets for API keys (can be used multiple times)
- `-y`: Skip confirmation prompt

## Monitoring Your Jobs

### Check cluster status

```bash
sky status
```

### View logs

```bash
sky logs verl-ppo  # View logs for the PPO job
```
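
Each `sky launch` submits a job to the cluster's job queue. If you have launched several runs on the same cluster, you can list them and fetch logs for a specific job ID (the ID `1` below is just an illustration):

```bash
# List jobs submitted to the cluster
sky queue verl-ppo

# View logs for a specific job ID
sky logs verl-ppo 1
```
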
### SSH into head node

```bash
ssh verl-ppo
```

### Access Ray dashboard

```bash
sky status --endpoint 8265 verl-ppo  # Get dashboard URL
```

### Stop a cluster

```bash
sky down verl-ppo
```
examples/skypilot/verl-grpo.yaml (new file, 99 lines)
@@ -0,0 +1,99 @@
resources:
  infra: k8s
  accelerators: H100:1
  memory: 128+
  image_id: docker:verlai/verl:base-verl0.5-cu126-cudnn9.8-torch2.7.0-fa2.7.4
  ports: 8265

num_nodes: 2

secrets:
  WANDB_API_KEY:

setup: |
  rm -rf verl
  git clone https://github.com/volcengine/verl.git
  cd verl
  pip3 install -v -e .[vllm]
  pip3 install flashinfer-python
  echo "Downloading Math dataset..."
  mkdir -p ~/data/math
  python3 "$(pwd)/examples/data_preprocess/math_dataset.py" --local_dir ~/data/math
  echo "Math dataset download completed"

run: |
  HEAD_IP=$(echo "$SKYPILOT_NODE_IPS" | head -n1)
  NUM_NODES=$SKYPILOT_NUM_NODES
  NUM_GPUS_PER_NODE=$SKYPILOT_NUM_GPUS_PER_NODE

  if [ "$SKYPILOT_NODE_RANK" == "0" ]; then
    echo "Starting Ray head node..."
    ps aux | grep ray | grep 6379 &> /dev/null || ray start --head --disable-usage-stats \
      --port=6379 \
      --dashboard-host=0.0.0.0 \
      --dashboard-port=8265

    # Wait for all worker nodes to join
    retry_count=0
    max_retries=30
    while [ $retry_count -lt $max_retries ]; do
      connected_nodes=$(ray status 2>/dev/null | grep -c "node_" || echo "0")
      echo "Connected nodes: $connected_nodes/$NUM_NODES (attempt $((retry_count+1))/$max_retries)"

      if [ "$connected_nodes" -ge "$NUM_NODES" ]; then
        echo "All nodes connected to Ray cluster"
        break
      fi

      retry_count=$((retry_count+1))
      sleep 10
    done

    python3 -m verl.trainer.main_ppo \
      algorithm.adv_estimator=grpo \
      data.train_files=$HOME/data/math/train.parquet \
      data.val_files=$HOME/data/math/test.parquet \
      data.train_batch_size=32 \
      data.max_prompt_length=256 \
      data.max_response_length=256 \
      data.filter_overlong_prompts=True \
      data.truncation='error' \
      actor_rollout_ref.model.path=Qwen/Qwen2.5-7B-Instruct \
      actor_rollout_ref.actor.optim.lr=1e-6 \
      actor_rollout_ref.model.use_remove_padding=True \
      actor_rollout_ref.actor.ppo_mini_batch_size=16 \
      actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
      actor_rollout_ref.actor.ppo_epochs=1 \
      actor_rollout_ref.actor.use_kl_loss=False \
      actor_rollout_ref.actor.entropy_coeff=0 \
      actor_rollout_ref.model.enable_gradient_checkpointing=True \
      actor_rollout_ref.actor.fsdp_config.param_offload=True \
      actor_rollout_ref.actor.fsdp_config.optimizer_offload=True \
      actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=16 \
      actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
      actor_rollout_ref.rollout.name=vllm \
      actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
      actor_rollout_ref.rollout.n=1 \
      actor_rollout_ref.rollout.enable_chunked_prefill=True \
      actor_rollout_ref.rollout.max_num_batched_tokens=2048 \
      actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=16 \
      actor_rollout_ref.ref.fsdp_config.param_offload=True \
      algorithm.use_kl_in_reward=False \
      trainer.critic_warmup=0 \
      trainer.logger=[console,wandb] \
      trainer.project_name=verl_math_grpo_demo \
      trainer.experiment_name=qwen25_7b_grpo \
      trainer.n_gpus_per_node=$NUM_GPUS_PER_NODE \
      trainer.nnodes=$NUM_NODES \
      trainer.save_freq=-1 \
      trainer.test_freq=-1 \
      trainer.total_epochs=1

  else
    sleep 15
    echo "Starting Ray worker node..."
    ps aux | grep ray | grep $HEAD_IP:6379 &> /dev/null || ray start --address $HEAD_IP:6379 --disable-usage-stats
    sleep 10
  fi

  echo "Node setup and Ray start script finished for rank $SKYPILOT_NODE_RANK."
examples/skypilot/verl-multiturn-tools.yaml (new file, 91 lines)
@@ -0,0 +1,91 @@
resources:
  infra: k8s
  accelerators: H100:8
  memory: 128+
  image_id: docker:verlai/verl:base-verl0.5-cu126-cudnn9.8-torch2.7.0-fa2.7.4
  ports: 8265

num_nodes: 1

secrets:
  WANDB_API_KEY:
  HF_TOKEN:  # in case you're using gated models from the HF hub

setup: |
  rm -rf verl
  git clone https://github.com/volcengine/verl.git
  cd verl
  pip3 install -v -e .[vllm]
  pip3 install flashinfer-python
  pip install "transformers<4.54.0"  # https://github.com/vllm-project/vllm-ascend/issues/2046
  # Download GSM8K dataset for multiturn tool training
  echo "Downloading GSM8K dataset..."
  mkdir -p ~/data/gsm8k
  python3 "$(pwd)/examples/data_preprocess/gsm8k.py" --local_dir ~/data/gsm8k
  echo "GSM8K dataset download completed"

run: |
  NUM_GPUS_PER_NODE=$SKYPILOT_NUM_GPUS_PER_NODE
  PROJECT_DIR="$(pwd)/verl"
  CONFIG_PATH="$PROJECT_DIR/examples/sglang_multiturn/config"

  # Single node setup - no worker coordination needed
  echo "Starting Ray head node..."
  ps aux | grep ray | grep 6379 &> /dev/null || ray start --head --disable-usage-stats \
    --port=6379 \
    --dashboard-host=0.0.0.0 \
    --dashboard-port=8265

  cd verl

  python3 -m verl.trainer.main_ppo \
    --config-path="$CONFIG_PATH" \
    --config-name='gsm8k_multiturn_grpo' \
    algorithm.adv_estimator=grpo \
    data.train_batch_size=512 \
    data.max_prompt_length=1024 \
    data.max_response_length=1024 \
    data.filter_overlong_prompts=True \
    data.truncation='error' \
    data.return_raw_chat=True \
    data.train_files=$HOME/data/gsm8k/train.parquet \
    data.val_files=$HOME/data/gsm8k/test.parquet \
    actor_rollout_ref.model.path=Qwen/Qwen2.5-3B-Instruct \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=512 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=32 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.001 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=64 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=4 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.5 \
    actor_rollout_ref.rollout.n=16 \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=64 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    algorithm.use_kl_in_reward=False \
    trainer.critic_warmup=0 \
    trainer.logger=[console,wandb] \
    trainer.project_name=verl_multiturn_tools \
    trainer.experiment_name=qwen25_7b_gsm8k_multiturn_tools \
    trainer.n_gpus_per_node=$NUM_GPUS_PER_NODE \
    trainer.nnodes=1 \
    trainer.save_freq=10 \
    trainer.test_freq=5 \
    trainer.total_epochs=10 \
    actor_rollout_ref.actor.ppo_max_token_len_per_gpu=8192 \
    actor_rollout_ref.rollout.log_prob_max_token_len_per_gpu=8192 \
    actor_rollout_ref.ref.log_prob_max_token_len_per_gpu=8192 \
    critic.ppo_max_token_len_per_gpu=8192 \
    critic.forward_max_token_len_per_gpu=8192 \
    actor_rollout_ref.rollout.multi_turn.tool_config_path="$PROJECT_DIR/examples/sglang_multiturn/config/tool_config/gsm8k_tool_config.yaml" \
    actor_rollout_ref.rollout.multi_turn.interaction_config_path="$PROJECT_DIR/examples/sglang_multiturn/config/interaction_config/gsm8k_interaction_config.yaml" \
    actor_rollout_ref.rollout.multi_turn.max_user_turns=1

  echo "Node setup and Ray start script finished for rank $SKYPILOT_NODE_RANK."
examples/skypilot/verl-ppo.yaml (new file, 109 lines)
@@ -0,0 +1,109 @@
resources:
  infra: k8s
  accelerators: H100:1
  memory: 128+
  image_id: docker:verlai/verl:base-verl0.5-cu126-cudnn9.8-torch2.7.0-fa2.7.4
  ports: 8265

num_nodes: 2

secrets:
  WANDB_API_KEY:

setup: |
  rm -rf verl
  git clone https://github.com/volcengine/verl.git
  cd verl
  pip3 install -v -e .[vllm]
  pip3 install flashinfer-python
  # Download GSM8K dataset
  echo "Downloading GSM8K dataset..."
  mkdir -p ~/data/gsm8k
  # Check that the preprocessing script exists and use an absolute path
  if [ -f "$(pwd)/examples/data_preprocess/gsm8k.py" ]; then
    python3 "$(pwd)/examples/data_preprocess/gsm8k.py" --local_dir ~/data/gsm8k
  else
    echo "Warning: gsm8k.py script not found, skipping dataset download"
    # You might want to download the dataset manually or use a different approach
  fi
  echo "GSM8K dataset download completed"

run: |
  # Get the head node's IP and the total number of nodes
  HEAD_IP=$(echo "$SKYPILOT_NODE_IPS" | head -n1)
  NUM_NODES=$SKYPILOT_NUM_NODES

  # Optional explicit wandb login (the WANDB_API_KEY secret is already exported)
  # python3 -c "import wandb; wandb.login(relogin=True, key='$WANDB_API_KEY')"

  if [ "$SKYPILOT_NODE_RANK" == "0" ]; then
    # Head node starts the Ray head
    echo "Starting Ray head node..."
    ps aux | grep ray | grep 6379 &> /dev/null || ray start --head --disable-usage-stats \
      --port=6379 \
      --dashboard-host=0.0.0.0 \
      --dashboard-port=8265

    # Wait for all worker nodes to join the cluster
    echo "Waiting for all nodes to join Ray cluster..."
    retry_count=0
    max_retries=30
    while [ $retry_count -lt $max_retries ]; do
      connected_nodes=$(ray status 2>/dev/null | grep -c "node_" || echo "0")
      echo "Connected nodes: $connected_nodes/$NUM_NODES (attempt $((retry_count+1))/$max_retries)"

      if [ "$connected_nodes" -ge "$NUM_NODES" ]; then
        echo "All nodes connected to Ray cluster"
        break
      fi

      retry_count=$((retry_count+1))
      sleep 10
    done

    if [ $retry_count -eq $max_retries ]; then
      echo "WARNING: Not all nodes connected to Ray cluster after $max_retries attempts"
      echo "Current Ray status:"
      ray status
    fi

    python3 -m verl.trainer.main_ppo \
      data.train_files=$HOME/data/gsm8k/train.parquet \
      data.val_files=$HOME/data/gsm8k/test.parquet \
      data.train_batch_size=256 \
      data.max_prompt_length=512 \
      data.max_response_length=256 \
      actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \
      actor_rollout_ref.actor.optim.lr=1e-6 \
      actor_rollout_ref.actor.ppo_mini_batch_size=64 \
      actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
      actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \
      actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
      actor_rollout_ref.rollout.name=vllm \
      actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
      actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \
      critic.optim.lr=1e-5 \
      critic.model.path=Qwen/Qwen2.5-0.5B-Instruct \
      critic.ppo_micro_batch_size_per_gpu=4 \
      algorithm.kl_ctrl.kl_coef=0.001 \
      trainer.logger=[console,wandb] \
      trainer.val_before_train=False \
      trainer.default_hdfs_dir=null \
      trainer.n_gpus_per_node=1 \
      trainer.nnodes=2 \
      trainer.save_freq=20 \
      trainer.test_freq=20 \
      trainer.total_epochs=2 \
      trainer.project_name=verl_examples \
      trainer.experiment_name=experiment_name_gsm8k

  else
    # Wait for the Ray head to start
    sleep 15
    # Worker node joins the Ray cluster
    echo "Starting Ray worker node..."
    ps aux | grep ray | grep $HEAD_IP:6379 &> /dev/null || ray start --address $HEAD_IP:6379 --disable-usage-stats
    sleep 10
  fi

  echo "Node setup and Ray start script finished for rank $SKYPILOT_NODE_RANK."