### What does this PR do?

Adds SkyPilot integration examples for running verl training jobs on Kubernetes/cloud platforms with GPUs. Includes configurations for PPO, GRPO, and multi-turn tool usage training.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: https://github.com/volcengine/verl/pulls?q=is%3Apr+skypilot
- [x] Format the PR title as `[{modules}] {type}: {description}`

### Test

Validated SkyPilot YAML configurations for Ray cluster initialization, dataset downloading, and distributed training setup with H100 GPUs.

### API and Usage Example

```bash
# Launch PPO training on 2 nodes
sky launch -c verl-ppo examples/skypilot/verl-ppo.yaml --secret WANDB_API_KEY -y

# Launch GRPO training
sky launch -c verl-grpo examples/skypilot/verl-grpo.yaml --secret WANDB_API_KEY -y

# Launch multi-turn tool usage training
sky launch -c verl-multiturn examples/skypilot/verl-multiturn-tools.yaml --secret WANDB_API_KEY --secret HF_TOKEN -y
```

### Design & Code Changes

- Added 3 SkyPilot YAML configurations for PPO, GRPO, and multi-turn training
- Added `examples/skypilot/README.md` with setup guide
- Added `docs/examples/skypilot_examples.rst` documentation
- Updated `docs/index.rst` and `docs/start/multinode.rst` with references

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

# verl with SkyPilot

Run verl reinforcement learning training jobs on Kubernetes clusters or cloud platforms with GPU nodes using SkyPilot.

## Installation and Configuration

### Step 1: Install SkyPilot

Choose the installation based on your target platform:

```bash
# For Kubernetes only
pip install "skypilot[kubernetes]"

# For AWS
pip install "skypilot[aws]"

# For Google Cloud Platform
pip install "skypilot[gcp]"

# For Azure
pip install "skypilot[azure]"

# For multiple platforms
pip install "skypilot[kubernetes,aws,gcp,azure]"
```

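### Step 2: Configure Your Platform

Follow the SkyPilot installation guide to set up credentials for your target platform: https://docs.skypilot.co/en/latest/getting-started/installation.html

### Step 3: Set Up Environment Variables

Export the API keys needed for experiment tracking and gated model access:
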
```bash
# For Weights & Biases tracking
export WANDB_API_KEY="your-wandb-api-key"

# For HuggingFace gated models (if needed)
export HF_TOKEN="your-huggingface-token"
```
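
The launch commands in this guide pass secrets by name (for example `--secret WANDB_API_KEY`), so it can help to confirm the variables are actually exported before launching. The following is a minimal pre-flight sketch, assuming a POSIX-compatible shell; `require_secrets` is a hypothetical helper introduced here for illustration and is not part of SkyPilot or verl.

```bash
# require_secrets NAME...: return non-zero if any named environment
# variable is unset or empty. Hypothetical helper, not a SkyPilot command.
# Note: uses the global shell variables `missing`, `name`, and `value`.
require_secrets() {
  missing=0
  for name in "$@"; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${${name}:-}"
    if [ -z "${value}" ]; then
      echo "missing: ${name}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Example use (commented out so the sketch has no side effects):
# require_secrets WANDB_API_KEY HF_TOKEN && echo "secrets look set"
```

Checking by name rather than hard-coding each variable keeps the same helper usable for the PPO and GRPO examples (one secret) and the multi-turn example (two secrets).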

## Examples

### PPO Training

```bash
sky launch -c verl-ppo verl-ppo.yaml --secret WANDB_API_KEY -y
```

Runs PPO training on the GSM8K dataset using the Qwen2.5-0.5B-Instruct model across 2 nodes with H100 GPUs. Based on the examples in `../ppo_trainer/`.

### GRPO Training

```bash
sky launch -c verl-grpo verl-grpo.yaml --secret WANDB_API_KEY -y
```

Runs GRPO (Group Relative Policy Optimization) training on the MATH dataset using the Qwen2.5-7B-Instruct model, with a memory-optimized configuration for 2 nodes. Based on the examples in `../grpo_trainer/`.

### Multi-turn Tool Usage Training

```bash
sky launch -c verl-multiturn verl-multiturn-tools.yaml --secret WANDB_API_KEY --secret HF_TOKEN -y
```

Runs single-node training on 8x H100 GPUs for multi-turn tool usage with Qwen2.5-3B-Instruct. Includes tool and interaction configurations for GSM8K. Based on the examples in `../sglang_multiturn/`, but uses vLLM instead of SGLang.

## Configuration

The example YAML files are pre-configured with:

- Infrastructure: Kubernetes clusters (`infra: k8s`); can be changed to `infra: aws`, `infra: gcp`, etc.
- Docker Image: verl's official Docker image with CUDA 12.6 support
- Setup: Automatically clones and installs verl from source
- Datasets: Downloads required datasets during the setup phase
- Ray Cluster: Configures distributed training across nodes
- Logging: Supports Weights & Biases via `--secret WANDB_API_KEY`
- Models: Supports gated HuggingFace models via `--secret HF_TOKEN`
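
Since the infrastructure target is a single `infra:` line in each example YAML, one way to try a different platform without editing the original file is to write a modified copy. The sketch below assumes the `infra:` key appears on its own line, as in `infra: k8s`; `switch_infra` and the output file name are hypothetical, introduced only for illustration.

```bash
# switch_infra SRC TARGET DST: copy SRC to DST, replacing the value of the
# `infra:` key with TARGET. Hypothetical helper, not a SkyPilot command.
# SRC and DST must be different files, because DST is truncated first.
switch_infra() {
  src=$1
  target=$2
  dst=$3
  # `|` is used as the sed delimiter so a TARGET containing `/` is not misparsed.
  sed "s|^\([[:space:]]*infra:[[:space:]]*\).*|\1${target}|" "${src}" > "${dst}"
}

# Example use (commented out so the sketch has no side effects):
# switch_infra verl-ppo.yaml aws verl-ppo-aws.yaml
```

Keeping the indentation and key in the captured group means only the value changes, so the rest of the YAML is copied through untouched.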

## Launch Command Options

- `-c <name>`: Cluster name for managing the job
- `--secret KEY`: Pass secrets for API keys (can be used multiple times)
- `-y`: Skip confirmation prompt
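
To illustrate how these options combine, in particular the repeated `--secret` flag, here is a small dry-run sketch that only prints the launch command instead of running it. `build_launch_cmd` is a hypothetical helper, not part of SkyPilot; it uses only the flags listed above.

```bash
# build_launch_cmd CLUSTER TASK_YAML [SECRET_NAME...]: print a `sky launch`
# command with one --secret flag per secret name. Hypothetical helper that
# prints the command and does not execute it.
build_launch_cmd() {
  cluster=$1
  task=$2
  shift 2
  cmd="sky launch -c ${cluster} ${task}"
  for secret in "$@"; do
    cmd="${cmd} --secret ${secret}"
  done
  echo "${cmd} -y"
}

# Example use (commented out so the sketch has no side effects):
# build_launch_cmd verl-multiturn verl-multiturn-tools.yaml WANDB_API_KEY HF_TOKEN
```

Printing the command first makes it easy to review the cluster name and secret list before anything is provisioned.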

## Monitoring Your Jobs

### Check cluster status

```bash
sky status
```

### View logs

```bash
sky logs verl-ppo  # View logs for the PPO job
```

### SSH into head node

```bash
ssh verl-ppo
```

### Access Ray dashboard

```bash
sky status --endpoint 8265 verl-ppo  # Get dashboard URL
```

### Tear down a cluster

```bash
sky down verl-ppo
```