verl/examples/skypilot
Alex Kim f356fc1e56 [deployment, doc] feat: Add SkyPilot integration examples (#3333)
### What does this PR do?

Adds SkyPilot integration examples for running verl training jobs
on Kubernetes/cloud platforms with GPUs. Includes configurations
for PPO, GRPO, and multi-turn tool usage training.

### Test

Validated SkyPilot YAML configurations for Ray cluster
initialization, dataset downloading, and distributed training setup with
H100 GPUs.

### API and Usage Example

```bash
# Launch PPO training on 2 nodes
sky launch -c verl-ppo examples/skypilot/verl-ppo.yaml --secret WANDB_API_KEY -y

# Launch GRPO training
sky launch -c verl-grpo examples/skypilot/verl-grpo.yaml --secret WANDB_API_KEY -y

# Launch multi-turn tool usage training
sky launch -c verl-multiturn examples/skypilot/verl-multiturn-tools.yaml --secret WANDB_API_KEY --secret HF_TOKEN -y
```

### Design & Code Changes

- Added 3 SkyPilot YAML configurations for PPO, GRPO, and
multi-turn training
- Added `examples/skypilot/README.md` with setup guide
- Added `docs/examples/skypilot_examples.rst` documentation
- Updated `docs/index.rst` and `docs/start/multinode.rst` with
references

2025-09-04 16:56:00 +08:00

verl with SkyPilot

Run verl reinforcement learning training jobs on Kubernetes clusters or cloud platforms with GPU nodes using SkyPilot.

Installation and Configuration

Step 1: Install SkyPilot

Choose the installation based on your target platform:

# For Kubernetes only
pip install "skypilot[kubernetes]"

# For AWS
pip install "skypilot[aws]"

# For Google Cloud Platform
pip install "skypilot[gcp]"

# For Azure
pip install "skypilot[azure]"

# For multiple platforms
pip install "skypilot[kubernetes,aws,gcp,azure]"

Step 2: Configure Your Platform

Follow the SkyPilot installation guide to set up credentials for your target platform: https://docs.skypilot.co/en/latest/getting-started/installation.html
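
Once credentials are in place, SkyPilot's `sky check` command reports which platforms it can reach. The wrapper below is a minimal sketch, not part of the verl examples; the function name `check_skypilot` is hypothetical and exists only to give a clearer error when the CLI itself is missing.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run `sky check` if the SkyPilot CLI is available,
# otherwise print a hint and return a non-zero status.
check_skypilot() {
  if command -v sky >/dev/null 2>&1; then
    # `sky check` lists the platforms SkyPilot can use with the current credentials.
    sky check
  else
    echo "sky CLI not found; install SkyPilot first (Step 1)" >&2
    return 1
  fi
}
```

Call `check_skypilot` after Step 2 and confirm that the platform you intend to use is listed as enabled before launching a job.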

Step 3: Set Up Environment Variables

Export the API keys needed for experiment tracking and model access:

# For Weights & Biases tracking
export WANDB_API_KEY="your-wandb-api-key"

# For HuggingFace gated models (if needed)
export HF_TOKEN="your-huggingface-token"
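
Because `--secret KEY` reads the value of `KEY` from the local environment, a forgotten `export` is easy to miss until the job is already running. The following is a small pre-flight sketch; `require_env` is a hypothetical helper name, not something shipped with verl or SkyPilot.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if every named variable is exported
# and non-empty; otherwise name the first missing one on stderr.
require_env() {
  local name
  for name in "$@"; do
    # printenv only sees exported variables, which is what child processes
    # such as `sky launch` will see as well.
    if [ -z "$(printenv "$name")" ]; then
      echo "missing environment variable: $name" >&2
      return 1
    fi
  done
}
```

A launch can then be guarded with, for example, `require_env WANDB_API_KEY HF_TOKEN && sky launch -c verl-multiturn verl-multiturn-tools.yaml --secret WANDB_API_KEY --secret HF_TOKEN -y`.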

Examples

PPO Training

sky launch -c verl-ppo verl-ppo.yaml --secret WANDB_API_KEY -y

Runs PPO training on the GSM8K dataset with the Qwen2.5-0.5B-Instruct model across 2 nodes with H100 GPUs. Based on the examples in ../ppo_trainer/.

GRPO Training

sky launch -c verl-grpo verl-grpo.yaml --secret WANDB_API_KEY -y

Runs GRPO (Group Relative Policy Optimization) training on the MATH dataset with the Qwen2.5-7B-Instruct model, using a memory-optimized configuration for 2 nodes. Based on the examples in ../grpo_trainer/.

Multi-turn Tool Usage Training

sky launch -c verl-multiturn verl-multiturn-tools.yaml --secret WANDB_API_KEY --secret HF_TOKEN -y

Runs single-node training on 8xH100 GPUs for multi-turn tool usage with the Qwen2.5-3B-Instruct model. Includes tool and interaction configurations for GSM8K. Based on the examples in ../sglang_multiturn/, but uses vLLM instead of SGLang.

Configuration

The example YAML files are pre-configured with:

  • Infrastructure: Kubernetes clusters (infra: k8s); change this to infra: aws, infra: gcp, or another supported platform
  • Docker Image: verl's official Docker image with CUDA 12.6 support
  • Setup: Automatically clones and installs verl from source
  • Datasets: Downloads required datasets during setup phase
  • Ray Cluster: Configures distributed training across nodes
  • Logging: Supports Weights & Biases via --secret WANDB_API_KEY
  • Models: Supports gated HuggingFace models via --secret HF_TOKEN
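
To target a different platform without editing the shipped file, one option is to write a modified copy. The sketch below assumes the YAML carries its `infra:` key on a line of its own, as the `infra: k8s` setting above suggests; `retarget_infra` and the output file name are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy a task YAML, replacing the value of its `infra:` key.
# Usage: retarget_infra <source.yaml> <new-infra> <dest.yaml>
retarget_infra() {
  local src="$1" infra="$2" dest="$3"
  # Keep the original indentation and key, swap only the value after `infra:`.
  sed -E "s|^([[:space:]]*infra:[[:space:]]*).*|\1${infra}|" "$src" > "$dest"
}
```

For example, `retarget_infra verl-ppo.yaml aws verl-ppo-aws.yaml` leaves the original file untouched and produces a copy that can be passed to `sky launch`.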

Launch Command Options

  • -c <name>: Cluster name for managing the job
  • --secret KEY: Pass the value of the local environment variable KEY to the job as a secret (repeat the flag for each key)
  • -y: Skip confirmation prompt
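
The three options combine in a fixed pattern, which the sketch below assembles from a cluster name, a task YAML, and any number of secret names. It only prints the command so it can be inspected before running; `build_launch_cmd` is a hypothetical helper, not part of SkyPilot.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the `sky launch` command for a cluster name,
# a task YAML, and zero or more secret names.
build_launch_cmd() {
  local cluster="$1" yaml="$2"
  shift 2
  local cmd=(sky launch -c "$cluster" "$yaml")
  local secret
  for secret in "$@"; do
    cmd+=(--secret "$secret")   # one --secret flag per key
  done
  cmd+=(-y)                     # skip the confirmation prompt
  echo "${cmd[*]}"
}

# Prints the same command used for the multi-turn example:
# sky launch -c verl-multiturn verl-multiturn-tools.yaml --secret WANDB_API_KEY --secret HF_TOKEN -y
build_launch_cmd verl-multiturn verl-multiturn-tools.yaml WANDB_API_KEY HF_TOKEN
```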

Monitoring Your Jobs

Check cluster status

sky status

View logs

sky logs verl-ppo  # View logs for the PPO job

SSH into head node

ssh verl-ppo

Access Ray dashboard

sky status --endpoint 8265 verl-ppo  # Get dashboard URL

Tear down a cluster

sky down verl-ppo
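
GPU nodes keep accruing cost while a cluster is up, so it can help to tie teardown to the end of log streaming. This is a sketch under that assumption; `logs_then_down` is a hypothetical helper that simply chains the two commands shown above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: stream a cluster's logs, then tear the cluster down
# even if the log command failed. Returns the status of `sky logs`.
logs_then_down() {
  local cluster="$1"
  local status=0
  sky logs "$cluster" || status=$?
  # -y skips the confirmation prompt, as it does for `sky launch`.
  sky down "$cluster" -y
  return "$status"
}
```

Usage: `logs_then_down verl-ppo`. Copy any checkpoints or results off the cluster first, since tearing it down removes its nodes.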