[deployment, doc] feat: Add SkyPilot integration examples (#3333)

### What does this PR do?

Adds SkyPilot integration examples for running verl training jobs
on Kubernetes/cloud platforms with GPUs. Includes configurations
for PPO, GRPO, and multi-turn tool usage training.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
https://github.com/volcengine/verl/pulls?q=is%3Apr+skypilot
- [x] Format the PR title as `[{modules}] {type}: {description}`

### Test

Validated SkyPilot YAML configurations for Ray cluster
initialization, dataset downloading, and distributed training setup with
H100 GPUs.

### API and Usage Example

```bash
# Launch PPO training on 2 nodes
sky launch -c verl-ppo examples/skypilot/verl-ppo.yaml --secret WANDB_API_KEY -y

# Launch GRPO training
sky launch -c verl-grpo examples/skypilot/verl-grpo.yaml --secret WANDB_API_KEY -y

# Launch multi-turn tool usage training
sky launch -c verl-multiturn examples/skypilot/verl-multiturn-tools.yaml --secret WANDB_API_KEY --secret HF_TOKEN -y
```
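After launch, the usual SkyPilot commands apply for monitoring and teardown (these mirror the monitoring section of the new docs):

```bash
sky status            # Check cluster state
sky logs verl-ppo     # Stream training logs for the PPO job
sky down verl-ppo     # Tear down the cluster when done
```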

### Design & Code Changes

- Added 3 SkyPilot YAML configurations for PPO, GRPO, and
multi-turn training
- Added `examples/skypilot/README.md` with setup guide
- Added `docs/examples/skypilot_examples.rst` documentation
- Updated `docs/index.rst` and `docs/start/multinode.rst` with
references
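
For reference, the seven files touched by this PR (paths taken from the diff below):

```text
examples/skypilot/README.md                  (new)
examples/skypilot/verl-ppo.yaml              (new)
examples/skypilot/verl-grpo.yaml             (new)
examples/skypilot/verl-multiturn-tools.yaml  (new)
docs/examples/skypilot_examples.rst          (new)
docs/index.rst                               (updated)
docs/start/multinode.rst                     (updated)
```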

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
> otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
Commit f356fc1e56 (parent 4d45c12408), authored by Alex Kim and committed via GitHub on 2025-09-04 04:56 -04:00.
7 changed files with 560 additions and 0 deletions.

docs/examples/skypilot_examples.rst Normal file

@@ -0,0 +1,144 @@
SkyPilot Examples
=================

This guide provides examples of running verl reinforcement learning training on Kubernetes clusters or cloud platforms with GPU nodes using `SkyPilot <https://github.com/skypilot-org/skypilot>`_.

Installation and Configuration
-------------------------------
Step 1: Install SkyPilot
~~~~~~~~~~~~~~~~~~~~~~~~~

Choose the installation based on your target platform:

.. code-block:: bash

   # For Kubernetes only
   pip install "skypilot[kubernetes]"

   # For AWS
   pip install "skypilot[aws]"

   # For Google Cloud Platform
   pip install "skypilot[gcp]"

   # For Azure
   pip install "skypilot[azure]"

   # For multiple platforms
   pip install "skypilot[kubernetes,aws,gcp,azure]"
Step 2: Configure Your Platform
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

See https://docs.skypilot.co/en/latest/getting-started/installation.html
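
After configuring credentials, you can confirm that SkyPilot actually sees your platform with its built-in ``sky check`` command:

.. code-block:: bash

   # Verify credentials for all configured platforms
   sky check

   # Or check a single platform, e.g. Kubernetes
   sky check kubernetes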
Step 3: Set Up Environment Variables
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Export the API keys needed for experiment tracking and model access:

.. code-block:: bash

   # For Weights & Biases tracking
   export WANDB_API_KEY="your-wandb-api-key"

   # For HuggingFace gated models (if needed)
   export HF_TOKEN="your-huggingface-token"
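
The launch commands below forward these to the job with ``--secret KEY``; when no value is supplied on the command line, SkyPilot reads the secret from the local environment variable of the same name. A minimal sketch:

.. code-block:: bash

   # WANDB_API_KEY is read from the local environment set above and
   # injected into the task as a secret
   sky launch -c verl-ppo verl-ppo.yaml --secret WANDB_API_KEY -y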
Examples
--------

All example configurations are available in the `examples/skypilot/ <https://github.com/volcengine/verl/tree/main/examples/skypilot>`_ directory on GitHub. See the `README <https://github.com/volcengine/verl/blob/main/examples/skypilot/README.md>`_ for additional details.

PPO Training
~~~~~~~~~~~~

.. code-block:: bash

   sky launch -c verl-ppo verl-ppo.yaml --secret WANDB_API_KEY -y

Runs PPO training on the GSM8K dataset with the Qwen2.5-0.5B-Instruct model across 2 nodes with H100 GPUs. Based on the examples in ``examples/ppo_trainer/``.

`View verl-ppo.yaml on GitHub <https://github.com/volcengine/verl/blob/main/examples/skypilot/verl-ppo.yaml>`_
GRPO Training
~~~~~~~~~~~~~

.. code-block:: bash

   sky launch -c verl-grpo verl-grpo.yaml --secret WANDB_API_KEY -y

Runs GRPO (Group Relative Policy Optimization) training on the MATH dataset with the Qwen2.5-7B-Instruct model, using a memory-optimized configuration for 2 nodes. Based on the examples in ``examples/grpo_trainer/``.

`View verl-grpo.yaml on GitHub <https://github.com/volcengine/verl/blob/main/examples/skypilot/verl-grpo.yaml>`_
Multi-turn Tool Usage Training
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   sky launch -c verl-multiturn verl-multiturn-tools.yaml \
     --secret WANDB_API_KEY --secret HF_TOKEN -y

Single-node training with 8x H100 GPUs for multi-turn tool usage with Qwen2.5-3B-Instruct. Includes tool and interaction configurations for GSM8K. Based on the examples in ``examples/sglang_multiturn/`` but uses vLLM instead of SGLang.

`View verl-multiturn-tools.yaml on GitHub <https://github.com/volcengine/verl/blob/main/examples/skypilot/verl-multiturn-tools.yaml>`_
Configuration
-------------

The example YAML files are pre-configured with:

- **Infrastructure**: Kubernetes clusters (``infra: k8s``) - can be changed to ``infra: aws``, ``infra: gcp``, etc. (or overridden at launch time; see the sketch after this list)
- **Docker Image**: verl's official Docker image with CUDA 12.6 support
- **Setup**: Automatically clones and installs verl from source
- **Datasets**: Downloads required datasets during the setup phase
- **Ray Cluster**: Configures distributed training across nodes
- **Logging**: Supports Weights & Biases via ``--secret WANDB_API_KEY``
- **Models**: Supports gated HuggingFace models via ``--secret HF_TOKEN``
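
If you would rather not edit the YAML, recent SkyPilot releases also let you override the infrastructure at launch time with an ``--infra`` flag (check ``sky launch --help`` for your installed version):

.. code-block:: bash

   # Run the same task on AWS instead of Kubernetes (assumes a recent
   # SkyPilot release with the --infra flag)
   sky launch -c verl-ppo verl-ppo.yaml --infra aws --secret WANDB_API_KEY -y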
Launch Command Options
----------------------

- ``-c <name>``: Cluster name for managing the job
- ``--secret KEY``: Pass secrets for API keys (can be used multiple times)
- ``-y``: Skip the confirmation prompt
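
A useful property of ``-c``: launching again with an existing cluster name reuses that cluster instead of provisioning a new one, which makes iterating on a config much faster:

.. code-block:: bash

   # Re-launch on the already-running cluster; provisioning is skipped
   sky launch -c verl-ppo verl-ppo.yaml --secret WANDB_API_KEY -y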
Monitoring Your Jobs
--------------------

Check Cluster Status
~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   sky status

View Logs
~~~~~~~~~

.. code-block:: bash

   sky logs verl-ppo  # View logs for the PPO job

SSH into Head Node
~~~~~~~~~~~~~~~~~~

SkyPilot adds an SSH alias for each cluster, so you can connect directly:

.. code-block:: bash

   ssh verl-ppo

Access Ray Dashboard
~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   sky status --endpoint 8265 verl-ppo  # Get dashboard URL

Stop a Cluster
~~~~~~~~~~~~~~

.. code-block:: bash

   sky down verl-ppo
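
To avoid paying for idle GPUs between experiments, you can also let SkyPilot clean up automatically (``--down`` tears the cluster down rather than stopping it, which is the supported mode on Kubernetes):

.. code-block:: bash

   # Tear down the cluster after 60 idle minutes
   sky autostop verl-ppo -i 60 --down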

docs/index.rst

@@ -62,6 +62,7 @@ verl is fast with:

   examples/ppo_code_architecture
   examples/gsm8k_example
   examples/multi_modal_example
   examples/skypilot_examples

.. toctree::
   :maxdepth: 1
docs/start/multinode.rst

@@ -69,6 +69,15 @@ Submit job to ray cluster

Option 2: Launch via SkyPilot on Kubernetes or clouds
------------------------------------------------------

.. note::

   Ready-to-use SkyPilot example configurations are available in the `examples/skypilot/ <https://github.com/volcengine/verl/tree/main/examples/skypilot>`_ directory:

   - ``verl-ppo.yaml`` - PPO training with the GSM8K dataset
   - ``verl-grpo.yaml`` - GRPO training with the MATH dataset
   - ``verl-multiturn-tools.yaml`` - Multi-turn tool usage training

   See the `SkyPilot examples README <https://github.com/volcengine/verl/tree/main/examples/skypilot>`_ for detailed usage instructions.

Step 1: Setup SkyPilot
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

SkyPilot supports many clouds; here we use GCP as an example. First, `install SkyPilot <https://docs.skypilot.co/en/latest/getting-started/installation.html>`_.

examples/skypilot/README.md Normal file

@@ -0,0 +1,107 @@
# verl with SkyPilot
Run verl reinforcement learning training jobs on Kubernetes clusters or cloud platforms with GPU nodes using [SkyPilot](https://github.com/skypilot-org/skypilot).
## Installation and Configuration
### Step 1: Install SkyPilot
Choose the installation based on your target platform:
```bash
# For Kubernetes only
pip install "skypilot[kubernetes]"
# For AWS
pip install "skypilot[aws]"
# For Google Cloud Platform
pip install "skypilot[gcp]"
# For Azure
pip install "skypilot[azure]"
# For multiple platforms
pip install "skypilot[kubernetes,aws,gcp,azure]"
```
### Step 2: Configure Your Platform
See https://docs.skypilot.co/en/latest/getting-started/installation.html
### Step 3: Set Up Environment Variables
Export necessary API keys for experiment tracking:
```bash
# For Weights & Biases tracking
export WANDB_API_KEY="your-wandb-api-key"
# For HuggingFace gated models (if needed)
export HF_TOKEN="your-huggingface-token"
```
## Examples
### PPO Training
```bash
sky launch -c verl-ppo verl-ppo.yaml --secret WANDB_API_KEY -y
```
Runs PPO training on the GSM8K dataset with the Qwen2.5-0.5B-Instruct model across 2 nodes with H100 GPUs. Based on the examples in [`../ppo_trainer/`](../ppo_trainer/).
### GRPO Training
```bash
sky launch -c verl-grpo verl-grpo.yaml --secret WANDB_API_KEY -y
```
Runs GRPO (Group Relative Policy Optimization) training on the MATH dataset with the Qwen2.5-7B-Instruct model, using a memory-optimized configuration for 2 nodes. Based on the examples in [`../grpo_trainer/`](../grpo_trainer/).
### Multi-turn Tool Usage Training
```bash
sky launch -c verl-multiturn verl-multiturn-tools.yaml --secret WANDB_API_KEY --secret HF_TOKEN -y
```
Single-node training with 8x H100 GPUs for multi-turn tool usage with Qwen2.5-3B-Instruct. Includes tool and interaction configurations for GSM8K. Based on the examples in [`../sglang_multiturn/`](../sglang_multiturn/) but uses vLLM instead of SGLang.
## Configuration
The example YAML files are pre-configured with:
- **Infrastructure**: Kubernetes clusters (`infra: k8s`) - can be changed to `infra: aws` or `infra: gcp`, etc.
- **Docker Image**: verl's official Docker image with CUDA 12.6 support
- **Setup**: Automatically clones and installs verl from source
- **Datasets**: Downloads required datasets during the setup phase
- **Ray Cluster**: Configures distributed training across nodes
- **Logging**: Supports Weights & Biases via `--secret WANDB_API_KEY`
- **Models**: Supports gated HuggingFace models via `--secret HF_TOKEN`
## Launch Command Options
- `-c <name>`: Cluster name for managing the job
- `--secret KEY`: Pass secrets for API keys (can be used multiple times)
- `-y`: Skip confirmation prompt
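
If you prefer not to keep a terminal attached, `-d/--detach-run` returns as soon as the job is submitted; you can reattach to the logs at any time:

```bash
# Submit the job and return immediately
sky launch -d -c verl-ppo verl-ppo.yaml --secret WANDB_API_KEY -y

# Tail the logs later
sky logs verl-ppo
```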
## Monitoring Your Jobs
### Check cluster status
```bash
sky status
```
### View logs
```bash
sky logs verl-ppo # View logs for the PPO job
```
### SSH into head node
```bash
ssh verl-ppo
```
### Access Ray dashboard
```bash
sky status --endpoint 8265 verl-ppo # Get dashboard URL
```
### Stop a cluster
```bash
sky down verl-ppo
```
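
Two related commands that come in handy when managing runs on a cluster:

```bash
# List jobs submitted to the cluster, with their IDs and status
sky queue verl-ppo

# Cancel a job by ID
sky cancel verl-ppo 1
```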

examples/skypilot/verl-grpo.yaml Normal file

@@ -0,0 +1,99 @@
resources:
  infra: k8s
  accelerators: H100:1
  memory: 128+
  image_id: docker:verlai/verl:base-verl0.5-cu126-cudnn9.8-torch2.7.0-fa2.7.4
  ports: 8265

num_nodes: 2

secrets:
  WANDB_API_KEY:

setup: |
  rm -rf verl
  git clone https://github.com/volcengine/verl.git
  cd verl
  pip3 install -v -e .[vllm]
  pip3 install flashinfer-python

  echo "Downloading MATH dataset..."
  mkdir -p ~/data/math
  python3 "$(pwd)/examples/data_preprocess/math_dataset.py" --local_dir ~/data/math
  echo "MATH dataset download completed"

run: |
  HEAD_IP=$(echo "$SKYPILOT_NODE_IPS" | head -n1)
  NUM_NODES=$SKYPILOT_NUM_NODES
  NUM_GPUS_PER_NODE=$SKYPILOT_NUM_GPUS_PER_NODE

  if [ "$SKYPILOT_NODE_RANK" == "0" ]; then
    echo "Starting Ray head node..."
    ps aux | grep ray | grep 6379 &> /dev/null || ray start --head --disable-usage-stats \
      --port=6379 \
      --dashboard-host=0.0.0.0 \
      --dashboard-port=8265

    # Wait for all worker nodes to join
    retry_count=0
    max_retries=30
    while [ $retry_count -lt $max_retries ]; do
      # grep -c already prints 0 when nothing matches; `|| true` avoids
      # appending a second "0" when grep exits non-zero
      connected_nodes=$(ray status 2>/dev/null | grep -c "node_" || true)
      echo "Connected nodes: $connected_nodes/$NUM_NODES (attempt $((retry_count+1))/$max_retries)"
      if [ "$connected_nodes" -ge "$NUM_NODES" ]; then
        echo "All nodes connected to Ray cluster"
        break
      fi
      retry_count=$((retry_count+1))
      sleep 10
    done

    python3 -m verl.trainer.main_ppo \
      algorithm.adv_estimator=grpo \
      data.train_files=$HOME/data/math/train.parquet \
      data.val_files=$HOME/data/math/test.parquet \
      data.train_batch_size=32 \
      data.max_prompt_length=256 \
      data.max_response_length=256 \
      data.filter_overlong_prompts=True \
      data.truncation='error' \
      actor_rollout_ref.model.path=Qwen/Qwen2.5-7B-Instruct \
      actor_rollout_ref.actor.optim.lr=1e-6 \
      actor_rollout_ref.model.use_remove_padding=True \
      actor_rollout_ref.actor.ppo_mini_batch_size=16 \
      actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
      actor_rollout_ref.actor.ppo_epochs=1 \
      actor_rollout_ref.actor.use_kl_loss=False \
      actor_rollout_ref.actor.entropy_coeff=0 \
      actor_rollout_ref.model.enable_gradient_checkpointing=True \
      actor_rollout_ref.actor.fsdp_config.param_offload=True \
      actor_rollout_ref.actor.fsdp_config.optimizer_offload=True \
      actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=16 \
      actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
      actor_rollout_ref.rollout.name=vllm \
      actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
      actor_rollout_ref.rollout.n=1 \
      actor_rollout_ref.rollout.enable_chunked_prefill=True \
      actor_rollout_ref.rollout.max_num_batched_tokens=2048 \
      actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=16 \
      actor_rollout_ref.ref.fsdp_config.param_offload=True \
      algorithm.use_kl_in_reward=False \
      trainer.critic_warmup=0 \
      trainer.logger='[console,wandb]' \
      trainer.project_name=verl_math_grpo_demo \
      trainer.experiment_name=qwen25_7b_grpo \
      trainer.n_gpus_per_node=$NUM_GPUS_PER_NODE \
      trainer.nnodes=$NUM_NODES \
      trainer.save_freq=-1 \
      trainer.test_freq=-1 \
      trainer.total_epochs=1
  else
    # Wait for the Ray head to come up before joining as a worker
    sleep 15
    echo "Starting Ray worker node..."
    ps aux | grep ray | grep $HEAD_IP:6379 &> /dev/null || ray start --address $HEAD_IP:6379 --disable-usage-stats
    sleep 10
  fi
  echo "Node setup and Ray start script finished for rank $SKYPILOT_NODE_RANK."

examples/skypilot/verl-multiturn-tools.yaml Normal file

@@ -0,0 +1,91 @@
resources:
  infra: k8s
  accelerators: H100:8
  memory: 128+
  image_id: docker:verlai/verl:base-verl0.5-cu126-cudnn9.8-torch2.7.0-fa2.7.4
  ports: 8265

num_nodes: 1

secrets:
  WANDB_API_KEY:
  HF_TOKEN: # in case you're using gated models from the HF hub

setup: |
  rm -rf verl
  git clone https://github.com/volcengine/verl.git
  cd verl
  pip3 install -v -e .[vllm]
  pip3 install flashinfer-python
  pip install "transformers<4.54.0" # https://github.com/vllm-project/vllm-ascend/issues/2046

  # Download GSM8K dataset for multi-turn tool training
  echo "Downloading GSM8K dataset..."
  mkdir -p ~/data/gsm8k
  python3 "$(pwd)/examples/data_preprocess/gsm8k.py" --local_dir ~/data/gsm8k
  echo "GSM8K dataset download completed"

run: |
  NUM_GPUS_PER_NODE=$SKYPILOT_NUM_GPUS_PER_NODE
  PROJECT_DIR="$(pwd)/verl"
  CONFIG_PATH="$PROJECT_DIR/examples/sglang_multiturn/config"

  # Single-node setup - no worker coordination needed
  echo "Starting Ray head node..."
  ps aux | grep ray | grep 6379 &> /dev/null || ray start --head --disable-usage-stats \
    --port=6379 \
    --dashboard-host=0.0.0.0 \
    --dashboard-port=8265

  cd verl
  python3 -m verl.trainer.main_ppo \
    --config-path="$CONFIG_PATH" \
    --config-name='gsm8k_multiturn_grpo' \
    algorithm.adv_estimator=grpo \
    data.train_batch_size=512 \
    data.max_prompt_length=1024 \
    data.max_response_length=1024 \
    data.filter_overlong_prompts=True \
    data.truncation='error' \
    data.return_raw_chat=True \
    data.train_files=$HOME/data/gsm8k/train.parquet \
    data.val_files=$HOME/data/gsm8k/test.parquet \
    actor_rollout_ref.model.path=Qwen/Qwen2.5-3B-Instruct \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=512 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=32 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.001 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=64 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=4 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.5 \
    actor_rollout_ref.rollout.n=16 \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=64 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    algorithm.use_kl_in_reward=False \
    trainer.critic_warmup=0 \
    trainer.logger='[console,wandb]' \
    trainer.project_name=verl_multiturn_tools \
    trainer.experiment_name=qwen25_3b_gsm8k_multiturn_tools \
    trainer.n_gpus_per_node=$NUM_GPUS_PER_NODE \
    trainer.nnodes=1 \
    trainer.save_freq=10 \
    trainer.test_freq=5 \
    trainer.total_epochs=10 \
    actor_rollout_ref.actor.ppo_max_token_len_per_gpu=8192 \
    actor_rollout_ref.rollout.log_prob_max_token_len_per_gpu=8192 \
    actor_rollout_ref.ref.log_prob_max_token_len_per_gpu=8192 \
    critic.ppo_max_token_len_per_gpu=8192 \
    critic.forward_max_token_len_per_gpu=8192 \
    actor_rollout_ref.rollout.multi_turn.tool_config_path="$PROJECT_DIR/examples/sglang_multiturn/config/tool_config/gsm8k_tool_config.yaml" \
    actor_rollout_ref.rollout.multi_turn.interaction_config_path="$PROJECT_DIR/examples/sglang_multiturn/config/interaction_config/gsm8k_interaction_config.yaml" \
    actor_rollout_ref.rollout.multi_turn.max_user_turns=1

  echo "Node setup and Ray start script finished for rank $SKYPILOT_NODE_RANK."

examples/skypilot/verl-ppo.yaml Normal file

@@ -0,0 +1,109 @@
resources:
  infra: k8s
  accelerators: H100:1
  memory: 128+
  image_id: docker:verlai/verl:base-verl0.5-cu126-cudnn9.8-torch2.7.0-fa2.7.4
  ports: 8265

num_nodes: 2

secrets:
  WANDB_API_KEY:

setup: |
  rm -rf verl
  git clone https://github.com/volcengine/verl.git
  cd verl
  pip3 install -v -e .[vllm]
  pip3 install flashinfer-python

  # Download GSM8K dataset
  echo "Downloading GSM8K dataset..."
  mkdir -p ~/data/gsm8k
  # Check that the preprocessing script exists and use an absolute path
  if [ -f "$(pwd)/examples/data_preprocess/gsm8k.py" ]; then
    python3 "$(pwd)/examples/data_preprocess/gsm8k.py" --local_dir ~/data/gsm8k
  else
    echo "Warning: gsm8k.py script not found, skipping dataset download"
    # You might want to download the dataset manually instead
  fi
  echo "GSM8K dataset download completed"

run: |
  # Get the head node's IP and the total number of nodes
  HEAD_IP=$(echo "$SKYPILOT_NODE_IPS" | head -n1)
  NUM_NODES=$SKYPILOT_NUM_NODES

  # Log in to wandb explicitly if needed (the secret is already exported)
  # python3 -c "import wandb; wandb.login(relogin=True, key='$WANDB_API_KEY')"

  if [ "$SKYPILOT_NODE_RANK" == "0" ]; then
    # The head node starts the Ray head
    echo "Starting Ray head node..."
    ps aux | grep ray | grep 6379 &> /dev/null || ray start --head --disable-usage-stats \
      --port=6379 \
      --dashboard-host=0.0.0.0 \
      --dashboard-port=8265

    # Wait for all worker nodes to join the cluster
    echo "Waiting for all nodes to join Ray cluster..."
    retry_count=0
    max_retries=30
    while [ $retry_count -lt $max_retries ]; do
      # grep -c already prints 0 when nothing matches; `|| true` avoids
      # appending a second "0" when grep exits non-zero
      connected_nodes=$(ray status 2>/dev/null | grep -c "node_" || true)
      echo "Connected nodes: $connected_nodes/$NUM_NODES (attempt $((retry_count+1))/$max_retries)"
      if [ "$connected_nodes" -ge "$NUM_NODES" ]; then
        echo "All nodes connected to Ray cluster"
        break
      fi
      retry_count=$((retry_count+1))
      sleep 10
    done
    if [ $retry_count -eq $max_retries ]; then
      echo "WARNING: Not all nodes connected to Ray cluster after $max_retries attempts"
      echo "Current Ray status:"
      ray status
    fi

    python3 -m verl.trainer.main_ppo \
      data.train_files=$HOME/data/gsm8k/train.parquet \
      data.val_files=$HOME/data/gsm8k/test.parquet \
      data.train_batch_size=256 \
      data.max_prompt_length=512 \
      data.max_response_length=256 \
      actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \
      actor_rollout_ref.actor.optim.lr=1e-6 \
      actor_rollout_ref.actor.ppo_mini_batch_size=64 \
      actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
      actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \
      actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
      actor_rollout_ref.rollout.name=vllm \
      actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
      actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \
      critic.optim.lr=1e-5 \
      critic.model.path=Qwen/Qwen2.5-0.5B-Instruct \
      critic.ppo_micro_batch_size_per_gpu=4 \
      algorithm.kl_ctrl.kl_coef=0.001 \
      trainer.logger='[console,wandb]' \
      trainer.val_before_train=False \
      trainer.default_hdfs_dir=null \
      trainer.n_gpus_per_node=1 \
      trainer.nnodes=2 \
      trainer.save_freq=20 \
      trainer.test_freq=20 \
      trainer.total_epochs=2 \
      trainer.project_name=verl_examples \
      trainer.experiment_name=experiment_name_gsm8k
  else
    # Wait for the Ray head to start
    sleep 15
    # Worker nodes join the cluster
    echo "Starting Ray worker node..."
    ps aux | grep ray | grep $HEAD_IP:6379 &> /dev/null || ray start --address $HEAD_IP:6379 --disable-usage-stats
    sleep 10
  fi
  echo "Node setup and Ray start script finished for rank $SKYPILOT_NODE_RANK."