### Summary
#### Minimize Test Workloads
This PR minimizes the test workloads while keeping them meaningful,
reducing the runtime of a test from more than 10 minutes to 1-2 minutes.
Specifically, we
1. set batch sizes and step counts to small but still meaningful numbers (a worked example follows the snippet):
```bash
train_traj_micro_bsz_per_gpu=2 # b
n_resp_per_prompt=4 # g
train_traj_micro_bsz=$((train_traj_micro_bsz_per_gpu * NUM_GPUS)) # b * n
train_traj_mini_bsz=$((train_traj_micro_bsz * 2)) # 2 * b * n
train_prompt_mini_bsz=$((train_traj_mini_bsz / n_resp_per_prompt)) # 2 * b * n / g
train_prompt_bsz=$((train_prompt_mini_bsz * 2)) # 4 * b * n / g
# ...
TOT_TRAIN_STEPS=${TOT_TRAIN_STEPS:-1}
```
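For concreteness, here is what these formulas evaluate to on an 8-GPU runner (`NUM_GPUS=8` is only an illustrative assumption here):
```bash
# Worked example, assuming NUM_GPUS=8 (b = 2, g = 4):
#   train_traj_micro_bsz  = 2 * 8  = 16   # b * n
#   train_traj_mini_bsz   = 16 * 2 = 32   # 2 * b * n
#   train_prompt_mini_bsz = 32 / 4 = 8    # 2 * b * n / g
#   train_prompt_bsz      = 8 * 2  = 16   # 4 * b * n / g
```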
2. disable validation (which is costly!), checkpoint saving, and resuming
for training tests by default, leaving them to specialized tests (see the
override example after the code block)
```bash
# Validation
VAL_BEFORE_TRAIN=${VAL_BEFORE_TRAIN:-False}
TEST_FREQ=${TEST_FREQ:--1}
# Save & Resume
RESUME_MODE=${RESUME_MODE:-disable}
SAVE_FREQ=${SAVE_FREQ:--1}
```
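Specialized tests can re-enable these features by overriding the environment variables. A hypothetical invocation (the script path and values below are illustrative, not the exact ones used in CI):
```bash
# Hypothetical override for a dedicated checkpoint/resume test;
# the script path and values are illustrative only.
VAL_BEFORE_TRAIN=True TEST_FREQ=1 SAVE_FREQ=1 RESUME_MODE=auto \
    bash tests/e2e/ppo_trainer/run_function_reward.sh
```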
#### Improve Triggering Mode
This PR introduces a more comprehensive triggering logic.
Specifically, we
1. consider all Python code by default
2. include related entrypoints (the workflow config, the scripts it uses,
the Hydra config, etc.)
3. exclude unrelated Python code from other components (e.g., recipes,
examples, Megatron, SFT, generation, evaluation, etc. for FSDP training)
An example from `e2e_ppo_trainer`:
```yaml
on:
paths:
- "**/*.py"
# Entrypoints
- ".github/workflows/e2e_ppo_trainer.yml"
- "examples/data_preprocess/gsm8k.py"
- "examples/data_preprocess/geo3k.py"
- "tests/e2e/ppo_trainer"
- "verl/trainer/main_ppo.py"
- "verl/trainer/config/ppo_trainer.yaml"
- "!examples"
- "!verl/trainer/main_*.py"
- "!verl/trainer/fsdp_sft_trainer.py"
# Recipes
- "!recipe"
# Megatron
- "!verl/workers/**/megatron_*.py"
```
#### Avoid Missing Errors
Some test scripts did not end with the main Python command, so a failure
of that command could go unnoticed (the script's exit status came from a
later command instead).
To address this, this PR enables the following shell options:
```bash
set -xeuo pipefail
```
These options mean:
- `x`: Print each command before executing it (useful for debugging)
- `e`: Exit immediately if any command fails (returns non-zero exit
status)
- `u`: Treat unset variables as an error
- `o pipefail`: A pipeline's exit status is that of the last command that
failed, or zero if all commands succeed
Together, these options make the script fail fast and produce verbose
output, which helps with debugging and ensures the script does not
continue after encountering an error.
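As a minimal illustration (a toy script, not one of the actual test scripts), consider how a masked failure behaves once these options are enabled:
```bash
#!/usr/bin/env bash
# Toy example: with the options below, the failing python step aborts the script
# immediately; without `-e`/`pipefail`, the trailing echo would mask the failure
# and the script would exit 0.
set -xeuo pipefail

python3 -c "raise SystemExit(1)" | tee /tmp/train.log  # pipefail surfaces the non-zero status
echo "post-processing"                                 # never reached once the pipeline fails
```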
#### Others
In addition, we also
1. unify runner labels into `"L20x8"` to enable preemptive scheduling of
jobs
2. consolidate test scripts that differ only minimally into a base script
with options, grouped by entrypoint (e.g., `ppo_trainer`,
`ppo_megatron_trainer`, recipes), as sketched below
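The consolidation follows a simple pattern; the sketch below uses hypothetical names (`run_base.sh`, `ADV_ESTIMATOR`, `ENGINE`) to illustrate the idea rather than the exact scripts in the repository:
```bash
# Hypothetical base script, e.g. tests/e2e/ppo_trainer/run_base.sh:
# expose the varying knobs as environment variables with defaults ...
ADV_ESTIMATOR=${ADV_ESTIMATOR:-gae}
ENGINE=${ENGINE:-vllm}
# ... and build the python command from these options.

# Each CI job then overrides only what differs, e.g. a GRPO variant:
ADV_ESTIMATOR=grpo ENGINE=vllm bash tests/e2e/ppo_trainer/run_base.sh
```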
## Summary
This PR renames all `micro_batch_size` parameters to `micro_batch_size_per_gpu`.
**The core logic of setting batch sizes:**
- **All algorithmic parameters** (train batch size, PPO mini batch size)
are global (from the single-controller perspective) and are normalized
inside each Worker.
- **All performance-related parameters** (micro batch size, max token
length for dynamic batch size) are local and represent the per-GPU
(i.e., per-Worker) data size; see the sketch below.
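A minimal sketch of the distinction, with hypothetical numbers and an assumed 8-GPU setup:
```bash
# Hypothetical numbers; only the global-vs-local distinction matters here.
NUM_GPUS=8
train_batch_size=1024           # global (algorithmic): set once from the single controller
ppo_mini_batch_size=256         # global (algorithmic): normalized inside each Worker
ppo_micro_batch_size_per_gpu=4  # local (performance): already a per-GPU value, no normalization

echo "prompts per GPU per step: $((train_batch_size / NUM_GPUS))"    # 128
echo "mini-batch size per GPU:  $((ppo_mini_batch_size / NUM_GPUS))" # 32
echo "micro-batch size per GPU: ${ppo_micro_batch_size_per_gpu}"     # 4
```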
## Main Changes
1. Update the scripts and configs, and remove the normalization for
`micro_bsz`
2. Fix CI for SFT
This PR adds LoRA (Low-Rank Adaptation) support for efficient model
fine-tuning.
### Changes
1. Added LoRA configuration support in trainer config
2. Modified FSDP wrapping policy to handle LoRA modules
3. Integrated with existing FSDP training infrastructure
4. Added peft dependency
5. Removed unused ring_attn_utils.py
### Features
- Configurable LoRA rank and alpha parameters
- Target module specification for selective adaptation
- Compatible with FSDP sharding strategy
### Testing
Tested with the Qwen2.5-0.5B-Instruct model on the GSM8K dataset using
the provided example script.
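A sketch of what such a LoRA fine-tuning invocation could look like (the option names `model.lora_rank`, `model.lora_alpha`, and `model.target_modules` are assumptions about the config schema, not the verified final names):
```bash
# Illustrative only; LoRA option names are assumed, not verified against the merged config.
torchrun --standalone --nnodes=1 --nproc_per_node=8 \
    -m verl.trainer.fsdp_sft_trainer \
    data.train_files=$HOME/data/gsm8k/train.parquet \
    data.val_files=$HOME/data/gsm8k/test.parquet \
    model.partial_pretrain=Qwen/Qwen2.5-0.5B-Instruct \
    model.lora_rank=32 \
    model.lora_alpha=16 \
    model.target_modules=all-linear
```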
### Dependencies
- Added `peft` package to requirements.txt
This PR is based on commit 902ddbe6 and has been merged with the latest
upstream main branch.
---------
Co-authored-by: Jiayi Pan <i@jiayipan.me>
Co-authored-by: openhands <openhands@all-hands.dev>