# Open Math Reasoning

## Introduction

In this recipe, we perform SFT on the OpenMathReasoning dataset using the new SFT trainer with a backend-agnostic model engine. Note that the goal is not to replicate the AIMO-2 Winning Solution, but to demonstrate an end-to-end SFT workflow.

Note that you may need to adjust the paths in the following scripts to match your environment.

## Dataset Preprocessing

### Download the dataset

```bash
hf download nvidia/OpenMathReasoning --repo-type dataset --include data/cot* --local-dir /path/to/dataset/nvidia/OpenMathReasoning
hf download math-ai/aime24 --repo-type dataset --local-dir /path/to/dataset/math-ai/aime24
hf download math-ai/aime25 --repo-type dataset --local-dir /path/to/dataset/math-ai/aime25
```
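
To sanity-check the download before preprocessing, you can peek at one of the CoT parquet shards. The snippet below is an optional sketch, not part of the recipe; it assumes `pandas` and `pyarrow` are installed, and the exact shard filenames under `data/` may differ.

```python
# Optional sanity check: inspect one downloaded CoT shard.
# The shard filename pattern is an assumption; adjust the glob if it differs.
import glob

import pandas as pd

shards = sorted(glob.glob("/path/to/dataset/nvidia/OpenMathReasoning/data/cot*.parquet"))
print(f"found {len(shards)} CoT shards")

df = pd.read_parquet(shards[0])
print(df.columns.tolist())  # available fields in the raw dataset
print(df.head(2))           # a couple of raw rows
```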

### Preprocess the dataset

```bash
python3 recipe/open_math_reasoning/prepare_nvidia-OpenMathReasoning_sft.py --local_dataset_path /path/to/nvidia/OpenMathReasoning --local_save_dir /path/to/open_math_reasoning
```
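
This step produces the SFT training file referenced later as `TRAIN_FILES`. As an optional check, you can load it and inspect its schema; the filename `cot_dataset.parquet` matches the training command below, but the column names depend on the version of the preprocessing script.

```python
# Optional: verify the preprocessed SFT parquet exists and inspect its schema.
# cot_dataset.parquet matches the TRAIN_FILES path used in the training step below.
import pandas as pd

df = pd.read_parquet("/path/to/open_math_reasoning/cot_dataset.parquet")
print(len(df), "SFT examples")
print(df.columns.tolist())  # column names depend on the preprocessing script
```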

### Prepare the eval dataset

```bash
python3 recipe/open_math_reasoning/prepare_eval_dataset.py --local_dataset_path /path/to/dataset --local_save_dir /path/to/eval_dataset
```
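
The exact files written here are determined by `prepare_eval_dataset.py`, so the quickest optional check is simply to list what landed in the save directory:

```python
# Optional: enumerate whatever the eval preparation step wrote.
# Filenames are determined by prepare_eval_dataset.py; this only lists them.
import glob

for path in sorted(glob.glob("/path/to/eval_dataset/**/*.parquet", recursive=True)):
    print(path)
```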

## Train the model using SFT

### FSDP backend

```bash
export CKPT_HOME=/path/to/ckpt
export BACKEND=fsdp2
export MODEL_ID=Qwen/Qwen3-8B-Base
export TRAIN_FILES=/path/to/open_math_reasoning/cot_dataset.parquet
bash recipe/open_math_reasoning/run_sft_qwen3_8b.sh
```
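
After training finishes, checkpoints are saved under `CKPT_HOME`, and the merge step below operates on one of the `global_step_*` directories. An optional way to see which steps were saved (assuming the layout matches the merge command below):

```python
# Optional: list saved checkpoint directories under CKPT_HOME.
# The global_step_* layout is the one referenced by the merge command below.
from pathlib import Path

ckpt_home = Path("/path/to/ckpt")
for step_dir in sorted(ckpt_home.glob("global_step_*")):
    print(step_dir)
```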

### Megatron backend

TODO

## Eval the model

### Merge the checkpoint into Hugging Face format

```bash
python -m verl.model_merger merge --backend fsdp --local_dir /path/to/ckpt/global_step_19751 --target_dir /path/to/ckpt/global_step_19751/huggingface
```
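
Once merged, the checkpoint is a regular Hugging Face model directory, so an optional smoke test is to load it with `transformers` and generate a short completion. This sketch assumes `transformers`, `torch`, and `accelerate` are installed; the prompt is purely illustrative.

```python
# Optional smoke test: load the merged checkpoint as a regular Hugging Face model.
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "/path/to/ckpt/global_step_19751/huggingface"
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt, torch_dtype="auto", device_map="auto")

prompt = "What is 17 * 24?"  # illustrative prompt, not from the eval set
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```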

### Generate the responses

```bash
export MODEL_PATH=/path/to/ckpt/global_step_19751/huggingface
bash recipe/open_math_reasoning/run_generation.sh
```

### Evaluate the responses

```bash
bash recipe/open_math_reasoning/run_eval.sh
```

You should see results like:

```
{'test_score/aime24': 0.584375, 'test_score/aime25': 0.43333333333333335}
```