verl/recipe/README.md
H 0f5ab5c854 [doc] feat: add retool blog (#2761)
2025-07-26 13:13:55 +08:00


Recipe

The examples under recipes/ are representative extensions to verl for specific end-to-end RL training recipes. To help the community reproduce experiments, the verl team provides a snapshot of the codebase from when each recipe was first submitted as a PR to verl main. You can find these snapshots via GitHub branches.

Awesome work using verl

  • Logic-RL: a reproduction of DeepSeek R1 Zero on 2K Tiny Logic Puzzle Dataset.
  • Seed-Coder: RL training of Seed-Coder boosts performance on competitive programming
  • all-hands/openhands-lm-32b-v0.1: A strong, open coding agent model, trained with multi-turn fine-tuning
  • s3: Efficient Yet Effective Search Agent Training via RL
  • Rec-R1: Bridging Generative Large Language Models and Recommendation Systems via Reinforcement Learning
  • Explore RL Data Scaling: Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback
  • FIRE: Flaming-hot initiation with regular execution sampling for large language models
  • DQO: Enhancing multi-step reasoning abilities of language models through direct Q-function optimization
  • ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
  • cognition-engineering: Test time scaling drives cognition engineering.
  • Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning.
  • AdaRFT: Efficient Reinforcement Finetuning via Adaptive Curriculum Learning
  • critic-rl: LLM critics for code generation
  • self-rewarding-reasoning-LLM: self-rewarding and correction with generative reward models
  • DeepEnlighten: Reproduce R1 with social reasoning tasks and analyze key findings
  • MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse
  • PURE: Credit assignment is the key to successful reinforcement fine-tuning using process reward model
  • cognitive-behaviors: Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs
  • deepscaler: iterative context scaling with GRPO
  • DAPO: the fully open source SOTA RL algorithm that beats DeepSeek-R1-zero-32B
  • NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation