The examples under recipes/ are representative extensions to verl for specific end-to-end RL training recipes. To help the community reproduce experiments, the verl team provides a snapshot of the codebase when each recipe is initially PR'ed to verl main. You can find these snapshots via GitHub branches.

Awesome work using verl

Logic-RL: a reproduction of DeepSeek R1 Zero on 2K Tiny Logic Puzzle Dataset
Seed-Coder: RL training of Seed-Coder boosts performance on competitive programming
all-hands/openhands-lm-32b-v0.1: A strong, open coding agent model, trained with multi-turn fine-tuning
s3: Efficient Yet Effective Search Agent Training via RL
Rec-R1: Bridging Generative Large Language Models and Recommendation Systems via Reinforcement Learning
Explore RL Data Scaling: Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback
FIRE: Flaming-hot initiation with regular execution sampling for large language models
DQO: Enhancing multi-step reasoning abilities of language models through direct Q-function optimization
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
cognition-engineering: Test-time scaling drives cognition engineering
Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning
AdaRFT: Efficient Reinforcement Finetuning via Adaptive Curriculum Learning
critic-rl: LLM critics for code generation
self-rewarding-reasoning-LLM: self-rewarding and correction with generative reward models
DeepEnlighten: Reproduce R1 with social reasoning tasks and analyze key findings
MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse
PURE: Credit assignment is the key to successful reinforcement fine-tuning using process reward model
cognitive-behaviors: Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs
deepscaler: iterative context scaling with GRPO
DAPO: the fully open-source SOTA RL algorithm that beats DeepSeek-R1-zero-32B
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation