Welcome to veRL/HybridFlow's documentation!
================================================

veRL (HybridFlow) is a flexible, efficient, industrial-grade RL(HF) training framework for post-training large language models (LLMs).

veRL is flexible and easy to use with:

- **Easy support for diverse RL(HF) algorithms**: The hybrid programming model combines the strengths of the single-controller and multi-controller paradigms to enable flexible representation and efficient execution of complex post-training dataflows, allowing users to build RL dataflows in a few lines of code.

- **Seamless integration of existing LLM infra with a modular API design**: veRL decouples computation and data dependencies, enabling seamless integration with existing LLM frameworks such as PyTorch FSDP, Megatron-LM and vLLM. Users can also extend it to other LLM training and inference frameworks.

- **Flexible device mapping**: Supports placing models onto different sets of GPUs for efficient resource utilization and scalability across different cluster sizes.

- Ready integration with popular Hugging Face models

veRL is fast with:

- **State-of-the-art throughput**: By seamlessly integrating existing SOTA LLM training and inference frameworks, veRL achieves high generation and training throughput.

- **Efficient actor model resharding with 3D-HybridEngine**: Eliminates memory redundancy and significantly reduces communication overhead during transitions between training and generation phases.

--------------------------------------------

.. _Contents:

.. toctree::
   :maxdepth: 5
   :caption: Preparation
   :titlesonly:
   :numbered:

   preparation/install
   preparation/prepare_data
   preparation/reward_function

.. toctree::
   :maxdepth: 2
   :caption: PPO Example
   :titlesonly:
   :numbered:

   examples/ppo_code_architecture
   examples/config
   examples/gsm8k_example

.. toctree::
   :maxdepth: 1
   :caption: PPO Trainer and Workers

   workers/ray_trainer
   workers/fsdp_workers
   workers/megatron_workers

.. toctree::
   :maxdepth: 1
   :caption: Advanced Usage and Extension

   advance/placement
   advance/dpo_extension
   advance/fsdp_extension
   advance/megatron_extension

Contribution
-------------

veRL is free software; you can redistribute it and/or modify it under the terms
of the Apache License 2.0. We welcome contributions.
Join us on `GitHub <https://github.com/volcengine/verl>`_.

.. and check out our
.. :doc:`contribution guidelines <contribute>`.