docs: add hf ckpt to faq, and include verl apis in the website (#427)

Now APIs can be displayed: 


![image](https://github.com/user-attachments/assets/6592ce68-7bf6-46cb-8dd3-a5fa6cd99f3e)
Author: HL
Date: 2025-03-01 21:18:30 -08:00
Committed by: GitHub
Parent: 99fb2dde77
Commit: fe547a3320
5 changed files with 31 additions and 20 deletions

View File

@@ -6,11 +6,14 @@ version: 2
 build:
   os: ubuntu-22.04
   tools:
-    python: "3.8"
+    python: "3.11"
+    rust: "1.70"
 sphinx:
   configuration: docs/conf.py
 python:
   install:
     - requirements: docs/requirements-docs.txt
+    - method: pip
+      path: .
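The new `- method: pip` / `path: .` entry installs the repo itself into the Read the Docs build environment, which is what lets Sphinx import `verl` and render the API pages mentioned in the commit title. As a rough, assumed sketch (not taken from this commit), the `docs/conf.py` side would typically look something like this:

```python
# docs/conf.py (illustrative excerpt, assumed rather than copied from the repo):
# autodoc imports the installed `verl` package at build time, which is why the
# .readthedocs.yaml change above pip-installs the repository.
extensions = [
    "sphinx.ext.autodoc",    # pulls docstrings such as DataProto.make_iterator
    "sphinx.ext.napoleon",   # parses Google-style Args/Returns sections
]
```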

View File

@@ -118,18 +118,20 @@ If you find the project helpful, please cite:
 verl is inspired by the design of Nemo-Aligner, Deepspeed-chat and OpenRLHF. The project is adopted and supported by Anyscale, Bytedance, LMSys.org, Shanghai AI Lab, Tsinghua University, UC Berkeley, UCLA, UIUC, and University of Hong Kong.

 ## Awesome work using verl
-- [Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization](https://arxiv.org/abs/2410.09302)
-- [Flaming-hot Initiation with Regular Execution Sampling for Large Language Models](https://arxiv.org/abs/2410.21236)
-- [Process Reinforcement Through Implicit Rewards](https://github.com/PRIME-RL/PRIME/)
-- [TinyZero](https://github.com/Jiayi-Pan/TinyZero): a reproduction of DeepSeek R1 Zero recipe for reasoning tasks
-- [RAGEN](https://github.com/ZihanWang314/ragen): a general-purpose reasoning agent training framework
-- [Logic R1](https://github.com/Unakar/Logic-RL): a reproduced DeepSeek R1 Zero on 2K Tiny Logic Puzzle Dataset.
+- [TinyZero](https://github.com/Jiayi-Pan/TinyZero): a reproduction of **DeepSeek R1 Zero** recipe for reasoning tasks
+- [PRIME](https://github.com/PRIME-RL/PRIME): Process reinforcement through implicit rewards
+- [RAGEN](https://github.com/ZihanWang314/ragen): a general-purpose reasoning **agent** training framework
+- [Logic-RL](https://github.com/Unakar/Logic-RL): a reproduction of DeepSeek R1 Zero on 2K Tiny Logic Puzzle Dataset.
 - [deepscaler](https://github.com/agentica-project/deepscaler): iterative context scaling with GRPO
-- [critic-rl](https://github.com/HKUNLP/critic-rl): Teaching Language Models to Critique via Reinforcement Learning
-- [Easy-R1](https://github.com/hiyouga/EasyR1): Multi-Modality RL
+- [critic-rl](https://github.com/HKUNLP/critic-rl): LLM critics for code generation
+- [Easy-R1](https://github.com/hiyouga/EasyR1): **Multi-modal** RL training framework
+- [self-rewarding-reasoning-LLM](https://arxiv.org/pdf/2502.19613): self-rewarding and correction with **generative reward models**
+- [Search-R1](https://github.com/PeterGriffinJin/Search-R1): RL with reasoning and **searching (tool-call)** interleaved LLMs
+- [DQO](https://arxiv.org/abs/2410.09302): Enhancing multi-step reasoning abilities of language models through direct Q-function optimization
+- [FIRE](https://arxiv.org/abs/2410.21236): Flaming-hot initiation with regular execution sampling for large language models

 ## Contribution Guide
-Contributions from the community are welcome!
+Contributions from the community are welcome! Please check out our [roadmap](https://github.com/volcengine/verl/issues/22) and [release plan](https://github.com/volcengine/verl/issues/354).

 ### Code formatting
 We use yapf (Google style) to enforce strict code formatting when reviewing PRs. To reformat your code locally, make sure you have installed the **latest** `yapf`

View File

@@ -55,3 +55,8 @@ Please set the following environment variable. The env var must be set before th
 export VLLM_ATTENTION_BACKEND=XFORMERS

 If in doubt, print this env var in each rank to make sure it is properly set.
+
+Checkpoints
+------------------------
+
+If you want to convert the model checkpoint into huggingface safetensor format, please refer to ``scripts/model_merger.py``.
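For orientation only, here is a minimal, hypothetical sketch of the idea behind such a conversion. This is not what `scripts/model_merger.py` does; the model name and the source of the merged weights are placeholders:

```python
# Illustrative sketch only: load a base architecture, overwrite its weights with
# the merged checkpoint, and save in Hugging Face safetensors format.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder: whatever base model was trained

model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)
merged_state_dict = model.state_dict()  # stand-in for the state dict produced by training
model.load_state_dict(merged_state_dict)

# safe_serialization=True writes *.safetensors shards instead of pytorch_model.bin.
model.save_pretrained("./hf_ckpt", safe_serialization=True)
AutoTokenizer.from_pretrained(base_model).save_pretrained("./hf_ckpt")
```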

View File

@@ -6,4 +6,7 @@ sphinx-markdown-tables
 # theme default rtd
 # crate-docs-theme
 sphinx-rtd-theme
+
+# pin tokenizers version to avoid env_logger version req
+tokenizers==0.19.1

View File

@@ -84,7 +84,7 @@ def union_tensor_dict(tensor_dict1: TensorDict, tensor_dict2: TensorDict) -> TensorDict:
     return tensor_dict1


-def union_numpy_dict(tensor_dict1: dict[np.ndarray], tensor_dict2: dict[np.ndarray]) -> dict[np.ndarray]:
+def union_numpy_dict(tensor_dict1: dict[str, np.ndarray], tensor_dict2: dict[str, np.ndarray]) -> dict[str, np.ndarray]:
     for key, val in tensor_dict2.items():
         if key in tensor_dict1:
             assert isinstance(tensor_dict2[key], np.ndarray)
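The signature change above is only an annotation fix (the keys are strings). As a quick, assumed usage sketch of the union semantics, where overlapping keys are expected to hold matching arrays and the key names are invented:

```python
import numpy as np

# Hypothetical example: key names are made up.
left = {"input_ids": np.array([1, 2, 3])}
right = {"input_ids": np.array([1, 2, 3]), "rewards": np.array([0.5, 0.1, 0.9])}

merged = union_numpy_dict(left, right)
print(sorted(merged.keys()))  # ['input_ids', 'rewards']
```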
@@ -448,19 +448,17 @@ class DataProto:
         return self

     def make_iterator(self, mini_batch_size, epochs, seed=None, dataloader_kwargs=None):
-        """Make an iterator from the DataProto. This is built upon that TensorDict can be used as a normal Pytorch
+        r"""Make an iterator from the DataProto. This is built upon that TensorDict can be used as a normal Pytorch
         dataset. See https://pytorch.org/tensordict/tutorials/data_fashion for more details.

         Args:
-            mini_batch_size (int): mini-batch size when iterating the dataset. We require that
-                ``batch.batch_size[0] % mini_batch_size == 0``
+            mini_batch_size (int): mini-batch size when iterating the dataset. We require that ``batch.batch_size[0] % mini_batch_size == 0``.
             epochs (int): number of epochs when iterating the dataset.
-            dataloader_kwargs: internally, it returns a DataLoader over the batch.
-                The dataloader_kwargs is the kwargs passed to the DataLoader
+            dataloader_kwargs (Any): internally, it returns a DataLoader over the batch. The dataloader_kwargs is the kwargs passed to the DataLoader.

         Returns:
-            Iterator: an iterator that yields a mini-batch data at a time. The total number of iteration steps is
-                ``self.batch.batch_size * epochs // mini_batch_size``
+            Iterator: an iterator that yields a mini-batch data at a time. The total number of iteration steps is ``self.batch.batch_size * epochs // mini_batch_size``
         """
         assert self.batch.batch_size[0] % mini_batch_size == 0, f"{self.batch.batch_size[0]} % {mini_batch_size} != 0"
         # we can directly create a dataloader from TensorDict
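To make the documented contract concrete, here is a small hypothetical usage sketch. The field names and shapes are made up, and it assumes `DataProto.from_dict` is available as a constructor, which verl's protocol module provides:

```python
import torch
from verl import DataProto  # assumption: DataProto is re-exported at the package root

# Hypothetical data: 8 samples, so mini_batch_size=4 satisfies the modulo assert.
data = DataProto.from_dict(tensors={
    "input_ids": torch.randint(0, 100, (8, 16)),
    "attention_mask": torch.ones(8, 16, dtype=torch.long),
})

# One epoch over 8 samples with mini-batches of 4 -> 2 iterations.
for i, mini_batch in enumerate(data.make_iterator(mini_batch_size=4, epochs=1, seed=0)):
    print(i, len(mini_batch))
```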