Mirror of https://github.com/volcengine/verl.git
docs: add hf ckpt to faq, and include verl apis in the website (#427)
Now the verl APIs can be displayed on the documentation website.
@@ -6,7 +6,8 @@ version: 2
 build:
   os: ubuntu-22.04
   tools:
-    python: "3.8"
+    python: "3.11"
+    rust: "1.70"
 
 sphinx:
   configuration: docs/conf.py
@@ -14,3 +15,5 @@ sphinx:
 python:
   install:
   - requirements: docs/requirements-docs.txt
+  - method: pip
+    path: .
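The `- method: pip` / `path: .` lines install the package itself into the Read the Docs build environment, which is what lets Sphinx import verl and render its APIs on the website. As an illustration only (generic Sphinx settings, not necessarily the contents of verl's docs/conf.py), the autodoc side of that setup looks roughly like this:

```python
# docs/conf.py (illustrative excerpt; the real file may differ)
extensions = [
    "sphinx.ext.autodoc",   # import the installed verl package and pull docstrings
    "sphinx.ext.napoleon",  # parse Google-style "Args:/Returns:" sections
]
```

An `automodule`/`autoclass` directive in an .rst page can then render classes such as `DataProto` straight from their docstrings, which is why the docstring cleanups later in this commit matter for the rendered site.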
README.md
@@ -118,18 +118,20 @@ If you find the project helpful, please cite:
 verl is inspired by the design of Nemo-Aligner, Deepspeed-chat and OpenRLHF. The project is adopted and supported by Anyscale, Bytedance, LMSys.org, Shanghai AI Lab, Tsinghua University, UC Berkeley, UCLA, UIUC, and University of Hong Kong.
 
 ## Awesome work using verl
-- [Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization](https://arxiv.org/abs/2410.09302)
-- [Flaming-hot Initiation with Regular Execution Sampling for Large Language Models](https://arxiv.org/abs/2410.21236)
-- [Process Reinforcement Through Implicit Rewards](https://github.com/PRIME-RL/PRIME/)
-- [TinyZero](https://github.com/Jiayi-Pan/TinyZero): a reproduction of DeepSeek R1 Zero recipe for reasoning tasks
-- [RAGEN](https://github.com/ZihanWang314/ragen): a general-purpose reasoning agent training framework
-- [Logic R1](https://github.com/Unakar/Logic-RL): a reproduced DeepSeek R1 Zero on 2K Tiny Logic Puzzle Dataset.
+- [TinyZero](https://github.com/Jiayi-Pan/TinyZero): a reproduction of **DeepSeek R1 Zero** recipe for reasoning tasks
+- [PRIME](https://github.com/PRIME-RL/PRIME): Process reinforcement through implicit rewards
+- [RAGEN](https://github.com/ZihanWang314/ragen): a general-purpose reasoning **agent** training framework
+- [Logic-RL](https://github.com/Unakar/Logic-RL): a reproduction of DeepSeek R1 Zero on 2K Tiny Logic Puzzle Dataset.
 - [deepscaler](https://github.com/agentica-project/deepscaler): iterative context scaling with GRPO
-- [critic-rl](https://github.com/HKUNLP/critic-rl): Teaching Language Models to Critique via Reinforcement Learning
-- [Easy-R1](https://github.com/hiyouga/EasyR1): Multi-Modality RL
+- [critic-rl](https://github.com/HKUNLP/critic-rl): LLM critics for code generation
+- [Easy-R1](https://github.com/hiyouga/EasyR1): **Multi-modal** RL training framework
+- [self-rewarding-reasoning-LLM](https://arxiv.org/pdf/2502.19613): self-rewarding and correction with **generative reward models**
+- [Search-R1](https://github.com/PeterGriffinJin/Search-R1): RL with reasoning and **searching (tool-call)** interleaved LLMs
+- [DQO](https://arxiv.org/abs/2410.09302): Enhancing multi-step reasoning abilities of language models through direct Q-function optimization
+- [FIRE](https://arxiv.org/abs/2410.21236): Flaming-hot initiation with regular execution sampling for large language models
 
 ## Contribution Guide
-Contributions from the community are welcome!
+Contributions from the community are welcome! Please checkout our [roadmap](https://github.com/volcengine/verl/issues/22) and [release plan](https://github.com/volcengine/verl/issues/354).
 
 ### Code formatting
 We use yapf (Google style) to enforce strict code formatting when reviewing PRs. To reformat your code locally, make sure you have installed the **latest** `yapf`

@@ -55,3 +55,8 @@ Please set the following environment variable. The env var must be set before the
 export VLLM_ATTENTION_BACKEND=XFORMERS
 
+If in doubt, print this env var in each rank to make sure it is properly set.
+
+Checkpoints
+------------------------
+
+If you want to convert the model checkpoint into huggingface safetensor format, please refer to ``scripts/model_merger.py``.
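Two quick illustrations of the FAQ additions above. First, a hypothetical per-rank check of the environment variable (where to place it inside a worker is an assumption, not verl code):

```python
import os

import torch.distributed as dist

# Hypothetical sanity check: run this near the start of each worker process.
rank = dist.get_rank() if dist.is_initialized() else 0
print(f"[rank {rank}] VLLM_ATTENTION_BACKEND={os.environ.get('VLLM_ATTENTION_BACKEND')}")
```

Second, the end result of the checkpoint conversion is a standard Hugging Face folder. `scripts/model_merger.py` handles gathering the sharded training weights; conceptually, once a merged state dict exists, writing safetensors is the usual `save_pretrained` call. The sketch below uses made-up paths and is not the script's actual interface:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Made-up paths for illustration; the merged (unsharded) weights are assumed to
# already exist, e.g. produced from FSDP shards by scripts/model_merger.py.
merged_state_dict = torch.load("path/to/merged_state_dict.pt", map_location="cpu")

model = AutoModelForCausalLM.from_pretrained("path/to/base_model")
model.load_state_dict(merged_state_dict)
model.save_pretrained("path/to/hf_ckpt", safe_serialization=True)  # writes *.safetensors
AutoTokenizer.from_pretrained("path/to/base_model").save_pretrained("path/to/hf_ckpt")
```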
@@ -7,3 +7,6 @@ sphinx-markdown-tables
 
 # crate-docs-theme
 sphinx-rtd-theme
+
+# pin tokenizers version to avoid env_logger version req
+tokenizers==0.19.1

@@ -84,7 +84,7 @@ def union_tensor_dict(tensor_dict1: TensorDict, tensor_dict2: TensorDict) -> TensorDict:
     return tensor_dict1
 
 
-def union_numpy_dict(tensor_dict1: dict[np.ndarray], tensor_dict2: dict[np.ndarray]) -> dict[np.ndarray]:
+def union_numpy_dict(tensor_dict1: dict[str, np.ndarray], tensor_dict2: dict[str, np.ndarray]) -> dict[str, np.ndarray]:
     for key, val in tensor_dict2.items():
         if key in tensor_dict1:
             assert isinstance(tensor_dict2[key], np.ndarray)
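The hunk only shows the opening lines of the helper, and the change itself is purely a type-hint correction: the old annotation `dict[np.ndarray]` omits the key type, while `dict[str, np.ndarray]` is the proper mapping annotation. For readers of this diff, a minimal self-contained sketch of what such a union helper does (an illustration consistent with the visible lines, not a verbatim copy of the file):

```python
import numpy as np


def union_numpy_dict(tensor_dict1: dict[str, np.ndarray],
                     tensor_dict2: dict[str, np.ndarray]) -> dict[str, np.ndarray]:
    # Merge dict2 into dict1; keys present in both must hold identical arrays.
    for key, val in tensor_dict2.items():
        if key in tensor_dict1:
            assert isinstance(tensor_dict2[key], np.ndarray)
            assert np.array_equal(tensor_dict1[key], tensor_dict2[key]), \
                f"{key} differs between the two dicts, union would be ambiguous"
        tensor_dict1[key] = val
    return tensor_dict1


# Example: overlapping key "a" matches, key "b" is added.
merged = union_numpy_dict({"a": np.ones(2)}, {"a": np.ones(2), "b": np.zeros(2)})
```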
@@ -448,19 +448,17 @@ class DataProto:
         return self
 
     def make_iterator(self, mini_batch_size, epochs, seed=None, dataloader_kwargs=None):
-        """Make an iterator from the DataProto. This is built upon that TensorDict can be used as a normal Pytorch
+        r"""Make an iterator from the DataProto. This is built upon that TensorDict can be used as a normal Pytorch
         dataset. See https://pytorch.org/tensordict/tutorials/data_fashion for more details.
 
         Args:
-            mini_batch_size (int): mini-batch size when iterating the dataset. We require that
-                ``batch.batch_size[0] % mini_batch_size == 0``
+            mini_batch_size (int): mini-batch size when iterating the dataset. We require that ``batch.batch_size[0] % mini_batch_size == 0``.
             epochs (int): number of epochs when iterating the dataset.
-            dataloader_kwargs: internally, it returns a DataLoader over the batch.
-                The dataloader_kwargs is the kwargs passed to the DataLoader
+            dataloader_kwargs (Any): internally, it returns a DataLoader over the batch. The dataloader_kwargs is the kwargs passed to the DataLoader.
 
         Returns:
-            Iterator: an iterator that yields a mini-batch data at a time. The total number of iteration steps is
-                ``self.batch.batch_size * epochs // mini_batch_size``
+            Iterator: an iterator that yields a mini-batch data at a time. The total number of iteration steps is ``self.batch.batch_size * epochs // mini_batch_size``
         """
         assert self.batch.batch_size[0] % mini_batch_size == 0, f"{self.batch.batch_size[0]} % {mini_batch_size} != 0"
         # we can directly create a dataloader from TensorDict
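The docstring leans on the fact that a TensorDict can be consumed by a plain PyTorch DataLoader, which is the pattern the referenced tensordict tutorial describes. Below is a minimal sketch of that idea with toy field names and sizes; it uses TensorDict and DataLoader directly rather than `DataProto.make_iterator`, so treat it as an illustration, not verl's implementation:

```python
import torch
from tensordict import TensorDict
from torch.utils.data import DataLoader

# A toy batch of 8 samples, standing in for DataProto.batch.
batch = TensorDict({"input_ids": torch.randint(0, 100, (8, 16))}, batch_size=[8])

# mini_batch_size must divide batch_size[0] (here 8 % 4 == 0). With recent
# torch/tensordict the DataLoader fetches a whole index batch at once, so the
# identity collate_fn already yields a stacked TensorDict sub-batch.
loader = DataLoader(batch, batch_size=4, collate_fn=lambda x: x)

epochs = 2
for _ in range(epochs):
    for mini_batch in loader:
        # 8 * 2 // 4 = 4 iteration steps in total, matching the docstring formula
        print(mini_batch.batch_size)  # e.g. torch.Size([4])
```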