Add more libraries to rlhf.md (#26374)

Signed-off-by: Michael Goin <mgoin64@gmail.com>
Michael Goin
2025-10-07 16:59:41 -04:00
committed by GitHub
parent 59012df99b
commit 1b86bd8e18


@@ -1,8 +1,19 @@
 # Reinforcement Learning from Human Feedback

-Reinforcement Learning from Human Feedback (RLHF) is a technique that fine-tunes language models using human-generated preference data to align model outputs with desired behaviors.
-vLLM can be used to generate the completions for RLHF. Some ways to do this include using libraries like [TRL](https://github.com/huggingface/trl), [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), [verl](https://github.com/volcengine/verl) and [unsloth](https://github.com/unslothai/unsloth).
+Reinforcement Learning from Human Feedback (RLHF) is a technique that fine-tunes language models using human-generated preference data to align model outputs with desired behaviors. vLLM can be used to generate the completions for RLHF.
+
+The following open-source RL libraries use vLLM for fast rollouts (sorted alphabetically and non-exhaustive):
+
+- [Cosmos-RL](https://github.com/nvidia-cosmos/cosmos-rl)
+- [NeMo-RL](https://github.com/NVIDIA-NeMo/RL)
+- [Open Instruct](https://github.com/allenai/open-instruct)
+- [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF)
+- [PipelineRL](https://github.com/ServiceNow/PipelineRL)
+- [Prime-RL](https://github.com/PrimeIntellect-ai/prime-rl)
+- [SkyRL](https://github.com/NovaSky-AI/SkyRL)
+- [TRL](https://github.com/huggingface/trl)
+- [Unsloth](https://github.com/unslothai/unsloth)
+- [verl](https://github.com/volcengine/verl)
+
 See the following basic examples to get started if you don't want to use an existing library:
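For context on what "generate the completions for RLHF" looks like in practice, here is a minimal, illustrative sketch (not part of this commit) of offline rollout generation with vLLM's `LLM.generate` API. The model name, prompts, and sampling settings are placeholder assumptions, and the reward scoring and policy-weight synchronization that the libraries listed above handle are omitted.

```python
# Minimal rollout-generation sketch using vLLM offline inference.
# Placeholders: the model name, prompts, and sampling settings are arbitrary.
from vllm import LLM, SamplingParams

prompts = [
    "Explain why the sky is blue.",
    "Write a short poem about the ocean.",
]

# Sample several candidate completions per prompt so a reward model
# can later score and rank them (scoring/training not shown).
sampling_params = SamplingParams(n=4, temperature=1.0, max_tokens=256)

llm = LLM(model="Qwen/Qwen2.5-1.5B-Instruct")  # placeholder policy model
outputs = llm.generate(prompts, sampling_params)

for request_output in outputs:
    for completion in request_output.outputs:
        # Each completion would be scored by a reward model and used to
        # update the policy in the RL training loop managed by the library.
        print(completion.text)
```

A full RLHF loop also needs to push updated policy weights back into the vLLM engine between training steps; that is the part the libraries above, and the basic examples referenced in the doc, take care of.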