Add more libraries to rlhf.md (#26374)

Signed-off-by: Michael Goin <mgoin64@gmail.com>

# Reinforcement Learning from Human Feedback

Reinforcement Learning from Human Feedback (RLHF) is a technique that fine-tunes language models using human-generated preference data to align model outputs with desired behaviors. vLLM can be used to generate completions for RLHF.

The following open-source RL libraries use vLLM for fast rollouts (sorted alphabetically and non-exhaustive):

- [Cosmos-RL](https://github.com/nvidia-cosmos/cosmos-rl)
- [NeMo-RL](https://github.com/NVIDIA-NeMo/RL)
- [Open Instruct](https://github.com/allenai/open-instruct)
- [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF)
- [PipelineRL](https://github.com/ServiceNow/PipelineRL)
- [Prime-RL](https://github.com/PrimeIntellect-ai/prime-rl)
- [SkyRL](https://github.com/NovaSky-AI/SkyRL)
- [TRL](https://github.com/huggingface/trl)
- [Unsloth](https://github.com/unslothai/unsloth)
- [verl](https://github.com/volcengine/verl)

See the following basic examples to get started if you don't want to use an existing library:
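As a minimal sketch of the rollout step (not one of the linked examples), the snippet below uses vLLM's offline `LLM` API to sample several candidate completions per prompt. The model name is only a small placeholder, and the reward scoring and policy update are stubbed out as comments; in practice those belong to the training framework.

```python
# Minimal sketch of an RLHF rollout step with vLLM's offline API.
# Assumptions: "facebook/opt-125m" is just a small placeholder model;
# reward scoring and the policy update are left as comments.
from vllm import LLM, SamplingParams

prompts = ["Explain RLHF in one sentence."]

# Sample several completions per prompt so the trainer can compare them.
sampling_params = SamplingParams(n=4, temperature=1.0, max_tokens=128)

llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(prompts, sampling_params)

for request_output in outputs:
    for completion in request_output.outputs:
        # A real pipeline would score each completion with a reward model
        # here and feed (prompt, completion, reward) to the RL trainer.
        print(completion.text)
```

Beyond generation, an RLHF system also has to push fresh policy weights into the rollout engine after each update; the libraries listed above wire that synchronization up for you, which is the main reason to reach for one of them rather than writing the loop yourself.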