mirror of
https://github.com/huggingface/trl.git
synced 2025-10-20 18:43:52 +08:00
349 B
349 B
Reward Functions
This module contains some useful reward functions, primarily intended for use with the [GRPOTrainer
] and [RLOOTrainer
].
accuracy_reward
autodoc rewards.accuracy_reward
think_format_reward
autodoc rewards.think_format_reward
get_soft_overlong_punishment
autodoc rewards.get_soft_overlong_punishment