mirror of https://github.com/huggingface/trl.git synced 2025-10-20 18:43:52 +08:00

Files

Pramodith Ballapuram 8e2d5516ca Add accuracy reward (#4270 )

Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>

2025-10-15 18:01:07 -06:00

Reward Functions

This module contains some useful reward functions, primarily intended for use with the [GRPOTrainer] and [RLOOTrainer].

accuracy_reward

autodoc rewards.accuracy_reward

autodoc rewards.think_format_reward

autodoc rewards.get_soft_overlong_punishment