Commit Graph

3 Commits

Author SHA1 Message Date
8e2d5516ca Add accuracy reward (#4270)
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
2025-10-15 18:01:07 -06:00
ae6837f8d4 Removed tokenizer/processor creation from example scripts (#4211)
2025-10-06 18:40:18 +02:00
3c8d7209f1 👁️ Add VLM support to RLOO trainer (#4067)
Co-authored-by: behroozazarkhalili <ermiaazarkhalili>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
2025-09-18 21:54:06 -06:00