491 Commits

SHA1 Message Date
19d2f97932 Deprecate BestOfNSampler (#4291)
Co-authored-by: behroozazarkhalili <ermiaazarkhalili>
Co-authored-by: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
2025-10-15 18:06:34 -06:00
8e2d5516ca Add accuracy reward (#4270)
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
2025-10-15 18:01:07 -06:00
94aac4a101 Remove how_to_train.md: outdated training FAQ (#4267)
Co-authored-by: behroozazarkhalili <ermiaazarkhalili>
2025-10-15 23:49:04 +00:00
aa25c2697c Remove using_llama_models.md: outdated Llama2-specific documentation (#4268)
Co-authored-by: behroozazarkhalili <ermiaazarkhalili>
2025-10-15 14:13:27 -07:00
93c7d88563 Remove logging.md: trainer-specific metrics documentation (#4269)
Co-authored-by: behroozazarkhalili <ermiaazarkhalili>
2025-10-15 14:12:32 -07:00
7e0adbc552 Fix CI dev test TypeError: unexpected keyword argument 'load_in_4bit' (#4262) 2025-10-15 18:14:49 +02:00
2aa9506c69 Fix docstring interlinks (#4221) 2025-10-13 13:40:24 +02:00
bcd059a384 Remove obsolete research_projects directory (#4243)
Co-authored-by: behroozazarkhalili <ermiaazarkhalili>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
2025-10-10 10:15:47 -05:00
c38cb69ec7 🧘 Enhance markdown style (#4235) 2025-10-09 13:49:44 -05:00
824ff8c73e Add Efficient Online Training with GRPO and vLLM in TRL to community tutorials (#4219) 2025-10-08 12:59:04 +02:00
30cf68a97b 🎨 Support mixing image+text and text-only examples (#4203)
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
2025-10-07 10:21:10 -06:00
452284b8dc Add trainers taxonomy to docs (#4195) 2025-10-07 16:06:30 +02:00
6be53e19bc [DOCS] fix prose in lora guide (#4217) 2025-10-07 10:40:37 +02:00
3080fc1bd7 Fix LoRA params in Python in LoRA without regret (#4215) 2025-10-07 09:56:04 +02:00
0588b1f01d Updated vLLM integration guide (#4162)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-10-06 15:57:17 +02:00
ced8b337ba [DOCS/FIX] lora without regrets - fix lr (#4207) 2025-10-06 08:23:11 +02:00
1eff7da9e0 [DOCS] Lora without regret (#4181)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: sergiopaniego <sergiopaniegoblanco@gmail.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2025-10-03 20:40:37 +02:00
e086f073cf 🌡️ Have vLLM return processed (temperature scaled) log probs (#4163)
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-10-01 11:58:13 -06:00
da209f89fc 🎁 RewardTrainer refactor (#4093)
Co-authored-by: juejuezi <juejuezi.git@foxmail.com>
Co-authored-by: Yi Shi <96773624+singing-cat@users.noreply.github.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2025-09-30 15:13:45 -06:00
c8a5add88a Fix PEFT interlinks in docstrings (#4178) 2025-09-30 18:32:23 +02:00
864e593e9f Add missing FDivergenceType docstring (#4165) 2025-09-29 20:03:33 +02:00
8a5bfecc3a 💡 Replace <Tip> with new markdown syntax (#4161)
Co-authored-by: sergiopaniego <sergiopaniegoblanco@gmail.com>
2025-09-29 10:48:00 -06:00
9603b41d7e 😷 Refactor GRPO/RLOO to isolate _generate (#4114) 2025-09-25 20:48:52 -06:00
5ee56ed04f Fixed some <Tip> rendering issues (#4143) 2025-09-25 14:47:46 -06:00
d1e24df031 [GRPO]: Sample from a Replay Buffer To Substitute Groups with 0 std. (#4060)
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-09-24 21:12:16 +01:00
d144e73e78 🪙 [Experimental] Support GSPO-token (#3820)
Co-authored-by: LeonEricsson <70749762+LeonEricsson@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
2025-09-24 09:57:18 -06:00
251fdb228a 📌 Pin vLLM version (#4122) 2025-09-23 08:02:30 -06:00
fe02ea2b52 😴 Add vllm_enable_sleep_mode to RLOO Trainer (#4107)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
2025-09-22 19:41:29 -06:00
68408d7219 📽 Multi image support for GRPO/RLOO (#4113) 2025-09-22 18:17:42 -06:00
27f22ba5a1 docs: correct option name to enable vllm sleep mode (#4102)
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
2025-09-22 13:04:00 +02:00
26b497ea63 Fix typos (#4109) 2025-09-19 09:44:07 -06:00
0e204482e6 Some nits GRPO and RLOO trainer docs (#4108)
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2025-09-19 16:37:25 +02:00
3c8d7209f1 👁️ Add VLM support to RLOO trainer (#4067)
Co-authored-by: behroozazarkhalili <ermiaazarkhalili>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
2025-09-18 21:54:06 -06:00
10dc36d610 🌪️ [GFPO]: implement GFPO in GRPOTrainer (#3989) 2025-09-17 19:14:40 -06:00
08ea00289a 🧶 feat: Add WeaveCallback for W&B Weave integration (#4089)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
2025-09-17 18:10:45 -06:00
4bd4acf172 🏞️ Context Parallelism benchmark guide (#4075)
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-09-16 08:46:12 -06:00
8380869d33 Community Tutorials design adaptation for videos (#4095) 2025-09-16 16:28:22 +02:00
e2b18ec4e7 ▶️ Add video to community tutorials (#4090)
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
2025-09-15 10:51:23 -06:00
78f1a928ce 🗑️ Remove deprecated AlignPropTrainer, DDPOTrainer and IterativeSFTTrainer (#4068) 2025-09-15 09:56:41 -06:00
1d0b196f6b Reviewed HF jobs updated docs (#4088) 2025-09-15 08:41:08 -06:00
9955ee7eaa 🐳 Docker update + Simplify Jobs doc (#3931)
Co-authored-by: sergiopaniego <sergiopaniegoblanco@gmail.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2025-09-13 18:35:55 -06:00
d655ce48f8 🌾 [Experimental] BEMA for ref model (#3898)
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2025-09-12 11:47:44 -06:00
91c4bba922 🧪 Add trl.experimental Submodule (#4073)
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-09-12 11:02:23 -06:00
816ac610c0 🪪 Update SFTTrainer to handle labels correctly and add configuration example in paper index (#4051) 2025-09-09 14:49:36 -06:00
a228cb51d1 Add autodoc for BestOfNSampler and improve docstrings (#4034) 2025-09-09 20:28:02 +02:00
659d2c1284 🧨 DFT (#4042) 2025-09-09 08:23:30 -06:00
1b799a23c1 🥓 [docs] add CP docs (#3994)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
2025-09-08 21:46:22 -06:00
e4ebf3ba11 Add autodoc for AlignPropTrainer and AlignPropConfig (#4033) 2025-09-08 20:13:23 +02:00
a1ee7d2182 [doc] Group paper index by trainer (#4027) 2025-09-08 18:03:48 +02:00
1d06757e57 [doc] Paper index for Truncated Importance Sampling (#4026) 2025-09-08 08:11:08 +02:00