|
19d2f97932
|
Deprecate BestOfNSampler (#4291)
Co-authored-by: behroozazarkhalili <ermiaazarkhalili>
Co-authored-by: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
|
2025-10-15 18:06:34 -06:00 |
|
|
8e2d5516ca
|
Add accuracy reward (#4270)
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
|
2025-10-15 18:01:07 -06:00 |
|
|
94aac4a101
|
Remove how_to_train.md: outdated training FAQ (#4267)
Co-authored-by: behroozazarkhalili <ermiaazarkhalili>
|
2025-10-15 23:49:04 +00:00 |
|
|
aa25c2697c
|
Remove using_llama_models.md: outdated Llama2-specific documentation (#4268)
Co-authored-by: behroozazarkhalili <ermiaazarkhalili>
|
2025-10-15 14:13:27 -07:00 |
|
|
93c7d88563
|
Remove logging.md: trainer-specific metrics documentation (#4269)
Co-authored-by: behroozazarkhalili <ermiaazarkhalili>
|
2025-10-15 14:12:32 -07:00 |
|
|
7e0adbc552
|
Fix CI dev test TypeError: unexpected keyword argument 'load_in_4bit' (#4262)
|
2025-10-15 18:14:49 +02:00 |
|
|
2aa9506c69
|
Fix docstring interlinks (#4221)
|
2025-10-13 13:40:24 +02:00 |
|
|
bcd059a384
|
Remove obsolete research_projects directory (#4243)
Co-authored-by: behroozazarkhalili <ermiaazarkhalili>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
|
2025-10-10 10:15:47 -05:00 |
|
|
c38cb69ec7
|
🧘 Enhance markdown style (#4235)
|
2025-10-09 13:49:44 -05:00 |
|
|
824ff8c73e
|
Add Efficient Online Training with GRPO and vLLM in TRL to community tutorials (#4219)
|
2025-10-08 12:59:04 +02:00 |
|
|
30cf68a97b
|
🎨 Support mixing image+text and text-only examples (#4203)
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
|
2025-10-07 10:21:10 -06:00 |
|
|
452284b8dc
|
Add trainers taxonomy to docs (#4195)
|
2025-10-07 16:06:30 +02:00 |
|
|
6be53e19bc
|
[DOCS] fix prose in lora guide (#4217)
|
2025-10-07 10:40:37 +02:00 |
|
|
3080fc1bd7
|
Fix LoRA params in Python in LoRA without regret (#4215)
|
2025-10-07 09:56:04 +02:00 |
|
|
0588b1f01d
|
Updated vLLM integration guide (#4162)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
|
2025-10-06 15:57:17 +02:00 |
|
|
ced8b337ba
|
[DOCS/FIX] lora without regrets - fix lr (#4207)
|
2025-10-06 08:23:11 +02:00 |
|
|
1eff7da9e0
|
[DOCS] Lora without regret (#4181)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: sergiopaniego <sergiopaniegoblanco@gmail.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
|
2025-10-03 20:40:37 +02:00 |
|
|
e086f073cf
|
🌡️ Have vLLM return processed (temperature scaled) log probs (#4163)
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
|
2025-10-01 11:58:13 -06:00 |
|
|
da209f89fc
|
🎁 RewardTrainer refactor (#4093)
Co-authored-by: juejuezi <juejuezi.git@foxmail.com>
Co-authored-by: Yi Shi <96773624+singing-cat@users.noreply.github.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
|
2025-09-30 15:13:45 -06:00 |
|
|
c8a5add88a
|
Fix PEFT interlinks in docstrings (#4178)
|
2025-09-30 18:32:23 +02:00 |
|
|
864e593e9f
|
Add missing FDivergenceType docstring (#4165)
|
2025-09-29 20:03:33 +02:00 |
|
|
8a5bfecc3a
|
💡 Replace <Tip> with new markdown syntax (#4161)
Co-authored-by: sergiopaniego <sergiopaniegoblanco@gmail.com>
|
2025-09-29 10:48:00 -06:00 |
|
|
9603b41d7e
|
😷 Refactor GRPO/RLOO to isolate _generate (#4114)
|
2025-09-25 20:48:52 -06:00 |
|
|
5ee56ed04f
|
Fixed some <Tip> rendering issues (#4143)
|
2025-09-25 14:47:46 -06:00 |
|
|
d1e24df031
|
[GRPO]: Sample from a Replay Buffer To Substitute Groups with 0 std. (#4060)
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
|
2025-09-24 21:12:16 +01:00 |
|
|
d144e73e78
|
🪙 [Experimental] Support GSPO-token (#3820)
Co-authored-by: LeonEricsson <70749762+LeonEricsson@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
|
2025-09-24 09:57:18 -06:00 |
|
|
251fdb228a
|
📌 Pin vLLM version (#4122)
|
2025-09-23 08:02:30 -06:00 |
|
|
fe02ea2b52
|
😴 Add vllm_enable_sleep_mode to RLOO Trainer (#4107)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
|
2025-09-22 19:41:29 -06:00 |
|
|
68408d7219
|
📽 Multi image support for GRPO/RLOO (#4113)
|
2025-09-22 18:17:42 -06:00 |
|
|
27f22ba5a1
|
docs: correct option name to enable vllm sleep mode (#4102)
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
|
2025-09-22 13:04:00 +02:00 |
|
|
26b497ea63
|
Fix typos (#4109)
|
2025-09-19 09:44:07 -06:00 |
|
|
0e204482e6
|
Some nits GRPO and RLOO trainer docs (#4108)
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
|
2025-09-19 16:37:25 +02:00 |
|
|
3c8d7209f1
|
👁️ Add VLM support to RLOO trainer (#4067)
Co-authored-by: behroozazarkhalili <ermiaazarkhalili>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
|
2025-09-18 21:54:06 -06:00 |
|
|
10dc36d610
|
🌪️ [GFPO]: implement GFPO in GRPOTrainer (#3989)
|
2025-09-17 19:14:40 -06:00 |
|
|
08ea00289a
|
🧶 feat: Add WeaveCallback for W&B Weave integration (#4089)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
|
2025-09-17 18:10:45 -06:00 |
|
|
4bd4acf172
|
🏞️ Context Parallelism benchmark guide (#4075)
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
|
2025-09-16 08:46:12 -06:00 |
|
|
8380869d33
|
Community Tutorials design adaptation for videos (#4095)
|
2025-09-16 16:28:22 +02:00 |
|
|
e2b18ec4e7
|
▶️ Add video to community tutorials (#4090)
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
|
2025-09-15 10:51:23 -06:00 |
|
|
78f1a928ce
|
🗑️ Remove deprecated AlignPropTrainer, DDPOTrainer and IterativeSFTTrainer (#4068)
|
2025-09-15 09:56:41 -06:00 |
|
|
1d0b196f6b
|
Reviewed HF jobs updated docs (#4088)
|
2025-09-15 08:41:08 -06:00 |
|
|
9955ee7eaa
|
🐳 Docker update + Simplify Jobs doc (#3931)
Co-authored-by: sergiopaniego <sergiopaniegoblanco@gmail.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
|
2025-09-13 18:35:55 -06:00 |
|
|
d655ce48f8
|
🌾 [Experimental] BEMA for ref model (#3898)
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
|
2025-09-12 11:47:44 -06:00 |
|
|
91c4bba922
|
🧪 Add trl.experimental Submodule (#4073)
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
|
2025-09-12 11:02:23 -06:00 |
|
|
816ac610c0
|
🪪 Update SFTTrainer to handle labels correctly and add configuration example in paper index (#4051)
|
2025-09-09 14:49:36 -06:00 |
|
|
a228cb51d1
|
Add autodoc for BestOfNSampler and improve docstrings (#4034)
|
2025-09-09 20:28:02 +02:00 |
|
|
659d2c1284
|
🧨 DFT (#4042)
|
2025-09-09 08:23:30 -06:00 |
|
|
1b799a23c1
|
🥓 [docs] add CP docs (#3994)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
|
2025-09-08 21:46:22 -06:00 |
|
|
e4ebf3ba11
|
Add autodoc for AlignPropTrainer and AlignPropConfig (#4033)
|
2025-09-08 20:13:23 +02:00 |
|
|
a1ee7d2182
|
[doc] Group paper index by trainer (#4027)
|
2025-09-08 18:03:48 +02:00 |
|
|
1d06757e57
|
[doc] Paper index for Truncated Importance Sampling (#4026)
|
2025-09-08 08:11:08 +02:00 |
|