frozenleaves/trl

mirror of https://github.com/huggingface/trl.git synced 2025-10-20 18:43:52 +08:00

Author	SHA1	Message	Date
Quentin Gallouédec	19d2f97932	Deprecate `BestOfNSampler` (#4291) Co-authored-by: behroozazarkhalili <ermiaazarkhalili> Co-authored-by: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>	2025-10-15 18:06:34 -06:00
Pramodith Ballapuram	8e2d5516ca	Add accuracy reward (#4270) Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>	2025-10-15 18:01:07 -06:00
Behrooz Azarkhalili	94aac4a101	Remove how_to_train.md: outdated training FAQ (#4267) Co-authored-by: behroozazarkhalili <ermiaazarkhalili>	2025-10-15 23:49:04 +00:00
Behrooz Azarkhalili	aa25c2697c	Remove using_llama_models.md: outdated Llama2-specific documentation (#4268) Co-authored-by: behroozazarkhalili <ermiaazarkhalili>	2025-10-15 14:13:27 -07:00
Behrooz Azarkhalili	93c7d88563	Remove logging.md: trainer-specific metrics documentation (#4269) Co-authored-by: behroozazarkhalili <ermiaazarkhalili>	2025-10-15 14:12:32 -07:00
Albert Villanova del Moral	7e0adbc552	Fix CI dev test TypeError: unexpected keyword argument 'load_in_4bit' (#4262)	2025-10-15 18:14:49 +02:00
Albert Villanova del Moral	2aa9506c69	Fix docstring interlinks (#4221)	2025-10-13 13:40:24 +02:00
Behrooz Azarkhalili	bcd059a384	Remove obsolete research_projects directory (#4243) Co-authored-by: behroozazarkhalili <ermiaazarkhalili> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>	2025-10-10 10:15:47 -05:00
Quentin Gallouédec	c38cb69ec7	🧘 Enhance markdown style (#4235)	2025-10-09 13:49:44 -05:00
Sergio Paniego Blanco	824ff8c73e	Add Efficient Online Training with GRPO and vLLM in TRL to community tutorials (#4219)	2025-10-08 12:59:04 +02:00
Quentin Gallouédec	30cf68a97b	🎨 Support mixing image+text and text-only examples (#4203) Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>	2025-10-07 10:21:10 -06:00
Sergio Paniego Blanco	452284b8dc	Add trainers taxonomy to docs (#4195)	2025-10-07 16:06:30 +02:00
burtenshaw	6be53e19bc	[DOCS] fix prose in lora guide (#4217)	2025-10-07 10:40:37 +02:00
Sergio Paniego Blanco	3080fc1bd7	Fix LoRA params in Python in LoRA without regret (#4215)	2025-10-07 09:56:04 +02:00
Sergio Paniego Blanco	0588b1f01d	Updated vLLM integration guide (#4162) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2025-10-06 15:57:17 +02:00
burtenshaw	ced8b337ba	[DOCS/FIX] lora without regrets - fix lr (#4207)	2025-10-06 08:23:11 +02:00
burtenshaw	1eff7da9e0	[DOCS] Lora without regret (#4181) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: sergiopaniego <sergiopaniegoblanco@gmail.com> Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>	2025-10-03 20:40:37 +02:00
YonatanGideoni	e086f073cf	🌡️ Have vLLM return processed (temperature scaled) log probs (#4163) Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2025-10-01 11:58:13 -06:00
Quentin Gallouédec	da209f89fc	🎁 `RewardTrainer` refactor (#4093) Co-authored-by: juejuezi <juejuezi.git@foxmail.com> Co-authored-by: Yi Shi <96773624+singing-cat@users.noreply.github.com> Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>	2025-09-30 15:13:45 -06:00
Albert Villanova del Moral	c8a5add88a	Fix PEFT interlinks in docstrings (#4178)	2025-09-30 18:32:23 +02:00
Albert Villanova del Moral	864e593e9f	Add missing FDivergenceType docstring (#4165)	2025-09-29 20:03:33 +02:00
Quentin Gallouédec	8a5bfecc3a	💡 Replace `<Tip>` with new markdown syntax (#4161) Co-authored-by: sergiopaniego <sergiopaniegoblanco@gmail.com>	2025-09-29 10:48:00 -06:00
Quentin Gallouédec	9603b41d7e	😷 Refactor GRPO/RLOO to isolate `_generate` (#4114)	2025-09-25 20:48:52 -06:00
Sergio Paniego Blanco	5ee56ed04f	Fixed some <Tip> rendering issues (#4143)	2025-09-25 14:47:46 -06:00
Pramodith Ballapuram	d1e24df031	[GRPO]: Sample from a Replay Buffer To Substitute Groups with 0 std. (#4060) Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2025-09-24 21:12:16 +01:00
jinghanhu	d144e73e78	🪙 [Experimental] Support GSPO-token (#3820) Co-authored-by: LeonEricsson <70749762+LeonEricsson@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>	2025-09-24 09:57:18 -06:00
Quentin Gallouédec	251fdb228a	📌 Pin vLLM version (#4122)	2025-09-23 08:02:30 -06:00
Sergio Paniego Blanco	fe02ea2b52	😴 Add `vllm_enable_sleep_mode` to RLOO Trainer (#4107) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>	2025-09-22 19:41:29 -06:00
Quentin Gallouédec	68408d7219	📽 Multi image support for GRPO/RLOO (#4113)	2025-09-22 18:17:42 -06:00
Yasuhiro Fujita	27f22ba5a1	docs: correct option name to enable vllm sleep mode (#4102) Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>	2025-09-22 13:04:00 +02:00
Quentin Gallouédec	26b497ea63	Fix typos (#4109)	2025-09-19 09:44:07 -06:00
Sergio Paniego Blanco	0e204482e6	Some nits GRPO and RLOO trainer docs (#4108) Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>	2025-09-19 16:37:25 +02:00
Behrooz Azarkhalili	3c8d7209f1	👁️ Add VLM support to RLOO trainer (#4067) Co-authored-by: behroozazarkhalili <ermiaazarkhalili> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>	2025-09-18 21:54:06 -06:00
Minhao Chou	10dc36d610	🌪️ [GFPO]: implement GFPO in GRPOTrainer (#3989)	2025-09-17 19:14:40 -06:00
Bharat Ramanathan	08ea00289a	🧶 feat: Add WeaveCallback for W&B Weave integration (#4089) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>	2025-09-17 18:10:45 -06:00
Sergio Paniego Blanco	4bd4acf172	🏞️ Context Parallelism benchmark guide (#4075) Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2025-09-16 08:46:12 -06:00
Sergio Paniego Blanco	8380869d33	Community Tutorials design adaptation for videos (#4095)	2025-09-16 16:28:22 +02:00
Quentin Gallouédec	e2b18ec4e7	▶️ Add video to community tutorials (#4090) Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>	2025-09-15 10:51:23 -06:00
Quentin Gallouédec	78f1a928ce	🗑️ Remove deprecated `AlignPropTrainer`, `DDPOTrainer` and `IterativeSFTTrainer` (#4068)	2025-09-15 09:56:41 -06:00
Sergio Paniego Blanco	1d0b196f6b	Reviewed HF jobs updated docs (#4088)	2025-09-15 08:41:08 -06:00
Quentin Gallouédec	9955ee7eaa	🐳 Docker update + Simplify Jobs doc (#3931) Co-authored-by: sergiopaniego <sergiopaniegoblanco@gmail.com> Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>	2025-09-13 18:35:55 -06:00
Quentin Gallouédec	d655ce48f8	🌾 [Experimental] BEMA for ref model (#3898) Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>	2025-09-12 11:47:44 -06:00
August Moharrami	91c4bba922	🧪 Add `trl.experimental` Submodule (#4073) Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2025-09-12 11:02:23 -06:00
Quentin Gallouédec	816ac610c0	🪪 Update SFTTrainer to handle labels correctly and add configuration example in paper index (#4051)	2025-09-09 14:49:36 -06:00
Albert Villanova del Moral	a228cb51d1	Add autodoc for BestOfNSampler and improve docstrings (#4034)	2025-09-09 20:28:02 +02:00
Quentin Gallouédec	659d2c1284	🧨 DFT (#4042)	2025-09-09 08:23:30 -06:00
Kashif Rasul	1b799a23c1	🥓 [docs] add CP docs (#3994) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>	2025-09-08 21:46:22 -06:00
Albert Villanova del Moral	e4ebf3ba11	Add autodoc for AlignPropTrainer and AlignPropConfig (#4033)	2025-09-08 20:13:23 +02:00
LeonEricsson	a1ee7d2182	[doc] Group paper index by trainer (#4027)	2025-09-08 18:03:48 +02:00
LeonEricsson	1d06757e57	[doc] Paper index for Truncated Importance Sampling (#4026)	2025-09-08 08:11:08 +02:00

491 Commits