frozenleaves/trl - trl - Gitea: Git for Me

Mirror of https://github.com/huggingface/trl.git, last synced 2025-10-20 18:43:52 +08:00

Author	SHA1	Message	Date
Albert Villanova del Moral	2f1802bc6e	Fix missing CI slow tests: ImportError: vLLM is not installed (#4304)	2025-10-20 08:03:48 +02:00
Pramodith Ballapuram	8e2d5516ca	Add accuracy reward (#4270) Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>	2025-10-15 18:01:07 -06:00
Alexander Weers	26b7c2507e	Add support for `token_type_ids` in `DPOTrainer` (#4285) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>	2025-10-15 17:33:35 -06:00
Albert Villanova del Moral	c7c041ecc8	Fix CI slow tests: ImportError: vLLM is not installed (#4287)	2025-10-15 18:15:36 +02:00
Albert Villanova del Moral	ef40c047aa	Replace unittest skipTest with pytest.skip (#4263)	2025-10-15 18:15:28 +02:00
Albert Villanova del Moral	7e0adbc552	Fix CI dev test TypeError: unexpected keyword argument 'load_in_4bit' (#4262)	2025-10-15 18:14:49 +02:00
Quentin Gallouédec	773afd9314	💰 `RichProgressCallback` enhancement (#4245)	2025-10-15 09:39:17 -06:00
Albert Villanova del Moral	966b397201	Fix CI slow test OSError: You are trying to access a gated repo (#4283)	2025-10-15 16:11:11 +02:00
Albert Villanova del Moral	cefbacb30e	Fix style with make precommit (#4265)	2025-10-14 12:13:15 +02:00
Albert Villanova del Moral	1684ef279a	Fix Python version check for skipping tests on Python 3.13.8 (#4246)	2025-10-10 17:41:24 +02:00
Carlos Miguel Patiño	aab21eb5e7	Include `chat_template_kwargs` in `apply_chat_template` (#4233) Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>	2025-10-10 10:39:29 -05:00
Albert Villanova del Moral	86d1963cc1	Fix CI slow test AttributeError: 'TestSFTTrainerSlow' object has no attribute 'addCleanup' (#4255)	2025-10-10 17:19:53 +02:00
Behrooz Azarkhalili	039d526d24	Deprecate unused dataset_formatting module (#4242) Co-authored-by: behroozazarkhalili <ermiaazarkhalili> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>	2025-10-10 10:16:18 -05:00
Quentin Gallouédec	0e57b4a9df	🧺 [3/N] Refactor `_generate` in GRPO/RLOO: Rely on generator for prompt truncation (#4153)	2025-10-10 10:02:11 -05:00
Albert Villanova del Moral	98488e0946	Fix CI slow test ValueError: Unknown loss type: dapo (#4254)	2025-10-10 16:37:02 +02:00
Albert Villanova del Moral	f45e86571b	Fix CI ImportError for 'require_torch_gpu_if_bnb_not_multi_backend_enabled' (#4253)	2025-10-10 16:13:22 +02:00
Albert Villanova del Moral	f853e091ea	Fix CI CUDA out of memory errors by improving GPU memory management (#4238)	2025-10-10 09:49:45 +02:00
Albert Villanova del Moral	3dd7fc2850	Fix CI IndentationError for Python 3.13.8 (#4240)	2025-10-09 15:46:41 +02:00
Albert Villanova del Moral	a944890ff1	Fix callable annotations (#4216)	2025-10-08 21:21:21 +02:00
Albert Villanova del Moral	521db3520a	Fix CI unittest asserts (#4234)	2025-10-08 21:18:41 +02:00
Quentin Gallouédec	d1d0407d3c	🏷️ Account for `token_type_ids` in `DataCollatorForVisionLanguageModeling` (#4190)	2025-10-08 09:34:48 -06:00
Pramodith Ballapuram	f15399d3d3	Fix entropy and accuracy calculation for prompt_tuning techniques. (#4196)	2025-10-08 09:42:19 +01:00
Quentin Gallouédec	cc578b6b14	🧺 [2/N] Refactor `_generate` in GRPO/RLOO: Use `prompt_ids` from generation (#4152)	2025-10-07 12:11:34 -06:00
Quentin Gallouédec	30cf68a97b	🎨 Support mixing image+text and text-only examples (#4203) Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>	2025-10-07 10:21:10 -06:00
Quentin Gallouédec	8265800abf	Fix `trl-internal-testing/tiny-DbrxForCausalLM` (#4213)	2025-10-06 15:11:16 -06:00
Albert Villanova del Moral	45ee98b05e	Replace unittest with pytest (#4188)	2025-10-06 11:14:54 +02:00
Albert Villanova del Moral	1cbfb00b6a	Replace remaining trainer.tokenizer with trainer.processing_class in GRPO test (#4192)	2025-10-03 09:08:53 +02:00
Albert Villanova del Moral	d1b4691900	Fix CI ImportError: FlashAttention2 and decorator order for all parameterized tests (#4176)	2025-10-01 18:01:56 +02:00
Quentin Gallouédec	39c603872f	🔣 Fix test: replace `trainer.tokenizer` by `trainer.processing_class` (#4185)	2025-10-01 09:16:42 -06:00
Albert Villanova del Moral	5a4021f23e	Fix handling of f_divergence_type in DPO (#4171) Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2025-10-01 09:44:14 +02:00
Quentin Gallouédec	ea66a9e650	🧺 [1/N] Refactor `_generate` in GRPO/RLOO: list of ints instead of tensors (#4146) Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>	2025-09-30 16:22:30 -06:00
Quentin Gallouédec	da209f89fc	🎁 `RewardTrainer` refactor (#4093) Co-authored-by: juejuezi <juejuezi.git@foxmail.com> Co-authored-by: Yi Shi <96773624+singing-cat@users.noreply.github.com> Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>	2025-09-30 15:13:45 -06:00
Quentin Gallouédec	ebb8899f5d	⚡ Fix Flash Attention x Padding-Free loss (#4170)	2025-09-30 12:01:29 -06:00
Quentin Gallouédec	70e2017dbc	🎞️ Support sequence classification models in `clone_chat_template` (#4097)	2025-09-30 11:42:56 -06:00
Quentin Gallouédec	4368f54c97	👾 Use our own `require_bitsandbytes` (#4137)	2025-09-30 11:11:29 -06:00
Albert Villanova del Moral	a7b54f988b	Fix CI ValueError: Unknown loss type: dapo (#4173)	2025-09-30 18:27:21 +02:00
Quentin Gallouédec	3b9ac65a05	🖨️ Print rich table for messages (#4160)	2025-09-30 09:07:57 -06:00
Albert Villanova del Moral	6428647063	Remove unnecessary list comprehensions (#4164)	2025-09-29 20:02:46 +02:00
Quentin Gallouédec	d633c4337f	Fix import statement and GRPO test case (#4141)	2025-09-24 16:23:32 -06:00
Pramodith Ballapuram	d1e24df031	[GRPO]: Sample from a Replay Buffer To Substitute Groups with 0 std. (#4060) Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2025-09-24 21:12:16 +01:00
Quentin Gallouédec	094e0760d4	🌵 Mark GKD trainer test as expected failure due to OOM issue (#4126)	2025-09-24 12:26:44 -06:00
Quentin Gallouédec	01c9b4c414	🤸‍♀️ Fix DFT test (#4135)	2025-09-24 12:25:56 -06:00
jinghanhu	d144e73e78	🪙 [Experimental] Support GSPO-token (#3820) Co-authored-by: LeonEricsson <70749762+LeonEricsson@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>	2025-09-24 09:57:18 -06:00
Quentin Gallouédec	be1ffe59d2	🌺 Fix GPT-OSS test (#4134)	2025-09-24 09:07:48 -06:00
Pramodith Ballapuram	526303edbd	[SFTrainer]: Fix DFT Loss (#4112) Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>	2025-09-24 11:46:12 +01:00
Samuel Barry	9e5e60c933	👩‍🦯 Fix usage of VLM using text only (#4080) Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2025-09-23 12:07:25 -06:00
Quentin Gallouédec	68408d7219	📽 Multi image support for GRPO/RLOO (#4113)	2025-09-22 18:17:42 -06:00
Quentin Gallouédec	b5ca3799ad	🟩 Drop `image_split_sizes` in favour of `image_grid_thw` (#4111)	2025-09-22 16:38:39 -06:00
Albert Villanova del Moral	a68b4af50f	Fix code style with make precommit (#4119)	2025-09-22 13:19:54 -06:00
Albert Villanova del Moral	9f0ed8b130	CI hotfix: xfail test_training_with_transformers_paged for transformers<4.57.0 (#4120)	2025-09-22 13:19:30 -06:00

528 Commits