528 Commits

SHA1 Message Date
2f1802bc6e Fix missing CI slow tests: ImportError: vLLM is not installed (#4304) 2025-10-20 08:03:48 +02:00
8e2d5516ca Add accuracy reward (#4270)
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
2025-10-15 18:01:07 -06:00
26b7c2507e Add support for token_type_ids in DPOTrainer (#4285)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
2025-10-15 17:33:35 -06:00
c7c041ecc8 Fix CI slow tests: ImportError: vLLM is not installed (#4287) 2025-10-15 18:15:36 +02:00
ef40c047aa Replace unittest skipTest with pytest.skip (#4263) 2025-10-15 18:15:28 +02:00
7e0adbc552 Fix CI dev test TypeError: unexpected keyword argument 'load_in_4bit' (#4262) 2025-10-15 18:14:49 +02:00
773afd9314 💰 RichProgressCallback enhancement (#4245) 2025-10-15 09:39:17 -06:00
966b397201 Fix CI slow test OSError: You are trying to access a gated repo (#4283) 2025-10-15 16:11:11 +02:00
cefbacb30e Fix style with make precommit (#4265) 2025-10-14 12:13:15 +02:00
1684ef279a Fix Python version check for skipping tests on Python 3.13.8 (#4246) 2025-10-10 17:41:24 +02:00
aab21eb5e7 Include chat_template_kwargs in apply_chat_template (#4233)
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
2025-10-10 10:39:29 -05:00
86d1963cc1 Fix CI slow test AttributeError: 'TestSFTTrainerSlow' object has no attribute 'addCleanup' (#4255) 2025-10-10 17:19:53 +02:00
039d526d24 Deprecate unused dataset_formatting module (#4242)
Co-authored-by: behroozazarkhalili <ermiaazarkhalili>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
2025-10-10 10:16:18 -05:00
0e57b4a9df 🧺 [3/N] Refactor _generate in GRPO/RLOO: Rely on generator for prompt truncation (#4153) 2025-10-10 10:02:11 -05:00
98488e0946 Fix CI slow test ValueError: Unknown loss type: dapo (#4254) 2025-10-10 16:37:02 +02:00
f45e86571b Fix CI ImportError for 'require_torch_gpu_if_bnb_not_multi_backend_enabled' (#4253) 2025-10-10 16:13:22 +02:00
f853e091ea Fix CI CUDA out of memory errors by improving GPU memory management (#4238) 2025-10-10 09:49:45 +02:00
3dd7fc2850 Fix CI IndentationError for Python 3.13.8 (#4240) 2025-10-09 15:46:41 +02:00
a944890ff1 Fix callable annotations (#4216) 2025-10-08 21:21:21 +02:00
521db3520a Fix CI unittest asserts (#4234) 2025-10-08 21:18:41 +02:00
d1d0407d3c 🏷️ Account for token_type_ids in DataCollatorForVisionLanguageModeling (#4190) 2025-10-08 09:34:48 -06:00
f15399d3d3 Fix entropy and accuracy calculation for prompt_tuning techniques. (#4196) 2025-10-08 09:42:19 +01:00
cc578b6b14 🧺 [2/N] Refactor _generate in GRPO/RLOO: Use prompt_ids from generation (#4152) 2025-10-07 12:11:34 -06:00
30cf68a97b 🎨 Support mixing image+text and text-only examples (#4203)
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
2025-10-07 10:21:10 -06:00
8265800abf Fix trl-internal-testing/tiny-DbrxForCausalLM (#4213) 2025-10-06 15:11:16 -06:00
45ee98b05e Replace unittest with pytest (#4188) 2025-10-06 11:14:54 +02:00
1cbfb00b6a Replace remaining trainer.tokenizer with trainer.processing_class in GRPO test (#4192) 2025-10-03 09:08:53 +02:00
d1b4691900 Fix CI ImportError: FlashAttention2 and decorator order for all parameterized tests (#4176) 2025-10-01 18:01:56 +02:00
39c603872f 🔣 Fix test: replace trainer.tokenizer by trainer.processing_class (#4185) 2025-10-01 09:16:42 -06:00
5a4021f23e Fix handling of f_divergence_type in DPO (#4171)
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-10-01 09:44:14 +02:00
ea66a9e650 🧺 [1/N] Refactor _generate in GRPO/RLOO: list of ints instead of tensors (#4146)
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
2025-09-30 16:22:30 -06:00
da209f89fc 🎁 RewardTrainer refactor (#4093)
Co-authored-by: juejuezi <juejuezi.git@foxmail.com>
Co-authored-by: Yi Shi <96773624+singing-cat@users.noreply.github.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2025-09-30 15:13:45 -06:00
ebb8899f5d Fix Flash Attention x Padding-Free loss (#4170) 2025-09-30 12:01:29 -06:00
70e2017dbc 🎞️ Support sequence classification models in clone_chat_template (#4097) 2025-09-30 11:42:56 -06:00
4368f54c97 👾 Use our own require_bitsandbytes (#4137) 2025-09-30 11:11:29 -06:00
a7b54f988b Fix CI ValueError: Unknown loss type: dapo (#4173) 2025-09-30 18:27:21 +02:00
3b9ac65a05 🖨️ Print rich table for messages (#4160) 2025-09-30 09:07:57 -06:00
6428647063 Remove unnecessary list comprehensions (#4164) 2025-09-29 20:02:46 +02:00
d633c4337f Fix import statement and GRPO test case (#4141) 2025-09-24 16:23:32 -06:00
d1e24df031 [GRPO]: Sample from a Replay Buffer To Substitute Groups with 0 std. (#4060)
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-09-24 21:12:16 +01:00
094e0760d4 🌵 Mark GKD trainer test as expected failure due to OOM issue (#4126) 2025-09-24 12:26:44 -06:00
01c9b4c414 🤸‍♀️ Fix DFT test (#4135) 2025-09-24 12:25:56 -06:00
d144e73e78 🪙 [Experimental] Support GSPO-token (#3820)
Co-authored-by: LeonEricsson <70749762+LeonEricsson@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
2025-09-24 09:57:18 -06:00
be1ffe59d2 🌺 Fix GPT-OSS test (#4134) 2025-09-24 09:07:48 -06:00
526303edbd [SFTrainer]: Fix DFT Loss (#4112)
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
2025-09-24 11:46:12 +01:00
9e5e60c933 👩‍🦯 Fix usage of VLM using text only (#4080)
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-09-23 12:07:25 -06:00
68408d7219 📽 Multi image support for GRPO/RLOO (#4113) 2025-09-22 18:17:42 -06:00
b5ca3799ad 🟩 Drop image_split_sizes in favour of image_grid_thw (#4111) 2025-09-22 16:38:39 -06:00
a68b4af50f Fix code style with make precommit (#4119) 2025-09-22 13:19:54 -06:00
9f0ed8b130 CI hotfix: xfail test_training_with_transformers_paged for transformers<4.57.0 (#4120) 2025-09-22 13:19:30 -06:00