|
2f1802bc6e
|
Fix missing CI slow tests: ImportError: vLLM is not installed (#4304)
|
2025-10-20 08:03:48 +02:00 |
|
|
8e2d5516ca
|
Add accuracy reward (#4270)
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
|
2025-10-15 18:01:07 -06:00 |
|
|
26b7c2507e
|
Add support for token_type_ids in DPOTrainer (#4285)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
|
2025-10-15 17:33:35 -06:00 |
|
|
c7c041ecc8
|
Fix CI slow tests: ImportError: vLLM is not installed (#4287)
|
2025-10-15 18:15:36 +02:00 |
|
|
ef40c047aa
|
Replace unittest skipTest with pytest.skip (#4263)
|
2025-10-15 18:15:28 +02:00 |
|
|
7e0adbc552
|
Fix CI dev test TypeError: unexpected keyword argument 'load_in_4bit' (#4262)
|
2025-10-15 18:14:49 +02:00 |
|
|
773afd9314
|
💰 RichProgressCallback enhancement (#4245)
|
2025-10-15 09:39:17 -06:00 |
|
|
966b397201
|
Fix CI slow test OSError: You are trying to access a gated repo (#4283)
|
2025-10-15 16:11:11 +02:00 |
|
|
cefbacb30e
|
Fix style with make precommit (#4265)
|
2025-10-14 12:13:15 +02:00 |
|
|
1684ef279a
|
Fix Python version check for skipping tests on Python 3.13.8 (#4246)
|
2025-10-10 17:41:24 +02:00 |
|
|
aab21eb5e7
|
Include chat_template_kwargs in apply_chat_template (#4233)
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
|
2025-10-10 10:39:29 -05:00 |
|
|
86d1963cc1
|
Fix CI slow test AttributeError: 'TestSFTTrainerSlow' object has no attribute 'addCleanup' (#4255)
|
2025-10-10 17:19:53 +02:00 |
|
|
039d526d24
|
Deprecate unused dataset_formatting module (#4242)
Co-authored-by: behroozazarkhalili <ermiaazarkhalili>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
|
2025-10-10 10:16:18 -05:00 |
|
|
0e57b4a9df
|
🧺 [3/N] Refactor _generate in GRPO/RLOO: Rely on generator for prompt truncation (#4153)
|
2025-10-10 10:02:11 -05:00 |
|
|
98488e0946
|
Fix CI slow test ValueError: Unknown loss type: dapo (#4254)
|
2025-10-10 16:37:02 +02:00 |
|
|
f45e86571b
|
Fix CI ImportError for 'require_torch_gpu_if_bnb_not_multi_backend_enabled' (#4253)
|
2025-10-10 16:13:22 +02:00 |
|
|
f853e091ea
|
Fix CI CUDA out of memory errors by improving GPU memory management (#4238)
|
2025-10-10 09:49:45 +02:00 |
|
|
3dd7fc2850
|
Fix CI IndentationError for Python 3.13.8 (#4240)
|
2025-10-09 15:46:41 +02:00 |
|
|
a944890ff1
|
Fix callable annotations (#4216)
|
2025-10-08 21:21:21 +02:00 |
|
|
521db3520a
|
Fix CI unittest asserts (#4234)
|
2025-10-08 21:18:41 +02:00 |
|
|
d1d0407d3c
|
🏷️ Account for token_type_ids in DataCollatorForVisionLanguageModeling (#4190)
|
2025-10-08 09:34:48 -06:00 |
|
|
f15399d3d3
|
Fix entropy and accuracy calculation for prompt_tuning techniques. (#4196)
|
2025-10-08 09:42:19 +01:00 |
|
|
cc578b6b14
|
🧺 [2/N] Refactor _generate in GRPO/RLOO: Use prompt_ids from generation (#4152)
|
2025-10-07 12:11:34 -06:00 |
|
|
30cf68a97b
|
🎨 Support mixing image+text and text-only examples (#4203)
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
|
2025-10-07 10:21:10 -06:00 |
|
|
8265800abf
|
Fix trl-internal-testing/tiny-DbrxForCausalLM (#4213)
|
2025-10-06 15:11:16 -06:00 |
|
|
45ee98b05e
|
Replace unittest with pytest (#4188)
|
2025-10-06 11:14:54 +02:00 |
|
|
1cbfb00b6a
|
Replace remaining trainer.tokenizer with trainer.processing_class in GRPO test (#4192)
|
2025-10-03 09:08:53 +02:00 |
|
|
d1b4691900
|
Fix CI ImportError: FlashAttention2 and decorator order for all parameterized tests (#4176)
|
2025-10-01 18:01:56 +02:00 |
|
|
39c603872f
|
🔣 Fix test: replace trainer.tokenizer by trainer.processing_class (#4185)
|
2025-10-01 09:16:42 -06:00 |
|
|
5a4021f23e
|
Fix handling of f_divergence_type in DPO (#4171)
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
|
2025-10-01 09:44:14 +02:00 |
|
|
ea66a9e650
|
🧺 [1/N] Refactor _generate in GRPO/RLOO: list of ints instead of tensors (#4146)
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
|
2025-09-30 16:22:30 -06:00 |
|
|
da209f89fc
|
🎁 RewardTrainer refactor (#4093)
Co-authored-by: juejuezi <juejuezi.git@foxmail.com>
Co-authored-by: Yi Shi <96773624+singing-cat@users.noreply.github.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
|
2025-09-30 15:13:45 -06:00 |
|
|
ebb8899f5d
|
⚡ Fix Flash Attention x Padding-Free loss (#4170)
|
2025-09-30 12:01:29 -06:00 |
|
|
70e2017dbc
|
🎞️ Support sequence classification models in clone_chat_template (#4097)
|
2025-09-30 11:42:56 -06:00 |
|
|
4368f54c97
|
👾 Use our own require_bitsandbytes (#4137)
|
2025-09-30 11:11:29 -06:00 |
|
|
a7b54f988b
|
Fix CI ValueError: Unknown loss type: dapo (#4173)
|
2025-09-30 18:27:21 +02:00 |
|
|
3b9ac65a05
|
🖨️ Print rich table for messages (#4160)
|
2025-09-30 09:07:57 -06:00 |
|
|
6428647063
|
Remove unnecessary list comprehensions (#4164)
|
2025-09-29 20:02:46 +02:00 |
|
|
d633c4337f
|
Fix import statement and GRPO test case (#4141)
|
2025-09-24 16:23:32 -06:00 |
|
|
d1e24df031
|
[GRPO]: Sample from a Replay Buffer To Substitute Groups with 0 std. (#4060)
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
|
2025-09-24 21:12:16 +01:00 |
|
|
094e0760d4
|
🌵 Mark GKD trainer test as expected failure due to OOM issue (#4126)
|
2025-09-24 12:26:44 -06:00 |
|
|
01c9b4c414
|
🤸‍♀️ Fix DFT test (#4135)
|
2025-09-24 12:25:56 -06:00 |
|
|
d144e73e78
|
🪙 [Experimental] Support GSPO-token (#3820)
Co-authored-by: LeonEricsson <70749762+LeonEricsson@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
|
2025-09-24 09:57:18 -06:00 |
|
|
be1ffe59d2
|
🌺 Fix GPT-OSS test (#4134)
|
2025-09-24 09:07:48 -06:00 |
|
|
526303edbd
|
[SFTrainer]: Fix DFT Loss (#4112)
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
|
2025-09-24 11:46:12 +01:00 |
|
|
9e5e60c933
|
👩‍🦯 Fix usage of VLM using text only (#4080)
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
|
2025-09-23 12:07:25 -06:00 |
|
|
68408d7219
|
📽 Multi image support for GRPO/RLOO (#4113)
|
2025-09-22 18:17:42 -06:00 |
|
|
b5ca3799ad
|
🟩 Drop image_split_sizes in favour of image_grid_thw (#4111)
|
2025-09-22 16:38:39 -06:00 |
|
|
a68b4af50f
|
Fix code style with make precommit (#4119)
|
2025-09-22 13:19:54 -06:00 |
|
|
9f0ed8b130
|
CI hotfix: xfail test_training_with_transformers_paged for transformers<4.57.0 (#4120)
|
2025-09-22 13:19:30 -06:00 |
|