|
e0eec055b4
|
🧺 [4/N] Refactor _generate in GRPO/RLOO: Move forward_kwargs outside generation method (#4154)
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Co-authored-by: YonatanGideoni <yonatan.gideoni@gmail.com>
Co-authored-by: burtenshaw <ben.burtenshaw@gmail.com>
Co-authored-by: sergiopaniego <sergiopaniegoblanco@gmail.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
|
2025-10-17 15:36:13 -06:00 |
|
|
f4c554da22
|
Update links to docs in README to latest packaged version (#4084)
|
2025-10-17 08:06:40 -06:00 |
|
|
a932e2796d
|
⬆️ Bump dev version (#4293)
|
2025-10-15 18:11:52 -06:00 |
|
|
04fd1203af
|
Release: v0.24 (#4292)
v0.24.0
|
2025-10-15 18:10:10 -06:00 |
|
|
19d2f97932
|
Deprecate BestOfNSampler (#4291)
Co-authored-by: behroozazarkhalili <ermiaazarkhalili>
Co-authored-by: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
|
2025-10-15 18:06:34 -06:00 |
|
|
31caf64778
|
Remove unused commands directory (#4258)
Co-authored-by: behroozazarkhalili <ermiaazarkhalili>
|
2025-10-15 18:01:50 -06:00 |
|
|
8e2d5516ca
|
Add accuracy reward (#4270)
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
|
2025-10-15 18:01:07 -06:00 |
|
|
94aac4a101
|
Remove how_to_train.md: outdated training FAQ (#4267)
Co-authored-by: behroozazarkhalili <ermiaazarkhalili>
|
2025-10-15 23:49:04 +00:00 |
|
|
26b7c2507e
|
Add support for token_type_ids in DPOTrainer (#4285)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
|
2025-10-15 17:33:35 -06:00 |
|
|
aa25c2697c
|
Remove using_llama_models.md: outdated Llama2-specific documentation (#4268)
Co-authored-by: behroozazarkhalili <ermiaazarkhalili>
|
2025-10-15 14:13:27 -07:00 |
|
|
93c7d88563
|
Remove logging.md: trainer-specific metrics documentation (#4269)
Co-authored-by: behroozazarkhalili <ermiaazarkhalili>
|
2025-10-15 14:12:32 -07:00 |
|
|
c7c041ecc8
|
Fix CI slow tests: ImportError: vLLM is not installed (#4287)
|
2025-10-15 18:15:36 +02:00 |
|
|
ef40c047aa
|
Replace unittest skipTest with pytest.skip (#4263)
|
2025-10-15 18:15:28 +02:00 |
|
|
7e0adbc552
|
Fix CI dev test TypeError: unexpected keyword argument 'load_in_4bit' (#4262)
|
2025-10-15 18:14:49 +02:00 |
|
|
773afd9314
|
💰 RichProgressCallback enhancement (#4245)
|
2025-10-15 09:39:17 -06:00 |
|
|
966b397201
|
Fix CI slow test OSError: You are trying to access a gated repo (#4283)
|
2025-10-15 16:11:11 +02:00 |
|
|
927cf6ba46
|
Fix docstrings with Sphinx 'deprecated' directive (#4279)
|
2025-10-15 10:39:12 +02:00 |
|
|
56cb6ccf76
|
Fix typo in Colab link (#4276)
|
2025-10-14 18:51:17 +02:00 |
|
|
49c8f14b06
|
Add Qwen3-VL notebooks (SFT, GRPO) (#4275)
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
2025-10-14 18:45:01 +02:00 |
|
|
cefbacb30e
|
Fix style with make precommit (#4265)
|
2025-10-14 12:13:15 +02:00 |
|
|
fae245a062
|
Use FutureWarning instead of DeprecationWarning (#4266)
|
2025-10-14 12:12:03 +02:00 |
|
|
2aa9506c69
|
Fix docstring interlinks (#4221)
|
2025-10-13 13:40:24 +02:00 |
|
|
d6eeb290d9
|
Raise deprecation warning for Python 3.9 (#4226)
|
2025-10-13 11:06:09 +02:00 |
|
|
1684ef279a
|
Fix Python version check for skipping tests on Python 3.13.8 (#4246)
|
2025-10-10 17:41:24 +02:00 |
|
|
aab21eb5e7
|
Include chat_template_kwargs in apply_chat_template (#4233)
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
|
2025-10-10 10:39:29 -05:00 |
|
|
b997a31981
|
[Online-DPO] fix the completion_len == max_new_tokens crash (#4193)
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
|
2025-10-10 17:21:01 +02:00 |
|
|
86d1963cc1
|
Fix CI slow test AttributeError: 'TestSFTTrainerSlow' object has no attribute 'addCleanup' (#4255)
|
2025-10-10 17:19:53 +02:00 |
|
|
039d526d24
|
Deprecate unused dataset_formatting module (#4242)
Co-authored-by: behroozazarkhalili <ermiaazarkhalili>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
|
2025-10-10 10:16:18 -05:00 |
|
|
bcd059a384
|
Remove obsolete research_projects directory (#4243)
Co-authored-by: behroozazarkhalili <ermiaazarkhalili>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
|
2025-10-10 10:15:47 -05:00 |
|
|
0e57b4a9df
|
🧺 [3/N] Refactor _generate in GRPO/RLOO: Rely on generator for prompt truncation (#4153)
|
2025-10-10 10:02:11 -05:00 |
|
|
98488e0946
|
Fix CI slow test ValueError: Unknown loss type: dapo (#4254)
|
2025-10-10 16:37:02 +02:00 |
|
|
f45e86571b
|
Fix CI ImportError for 'require_torch_gpu_if_bnb_not_multi_backend_enabled' (#4253)
|
2025-10-10 16:13:22 +02:00 |
|
|
f5827928a0
|
Install peft from main for CI tests with dev dependencies (#4250)
|
2025-10-10 16:12:15 +02:00 |
|
|
f853e091ea
|
Fix CI CUDA out of memory errors by improving GPU memory management (#4238)
|
2025-10-10 09:49:45 +02:00 |
|
|
803ec0d856
|
Fix CI slow test ValueError: Backward pass should have cleared tracker of all tensors (#4236)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
|
2025-10-10 09:28:34 +02:00 |
|
|
7a0a615d50
|
Warnings pointing to RFC (#4224)
|
2025-10-09 17:05:36 -06:00 |
|
|
c38cb69ec7
|
🧘 Enhance markdown style (#4235)
|
2025-10-09 13:49:44 -05:00 |
|
|
68ef15c686
|
Remove unused log_example_reports.py script (#4241)
Co-authored-by: behroozazarkhalili <ermiaazarkhalili>
|
2025-10-09 09:18:48 -07:00 |
|
|
3dd7fc2850
|
Fix CI IndentationError for Python 3.13.8 (#4240)
|
2025-10-09 15:46:41 +02:00 |
|
|
51ced65153
|
Replace setup with pyproject in CI tests paths (#4230)
|
2025-10-09 09:35:08 +02:00 |
|
|
4bb883a6e6
|
Update CI Docker image to pytorch/pytorch:2.8.0 (#4232)
|
2025-10-09 08:09:15 +02:00 |
|
|
f7846321e7
|
Remove unused Path import in __init__.py (#4227)
|
2025-10-08 21:30:54 +02:00 |
|
|
a944890ff1
|
Fix callable annotations (#4216)
|
2025-10-08 21:21:21 +02:00 |
|
|
521db3520a
|
Fix CI unittest asserts (#4234)
|
2025-10-08 21:18:41 +02:00 |
|
|
e2c97a805a
|
Exclude vllm dependencies from dev extra (#4229)
|
2025-10-08 18:14:23 +02:00 |
|
|
d1d0407d3c
|
🏷️ Account for token_type_ids in DataCollatorForVisionLanguageModeling (#4190)
|
2025-10-08 09:34:48 -06:00 |
|
|
824ff8c73e
|
Add Efficient Online Training with GRPO and vLLM in TRL to community tutorials (#4219)
|
2025-10-08 12:59:04 +02:00 |
|
|
f15399d3d3
|
Fix entropy and accuracy calculation for prompt_tuning techniques. (#4196)
|
2025-10-08 09:42:19 +01:00 |
|
|
cc578b6b14
|
🧺 [2/N] Refactor _generate in GRPO/RLOO: Use prompt_ids from generation (#4152)
|
2025-10-07 12:11:34 -06:00 |
|
|
30cf68a97b
|
🎨 Support mixing image+text and text-only examples (#4203)
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
|
2025-10-07 10:21:10 -06:00 |
|