frozenleaves/trl - Gitea

mirror of https://github.com/huggingface/trl.git synced 2025-10-20 10:03:51 +08:00

Author	SHA1	Message	Date
Quentin Gallouédec	e0eec055b4	🧺 [4/N] Refactor `_generate` in GRPO/RLOO: Move `forward_kwargs` outside generation method (#4154) Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Co-authored-by: YonatanGideoni <yonatan.gideoni@gmail.com> Co-authored-by: burtenshaw <ben.burtenshaw@gmail.com> Co-authored-by: sergiopaniego <sergiopaniegoblanco@gmail.com> Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>	2025-10-17 15:36:13 -06:00
Sergio Paniego Blanco	f4c554da22	Update links to docs in README to latest packaged version (#4084)	2025-10-17 08:06:40 -06:00
Quentin Gallouédec	a932e2796d	⬆️ Bump dev version (#4293)	2025-10-15 18:11:52 -06:00
Quentin Gallouédec	04fd1203af	Release: v0.24 (#4292) v0.24.0	2025-10-15 18:10:10 -06:00
Quentin Gallouédec	19d2f97932	Deprecate `BestOfNSampler` (#4291) Co-authored-by: behroozazarkhalili <ermiaazarkhalili> Co-authored-by: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>	2025-10-15 18:06:34 -06:00
Behrooz Azarkhalili	31caf64778	Remove unused commands directory (#4258) Co-authored-by: behroozazarkhalili <ermiaazarkhalili>	2025-10-15 18:01:50 -06:00
Pramodith Ballapuram	8e2d5516ca	Add accuracy reward (#4270) Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>	2025-10-15 18:01:07 -06:00
Behrooz Azarkhalili	94aac4a101	Remove how_to_train.md: outdated training FAQ (#4267) Co-authored-by: behroozazarkhalili <ermiaazarkhalili>	2025-10-15 23:49:04 +00:00
Alexander Weers	26b7c2507e	Add support for `token_type_ids` in `DPOTrainer` (#4285) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>	2025-10-15 17:33:35 -06:00
Behrooz Azarkhalili	aa25c2697c	Remove using_llama_models.md: outdated Llama2-specific documentation (#4268) Co-authored-by: behroozazarkhalili <ermiaazarkhalili>	2025-10-15 14:13:27 -07:00
Behrooz Azarkhalili	93c7d88563	Remove logging.md: trainer-specific metrics documentation (#4269) Co-authored-by: behroozazarkhalili <ermiaazarkhalili>	2025-10-15 14:12:32 -07:00
Albert Villanova del Moral	c7c041ecc8	Fix CI slow tests: ImportError: vLLM is not installed (#4287)	2025-10-15 18:15:36 +02:00
Albert Villanova del Moral	ef40c047aa	Replace unittest skipTest with pytest.skip (#4263)	2025-10-15 18:15:28 +02:00
Albert Villanova del Moral	7e0adbc552	Fix CI dev test TypeError: unexpected keyword argument 'load_in_4bit' (#4262)	2025-10-15 18:14:49 +02:00
Quentin Gallouédec	773afd9314	💰 `RichProgressCallback` enhancement (#4245)	2025-10-15 09:39:17 -06:00
Albert Villanova del Moral	966b397201	Fix CI slow test OSError: You are trying to access a gated repo (#4283)	2025-10-15 16:11:11 +02:00
Albert Villanova del Moral	927cf6ba46	Fix docstrings with Sphinx 'deprecated' directive (#4279)	2025-10-15 10:39:12 +02:00
Sergio Paniego Blanco	56cb6ccf76	Fix typo in Colab link (#4276)	2025-10-14 18:51:17 +02:00
Sergio Paniego Blanco	49c8f14b06	Add Qwen3-VL notebooks (SFT, GRPO) (#4275) Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-10-14 18:45:01 +02:00
Albert Villanova del Moral	cefbacb30e	Fix style with make precommit (#4265)	2025-10-14 12:13:15 +02:00
Albert Villanova del Moral	fae245a062	Use FutureWarning instead of DeprecationWarning (#4266)	2025-10-14 12:12:03 +02:00
Albert Villanova del Moral	2aa9506c69	Fix docstring interlinks (#4221)	2025-10-13 13:40:24 +02:00
Albert Villanova del Moral	d6eeb290d9	Raise deprecation warning for Python 3.9 (#4226)	2025-10-13 11:06:09 +02:00
Albert Villanova del Moral	1684ef279a	Fix Python version check for skipping tests on Python 3.13.8 (#4246)	2025-10-10 17:41:24 +02:00
Carlos Miguel Patiño	aab21eb5e7	Include `chat_template_kwargs` in `apply_chat_template` (#4233) Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>	2025-10-10 10:39:29 -05:00
Kashif Rasul	b997a31981	[Online-DPO] fix the completion_len == max_new_tokens crash (#4193) Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2025-10-10 17:21:01 +02:00
Albert Villanova del Moral	86d1963cc1	Fix CI slow test AttributeError: 'TestSFTTrainerSlow' object has no attribute 'addCleanup' (#4255)	2025-10-10 17:19:53 +02:00
Behrooz Azarkhalili	039d526d24	Deprecate unused dataset_formatting module (#4242) Co-authored-by: behroozazarkhalili <ermiaazarkhalili> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>	2025-10-10 10:16:18 -05:00
Behrooz Azarkhalili	bcd059a384	Remove obsolete research_projects directory (#4243) Co-authored-by: behroozazarkhalili <ermiaazarkhalili> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>	2025-10-10 10:15:47 -05:00
Quentin Gallouédec	0e57b4a9df	🧺 [3/N] Refactor `_generate` in GRPO/RLOO: Rely on generator for prompt truncation (#4153)	2025-10-10 10:02:11 -05:00
Albert Villanova del Moral	98488e0946	Fix CI slow test ValueError: Unknown loss type: dapo (#4254)	2025-10-10 16:37:02 +02:00
Albert Villanova del Moral	f45e86571b	Fix CI ImportError for 'require_torch_gpu_if_bnb_not_multi_backend_enabled' (#4253)	2025-10-10 16:13:22 +02:00
Albert Villanova del Moral	f5827928a0	Install peft from main for CI tests with dev dependencies (#4250)	2025-10-10 16:12:15 +02:00
Albert Villanova del Moral	f853e091ea	Fix CI CUDA out of memory errors by improving GPU memory management (#4238)	2025-10-10 09:49:45 +02:00
Wang, Yi	803ec0d856	Fix CI slow test ValueError: Backward pass should have cleared tracker of all tensors (#4236) Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>	2025-10-10 09:28:34 +02:00
Quentin Gallouédec	7a0a615d50	Warnings pointing to RFC (#4224)	2025-10-09 17:05:36 -06:00
Quentin Gallouédec	c38cb69ec7	🧘 Enhance markdown style (#4235)	2025-10-09 13:49:44 -05:00
Behrooz Azarkhalili	68ef15c686	Remove unused log_example_reports.py script (#4241) Co-authored-by: behroozazarkhalili <ermiaazarkhalili>	2025-10-09 09:18:48 -07:00
Albert Villanova del Moral	3dd7fc2850	Fix CI IndentationError for Python 3.13.8 (#4240)	2025-10-09 15:46:41 +02:00
Albert Villanova del Moral	51ced65153	Replace setup with pyproject in CI tests paths (#4230)	2025-10-09 09:35:08 +02:00
Albert Villanova del Moral	4bb883a6e6	Update CI Docker image to pytorch/pytorch:2.8.0 (#4232)	2025-10-09 08:09:15 +02:00
Albert Villanova del Moral	f7846321e7	Remove unused Path import in __init__.py (#4227)	2025-10-08 21:30:54 +02:00
Albert Villanova del Moral	a944890ff1	Fix callable annotations (#4216)	2025-10-08 21:21:21 +02:00
Albert Villanova del Moral	521db3520a	Fix CI unittest asserts (#4234)	2025-10-08 21:18:41 +02:00
Albert Villanova del Moral	e2c97a805a	Exclude vllm dependencies from dev extra (#4229)	2025-10-08 18:14:23 +02:00
Quentin Gallouédec	d1d0407d3c	🏷️ Account for `token_type_ids` in `DataCollatorForVisionLanguageModeling` (#4190)	2025-10-08 09:34:48 -06:00
Sergio Paniego Blanco	824ff8c73e	Add Efficient Online Training with GRPO and vLLM in TRL to community tutorials (#4219)	2025-10-08 12:59:04 +02:00
Pramodith Ballapuram	f15399d3d3	Fix entropy and accuracy calculation for prompt_tuning techniques. (#4196)	2025-10-08 09:42:19 +01:00
Quentin Gallouédec	cc578b6b14	🧺 [2/N] Refactor `_generate` in GRPO/RLOO: Use `prompt_ids` from generation (#4152)	2025-10-07 12:11:34 -06:00
Quentin Gallouédec	30cf68a97b	🎨 Support mixing image+text and text-only examples (#4203) Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>	2025-10-07 10:21:10 -06:00

1914 Commits