frozenleaves/trl - trl - Gitea: Git for Me

mirror of https://github.com/huggingface/trl.git synced 2025-10-20 18:43:52 +08:00

Author	SHA1	Message	Date
Sergio Paniego Blanco	49c8f14b06	Add Qwen3-VL notebooks (SFT, GRPO) (#4275) Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-10-14 18:45:01 +02:00
Albert Villanova del Moral	cefbacb30e	Fix style with make precommit (#4265)	2025-10-14 12:13:15 +02:00
Albert Villanova del Moral	fae245a062	Use FutureWarning instead of DeprecationWarning (#4266)	2025-10-14 12:12:03 +02:00
Albert Villanova del Moral	2aa9506c69	Fix docstring interlinks (#4221)	2025-10-13 13:40:24 +02:00
Albert Villanova del Moral	d6eeb290d9	Raise deprecation warning for Python 3.9 (#4226)	2025-10-13 11:06:09 +02:00
Albert Villanova del Moral	1684ef279a	Fix Python version check for skipping tests on Python 3.13.8 (#4246)	2025-10-10 17:41:24 +02:00
Carlos Miguel Patiño	aab21eb5e7	Include `chat_template_kwargs` in `apply_chat_template` (#4233) Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>	2025-10-10 10:39:29 -05:00
Kashif Rasul	b997a31981	[Online-DPO] fix the completion_len == max_new_tokens crash (#4193) Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2025-10-10 17:21:01 +02:00
Albert Villanova del Moral	86d1963cc1	Fix CI slow test AttributeError: 'TestSFTTrainerSlow' object has no attribute 'addCleanup' (#4255)	2025-10-10 17:19:53 +02:00
Behrooz Azarkhalili	039d526d24	Deprecate unused dataset_formatting module (#4242) Co-authored-by: behroozazarkhalili <ermiaazarkhalili> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>	2025-10-10 10:16:18 -05:00
Behrooz Azarkhalili	bcd059a384	Remove obsolete research_projects directory (#4243) Co-authored-by: behroozazarkhalili <ermiaazarkhalili> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>	2025-10-10 10:15:47 -05:00
Quentin Gallouédec	0e57b4a9df	🧺 [3/N] Refactor `_generate` in GRPO/RLOO: Rely on generator for prompt truncation (#4153)	2025-10-10 10:02:11 -05:00
Albert Villanova del Moral	98488e0946	Fix CI slow test ValueError: Unknown loss type: dapo (#4254)	2025-10-10 16:37:02 +02:00
Albert Villanova del Moral	f45e86571b	Fix CI ImportError for 'require_torch_gpu_if_bnb_not_multi_backend_enabled' (#4253)	2025-10-10 16:13:22 +02:00
Albert Villanova del Moral	f5827928a0	Install peft from main for CI tests with dev dependencies (#4250)	2025-10-10 16:12:15 +02:00
Albert Villanova del Moral	f853e091ea	Fix CI CUDA out of memory errors by improving GPU memory management (#4238)	2025-10-10 09:49:45 +02:00
Wang, Yi	803ec0d856	Fix CI slow test ValueError: Backward pass should have cleared tracker of all tensors (#4236) Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>	2025-10-10 09:28:34 +02:00
Quentin Gallouédec	7a0a615d50	Warnings pointing to RFC (#4224)	2025-10-09 17:05:36 -06:00
Quentin Gallouédec	c38cb69ec7	🧘 Enhance markdown style (#4235)	2025-10-09 13:49:44 -05:00
Behrooz Azarkhalili	68ef15c686	Remove unused log_example_reports.py script (#4241) Co-authored-by: behroozazarkhalili <ermiaazarkhalili>	2025-10-09 09:18:48 -07:00
Albert Villanova del Moral	3dd7fc2850	Fix CI IndentationError for Python 3.13.8 (#4240)	2025-10-09 15:46:41 +02:00
Albert Villanova del Moral	51ced65153	Replace setup with pyproject in CI tests paths (#4230)	2025-10-09 09:35:08 +02:00
Albert Villanova del Moral	4bb883a6e6	Update CI Docker image to pytorch/pytorch:2.8.0 (#4232)	2025-10-09 08:09:15 +02:00
Albert Villanova del Moral	f7846321e7	Remove unused Path import in __init__.py (#4227)	2025-10-08 21:30:54 +02:00
Albert Villanova del Moral	a944890ff1	Fix callable annotations (#4216)	2025-10-08 21:21:21 +02:00
Albert Villanova del Moral	521db3520a	Fix CI unittest asserts (#4234)	2025-10-08 21:18:41 +02:00
Albert Villanova del Moral	e2c97a805a	Exclude vllm dependencies from dev extra (#4229)	2025-10-08 18:14:23 +02:00
Quentin Gallouédec	d1d0407d3c	🏷️ Account for `token_type_ids` in `DataCollatorForVisionLanguageModeling` (#4190)	2025-10-08 09:34:48 -06:00
Sergio Paniego Blanco	824ff8c73e	Add Efficient Online Training with GRPO and vLLM in TRL to community tutorials (#4219)	2025-10-08 12:59:04 +02:00
Pramodith Ballapuram	f15399d3d3	Fix entropy and accuracy calculation for prompt_tuning techniques. (#4196)	2025-10-08 09:42:19 +01:00
Quentin Gallouédec	cc578b6b14	🧺 [2/N] Refactor `_generate` in GRPO/RLOO: Use `prompt_ids` from generation (#4152)	2025-10-07 12:11:34 -06:00
Quentin Gallouédec	30cf68a97b	🎨 Support mixing image+text and text-only examples (#4203) Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>	2025-10-07 10:21:10 -06:00
Sergio Paniego Blanco	452284b8dc	Add trainers taxonomy to docs (#4195)	2025-10-07 16:06:30 +02:00
burtenshaw	6be53e19bc	[DOCS] fix prose in lora guide (#4217)	2025-10-07 10:40:37 +02:00
Sergio Paniego Blanco	3080fc1bd7	Fix LoRA params in Python in LoRA without regret (#4215)	2025-10-07 09:56:04 +02:00
Behrooz Azarkhalili	5d870955f8	Fix prompt-completion labeling with add_generation_prompt and warning (#4201) Co-authored-by: behroozazarkhalili <ermiaazarkhalili> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>	2025-10-06 18:35:50 -06:00
Quentin Gallouédec	8265800abf	Fix `trl-internal-testing/tiny-DbrxForCausalLM` (#4213)	2025-10-06 15:11:16 -06:00
Quentin Gallouédec	65eb45c32b	Apply style and revert change in `sft_video_llm` example (#4214)	2025-10-06 13:07:18 -06:00
Sergio Paniego Blanco	ae6837f8d4	Removed tokenizer/processor creation from example scripts (#4211)	2025-10-06 18:40:18 +02:00
Albert Villanova del Moral	56a8f1128b	Replace setup with pyproject and fix packaging unintended modules (#4194)	2025-10-06 17:45:44 +02:00
Sergio Paniego Blanco	529101537f	Remove `Optional` from `processing_class` in `PPOTrainer` (#4212)	2025-10-06 16:04:06 +02:00
Sergio Paniego Blanco	0588b1f01d	Updated vLLM integration guide (#4162) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2025-10-06 15:57:17 +02:00
Albert Villanova del Moral	45ee98b05e	Replace unittest with pytest (#4188)	2025-10-06 11:14:54 +02:00
Albert Villanova del Moral	3800a6ecc7	Hotfix: Exclude transformers 4.57.0 for Python 3.9 (#4209) Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>	2025-10-06 11:13:21 +02:00
Sergio Paniego Blanco	7ad9ce8acc	Remove tokenizer creation from `sft` example script (#4197)	2025-10-06 11:04:20 +02:00
Albert Villanova del Moral	0c2dc14014	Remove custome_container for building the docs (#4198)	2025-10-06 08:31:58 +02:00
burtenshaw	ced8b337ba	[DOCS/FIX] lora without regrets - fix lr (#4207)	2025-10-06 08:23:11 +02:00
burtenshaw	1eff7da9e0	[DOCS] Lora without regret (#4181) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: sergiopaniego <sergiopaniegoblanco@gmail.com> Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>	2025-10-03 20:40:37 +02:00
Albert Villanova del Moral	1cbfb00b6a	Replace remaining trainer.tokenizer with trainer.processing_class in GRPO test (#4192)	2025-10-03 09:08:53 +02:00
YonatanGideoni	e086f073cf	🌡️ Have vLLM return processed (temperature scaled) log probs (#4163) Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2025-10-01 11:58:13 -06:00

1896 Commits