frozenleaves/trl

mirror of https://github.com/huggingface/trl.git synced 2025-10-21 11:33:51 +08:00

Author	SHA1	Message	Date
Albert Villanova del Moral	c9484b161f	Align docstring parameters with function definitions (#4017)	2025-09-07 10:40:09 +02:00
Yao Matrix	1314aac502	ℹ️ Unify autocast behavior to `torch.autocast` and make it cover XPU (#3541) Signed-off-by: YAO Matrix <matrix.yao@intel.com> Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>	2025-06-10 09:13:00 +02:00
Quentin Gallouédec	9df19e8a75	📜 Fix license and copyrights (#3264)	2025-04-08 15:22:58 -07:00
Quentin Gallouédec	1d23ecc36f	©️ Update copyrights year (#2547) * happy new year * fix wandb import sort	2025-01-07 14:53:09 +01:00
Quentin Gallouédec	8c49ea39ec	🏚 Remove unused components (#2480)	2024-12-19 19:29:39 +01:00
Quentin Gallouédec	9410874787	©️ Copyrights update (#2454) * First changes * Other files * Finally * rm comment * fix nashmd * Fix example * Fix example [ci skip]	2024-12-10 10:40:00 +01:00
Quentin Gallouédec	54f806b6ff	Standardize `dataset_num_proc` usage (#1925) * uniform dataset_num_proc * num_proc in shuffle * Update examples/datasets/anthropic_hh.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update examples/scripts/ppo.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update examples/scripts/ppo.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> --------- Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co> Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2024-08-13 15:10:39 +02:00
Wang, Yi	3c0a10b1ae	fix dataset load error (#1670) Signed-off-by: Wang, Yi <yi.a.wang@intel.com>	2024-05-27 14:52:20 +02:00
yuanwu2017	4219cbfedc	Fix the pad_token_id error (#1394) * Fix the pad_token_id error Signed-off-by: yuanwu <yuan.wu@intel.com> * Add the load_in_8bit argument in rl_training.py Signed-off-by: yuanwu <yuan.wu@intel.com> * Reformate the patch Signed-off-by: yuanwu <yuan.wu@intel.com> * Fix the check failed Signed-off-by: yuanwu <yuan.wu@intel.com> --------- Signed-off-by: yuanwu <yuan.wu@intel.com>	2024-03-05 02:18:42 +01:00
Aarni Koskela	9bc478ecbb	pre-commit: replace linters + formatters with Ruff; fix some issues (#1300) * pre-commit: replace linters + formatters with Ruff * Don't use bare except * Clean up `noqa`s * Enable Ruff UP; apply auto-fixes * Enable Ruff B; apply fixes * Enable Ruff T with exceptions * Enable Ruff C (complexity); autofix * Upgrade Ruff to 0.2.0	2024-02-15 04:37:41 +01:00
Matthew Hollings	6614b8aa6b	Minor fixes to some comments in some examples. (#1156)	2023-12-29 14:12:05 +01:00
Viet Hoang Tran Duong	e7961e45f1	Remove duplicate data loading in rl_training.py (#1020) We load dataset twice, but in line 149 (new), we do `ds = train_dataset.map` anyway	2023-11-23 12:25:07 +01:00
Jan Vincent Hoffbauer	d78d917880	Add comment to explain how the sentiment pipeline is used to run the … (#555) * Add comment to explain how the sentiment pipeline is used to run the reward model in the StackLLaMA example * Apply 'make precommit'	2023-07-24 18:09:45 +02:00
Younes Belkada	5c7bfbc8d9	[`examples`] Big refactor of examples and documentation (#509) * added sfttrainer and rmtrainer example scripts. * added few lines in the documentation. * moved notebooks. * delete `examples/summarization` * remove from docs as well * refactor sentiment tuning * more refactoring. * updated docs for multi-adapter RL. * add research projects folder * more refactor * refactor docs. * refactor structure * add correct scripts all over the place * final touches * final touches * updated documentation from feedback.	2023-07-14 12:00:56 +02:00

14 Commits