frozenleaves/trl

mirror of https://github.com/huggingface/trl.git synced 2025-10-21 02:53:59 +08:00

Author	SHA1	Message	Date
Sergio Paniego Blanco	208e9f7df7	📏 `torch_dype` to `dtype` everywhere (#4000) Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>	2025-09-03 15:45:37 -06:00
Quentin Gallouédec	ed9b78a5f7	🗳️ Remove `logging_steps` parameter from for simpler setup (#3612)	2025-06-18 13:52:21 +02:00
Quentin Gallouédec	9df19e8a75	📜 Fix license and copyrights (#3264)	2025-04-08 15:22:58 -07:00
Quentin Gallouédec	1d23ecc36f	©️ Update copyrights year (#2547) * happy new year * fix wandb import sort	2025-01-07 14:53:09 +01:00
Quentin Gallouédec	9410874787	©️ Copyrights update (#2454) * First changes * Other files * Finally * rm comment * fix nashmd * Fix example * Fix example [ci skip]	2024-12-10 10:40:00 +01:00
Quentin Gallouédec	c10cc8995b	🗝️ Update type hints (#2399) * New type hint structure * Update type hints * Delete wrong file * Remove dict import	2024-11-26 20:37:27 +01:00
Quentin Gallouédec	fb1b48fdbe	Remove `max_length` from `RewardDataCollatorWithPadding` (#2119)	2024-09-26 09:59:12 +02:00
Quentin Gallouédec	4c92ba5769	©️ Copyrights (#2063) * copyrights * fail if missing	2024-09-13 14:18:47 +02:00
Quentin Gallouédec	54f806b6ff	Standardize `dataset_num_proc` usage (#1925) * uniform dataset_num_proc * num_proc in shuffle * Update examples/datasets/anthropic_hh.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update examples/scripts/ppo.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update examples/scripts/ppo.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> --------- Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co> Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2024-08-13 15:10:39 +02:00
Quentin Gallouédec	c8cef79e6c	arXiv to HF Papers (#1870)	2024-07-24 21:06:57 +02:00
Wang, Yi	3c0a10b1ae	fix dataset load error (#1670) Signed-off-by: Wang, Yi <yi.a.wang@intel.com>	2024-05-27 14:52:20 +02:00
Zach Mueller	a02513c3b7	Apply deprecated `evaluation_strategy` (#1559) * Deprecate * Update tests/test_dpo_trainer.py --------- Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>	2024-05-23 12:48:00 +02:00
Wang, Yi	2a2676e7ec	set seed in sft/dpo/reward_modeling to make result reproducable (#1357) Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>	2024-02-23 11:12:45 +01:00
Younes Belkada	5c7bfbc8d9	[`examples`] Big refactor of examples and documentation (#509) * added sfttrainer and rmtrainer example scripts. * added few lines in the documentation. * moved notebooks. * delete `examples/summarization` * remove from docs as well * refactor sentiment tuning * more refactoring. * updated docs for multi-adapter RL. * add research projects folder * more refactor * refactor docs. * refactor structure * add correct scripts all over the place * final touches * final touches * updated documentation from feedback.	2023-07-14 12:00:56 +02:00

14 Commits