frozenleaves/trl - trl

mirror of https://github.com/huggingface/trl.git synced 2025-10-20 18:43:52 +08:00

Author	SHA1	Message	Date
Quentin Gallouédec	9955ee7eaa	🐳 Docker update + Simplify Jobs doc (#3931) Co-authored-by: sergiopaniego <sergiopaniegoblanco@gmail.com> Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>	2025-09-13 18:35:55 -06:00
Albert Villanova del Moral	e8b8499f1f	Remove redundant 'None' from docstrings (#4058)	2025-09-11 08:16:34 +02:00
kaixuanliu	251c0488c8	📦 Wrapping the main execution code to avoid multi-processing issues from vLLM (#3932) Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>	2025-08-21 12:45:13 -07:00
Quentin Lhoest	a043fd74a3	Add uv scripts headers (#3767)	2025-07-25 07:48:40 -07:00
Quentin Gallouédec	9df19e8a75	📜 Fix license and copyrights (#3264)	2025-04-08 15:22:58 -07:00
Quentin Gallouédec	8453017622	🧼 Upgrade ruff (#2938)	2025-02-23 17:33:50 +01:00
Quentin Gallouédec	1d23ecc36f	©️ Update copyrights year (#2547) * happy new year * fix wandb import sort	2025-01-07 14:53:09 +01:00
Quentin Gallouédec	52d213173f	🚜 Use field in dataclasses (#2494) * in hh-rlhf-helpful-base * delete tokenize ds * dataset scripts * alignprop * judge tldr * ddpo * zen * sft video * literal to choices * chat * script args * alignprop * bco * better help format * cpo * ddpo * whether or not -> whether * dpo * dont set the possible values * `Optional[...]` to ... or `None` * xpo * gkd * kto * nash * online dpo * Fix typo in learning rate help message * orpo * more ... or `None` * model config * ppo * prm * reward * rloo * sft * online policy config * make style	2025-01-06 18:29:09 +01:00
Quentin Gallouédec	9410874787	©️ Copyrights update (#2454) * First changes * Other files * Finally * rm comment * fix nashmd * Fix example * Fix example [ci skip]	2024-12-10 10:40:00 +01:00
Quentin Gallouédec	9af4734178	♻️ Standardize `script_args` (#2130)	2024-09-26 15:23:42 +02:00
Quentin Gallouédec	4c0c98d950	Standardize dataset naming (#2081) * `ds`, `raw_dataset` etc -> `dataset` * Update docs/source/detoxifying_a_lm.mdx	2024-09-19 08:59:28 +02:00
Quentin Gallouédec	4c92ba5769	©️ Copyrights (#2063) * copyrights * fail if missing	2024-09-13 14:18:47 +02:00
Quentin Gallouédec	31b93876a7	📝 Document dataset format (#2020) * first piece of doc * improve readibility * some data utils and doc * simplify prompt-only * format * fix path data utils * fix example format * simplify * tests * prompt-completion * update antropic hh * update dataset script * implicit prompt * additional content * `maybe_reformat_dpo_to_kto` -> `unpair_preference_dataset` * Preference dataset with implicit prompt * unpair preference dataset tests * documentation * ... * doc * changes applied to dpo example * better doc and better log error * a bit more doc * improve doc * converting * some subsections * converting section * further refinements * tldr * tldr preference * rename * lm-human-preferences-sentiment * `imdb` to `stanfordnlp/imdb` * Add script for LM human preferences descriptiveness * Remove sentiment_descriptiveness.py script * style * example judge tlrd with new dataset * Syle * Dataset conversion for TRL compatibility * further refinements * trainers in doc * top level for functions * stanfordnlp/imdb * downgrade transformers * temp reduction of tests * next commit * next commit * additional content * proper tick format * precise the assistant start token * improve * lower case * Update titles in _toctree.yml and data_utils.mdx * revert make change * correct dataset ids * expand a bit dataset formats * skip gated repo tests * data utilities in API * Update docs/source/dataset_formats.mdx Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update docs/source/dataset_formats.mdx Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update docs/source/dataset_formats.mdx Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update docs/source/dataset_formats.mdx Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * tiny internal testing for chat template testing * precise type/format * exlude sft trainer in doc * Update trl/trainer/utils.py * XPO in the doc --------- Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2024-09-11 20:11:25 +02:00
Quentin Gallouédec	8bd2ab82f4	Refactor judges (#1856) * BaseJudge -> BasePairwiseJudge * hf judge asyncio * refactor judges * doc * doc * doc * memeber judge * :inherited-members: * :inherited-members: * doc * give up * judge tldr with judge class * fix rank in multithread * format * improve doc * update doc * typo doc * doc online dpo * Update judge_tldr.py --------- Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co>	2024-07-28 14:06:19 +02:00

14 Commits