Commit Graph

28 Commits

Author SHA1 Message Date
9955ee7eaa 🐳 Docker update + Simplify Jobs doc (#3931)
Co-authored-by: sergiopaniego <sergiopaniegoblanco@gmail.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2025-09-13 18:35:55 -06:00
0c69fd2867 👷 Added Kernels on the Hub x TRL guide (#3969)
Co-authored-by: vb <vaibhavs10@gmail.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-09-04 15:37:02 +02:00
0c91515b58 🧭 HF jobs x TRL guide (#3890)
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
2025-08-26 21:44:29 -07:00
cb95323429 👋 Remove --bf16 value in scripts (#3869) 2025-08-07 12:25:36 -07:00
a043fd74a3 Add uv scripts headers (#3767) 2025-07-25 07:48:40 -07:00
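The uv script headers referenced in the commit above follow the inline script metadata format (PEP 723). A minimal sketch of such a header at the top of an example script; the dependency list is illustrative only:

```python
# /// script
# dependencies = [
#     "trl",
#     "peft",
# ]
# ///
# With this header, `uv run script.py` resolves the listed dependencies
# into an ephemeral environment before executing the file.
```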
ed9b78a5f7 🗳️ Remove logging_steps parameter for simpler setup (#3612) 2025-06-18 13:52:21 +02:00
9df19e8a75 📜 Fix license and copyrights (#3264) 2025-04-08 15:22:58 -07:00
1d23ecc36f ©️ Update copyrights year (#2547)
* happy new year

* fix wandb import sort
2025-01-07 14:53:09 +01:00
460e780265 👯 Standardize model_args (#2442)
* `model_config` -> `model_args`

* sort
2024-12-10 12:51:20 +01:00
6a05feff02 🆔 Add dataset_config to ScriptArguments (#2440)
* dataset_config_name

* Update trl/utils.py [ci skip]

* sort import

* typo [ci skip]

* Trigger CI

* Rename `dataset_config_name` to `dataset_config`
2024-12-10 11:09:26 +01:00
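As a hedged illustration of the rename above, a script using the new field would select a dataset configuration roughly like this; the dataset name and config value are placeholders:

```python
from datasets import load_dataset

dataset_name = "trl-lib/ultrafeedback_binarized"  # placeholder
dataset_config = None  # formerly `dataset_config_name`; None uses the default config

# `name` selects the dataset configuration (subset) when the dataset defines several.
dataset = load_dataset(dataset_name, name=dataset_config, split="train")
```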
9410874787 ©️ Copyrights update (#2454)
* First changes

* Other files

* Finally

* rm comment

* fix nashmd

* Fix example

* Fix example [ci skip]
2024-12-10 10:40:00 +01:00
e155cb8a66 ⛓️💥 Don't use eval_dataset in scripts when no eval strategy (#2270) 2024-10-28 11:40:51 +01:00
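A sketch of the pattern this commit introduces, assuming an SFT example with placeholder model and dataset identifiers: the eval split is only passed when evaluation is actually enabled.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

training_args = SFTConfig(output_dir="out", eval_strategy="no")  # or "steps" / "epoch"
dataset = load_dataset("stanfordnlp/imdb")  # placeholder dataset with train/test splits

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",  # placeholder model id
    args=training_args,
    train_dataset=dataset["train"],
    # Only pass the eval split when evaluation is enabled, so unused data
    # is never preprocessed.
    eval_dataset=dataset["test"] if training_args.eval_strategy != "no" else None,
)
```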
7e394b03e8 🎭 Deprecate [SFT/DPO/Reward]ScriptArguments in favour of ScriptArguments (#2145)
* `DPOScriptArguments` to `ScriptArguments`

* use dataset_train_split

* Use scriptarguments

* dataset names in command lines

* use `ScriptArguments` everywhere

* ignore bias buffer to end

* remove in v0.13

* rm comment

* update test commands

* Update docs/source/rloo_trainer.md

* Update tests/test_rloo_trainer.py

* Added dataset_train_split argument to ppo.py and rloo.py

* update scripts with dataset_train_split
2024-10-14 11:14:58 +02:00
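After this change, the per-trainer argument classes are replaced by a single shared ScriptArguments. A rough sketch of the resulting parser setup, assuming the combination of ScriptArguments, DPOConfig, and ModelConfig used in the DPO example; it reads its values from the command line:

```python
from datasets import load_dataset
from transformers import HfArgumentParser
from trl import DPOConfig, ModelConfig, ScriptArguments

parser = HfArgumentParser((ScriptArguments, DPOConfig, ModelConfig))
script_args, training_args, model_args = parser.parse_args_into_dataclasses()

# The shared ScriptArguments carries dataset_name, dataset_train_split, dataset_test_split.
dataset = load_dataset(script_args.dataset_name)
train_dataset = dataset[script_args.dataset_train_split]
```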
47d08a9626 Rename trainer arg tokenizer to processing_class (#2162) 2024-10-07 09:39:32 +02:00
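A minimal, hedged illustration of the rename above: the object formerly passed as `tokenizer=` is now passed as `processing_class=`. Model and dataset identifiers are placeholders.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="out"),
    train_dataset=load_dataset("trl-lib/ultrafeedback_binarized", split="train"),
    processing_class=tokenizer,  # previously passed as `tokenizer=`
)
```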
d45c86e2a7 Conversational dataset support for CPOTrainer (#2144)
* extract prompt and apply chat template in cpo trainer

* default learning rate

* simplify example

* update doc

* test all formats

* extend extract prompt

* improve doc format

* link in dataset formats

* Update docs/source/cpo_trainer.mdx

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update docs/source/cpo_trainer.mdx

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

---------

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2024-10-04 18:01:02 +02:00
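This change lets the trainer accept preference data in conversational form and apply the chat template itself. A hedged sketch of the two record shapes (the content is illustrative):

```python
# Standard (explicit prompt) preference example: plain strings.
standard_example = {
    "prompt": "What color is the sky?",
    "chosen": "It is blue.",
    "rejected": "It is green.",
}

# Conversational preference example: lists of chat messages; the trainer
# extracts the prompt and applies the tokenizer's chat template.
conversational_example = {
    "prompt": [{"role": "user", "content": "What color is the sky?"}],
    "chosen": [{"role": "assistant", "content": "It is blue."}],
    "rejected": [{"role": "assistant", "content": "It is green."}],
}
```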
c00722ce0a 🃏 Model card for TRL (#2123)
* template and util

* test for online dpo

* template in package_data

* template in manifest

* standardize push_to_hub

* wandb badge and quick start

* bco

* xpo

* simplify `create_model_card`

* cpo

* kto

* dpo

* gkd

* orpo

* style

* nash-md

* alignprop

* bco citation

* citation template

* cpo citation

* ddpo

* fix alignprop

* dpo

* gkd citation

* kto

* online dpo citation

* orpo citation

* citation in utils

* optional citation

* reward

* optional trainer citation

* sft

* remove add_model_tags bco

* Remove unnecessary code for adding model tags

* Fix model tag issue and update URL format

* Remove unused code for adding model tags

* Add citation for XPOTrainer

* Remove unused code in SFTTrainer

* Add model card generation in RLOOTrainer

* Remove unused import and method call in reward_trainer.py

* Add model card generation

* Remove unused code and update error message in ORPOTrainer class

* Add import statements and create model card in IterativeSFTTrainer

* Add dataset name to push_to_hub() call

* Update trainer.push_to_hub() dataset names

* script args

* test

* better doc

* fix tag test

* fix test tag

* Add tags parameter to create_model_card method

* doc

* script args

* Update trl/templates/model_card.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* unittest's `assertIn` instead of `assert`

* Update trl/templates/model_card.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

---------

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2024-09-27 15:23:05 +02:00
9af4734178 ♻️ Standardize script_args (#2130) 2024-09-26 15:23:42 +02:00
32d9d34eb1 Standardize pushing to Hub in examples (#2126) 2024-09-26 10:00:51 +02:00
10c2f63b2a training_args for all TrainingArguments (#2082) 2024-09-19 15:03:47 +02:00
4c0c98d950 Standardize dataset naming (#2081)
* `ds`, `raw_dataset`, etc. -> `dataset`

* Update docs/source/detoxifying_a_lm.mdx
2024-09-19 08:59:28 +02:00
40f05226de Standardizing datasets for testing (#2065)
* zen dataset

* Update dataset test bco

* some tests

* Simple chat template

* bco

* xpo

* kto

* gkd

* trainer_args

* sft

* online dpo

* orpo

* zen script
2024-09-14 22:34:15 +02:00
642c4b1855 Remove debug and sanity_check args (#2055) 2024-09-11 17:56:02 +02:00
f05f63c1ea PartialState().local_main_process_first() when map in examples (#1926)
* `PartialState().local_main_process_first()` when map in examples

* allow load from cache

---------

Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co>
2024-08-14 12:01:03 +02:00
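The context manager named in this commit comes from Accelerate. A sketch of how it might wrap a `map` call so that, in multi-process runs, only the local main process preprocesses the data first and the other ranks reuse the cache; the dataset and map function are placeholders:

```python
from accelerate import PartialState
from datasets import load_dataset

dataset = load_dataset("trl-lib/Capybara", split="train")  # placeholder dataset

# The local main process runs map() first and writes the cache; the remaining
# local processes enter afterwards and load the cached result instead of recomputing.
with PartialState().local_main_process_first():
    dataset = dataset.map(lambda example: {"n_chars": len(str(example))}, num_proc=4)
```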
54f806b6ff Standardize dataset_num_proc usage (#1925)
* uniform dataset_num_proc

* num_proc in shuffle

* Update examples/datasets/anthropic_hh.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update examples/scripts/ppo.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update examples/scripts/ppo.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

---------

Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2024-08-13 15:10:39 +02:00
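A hedged example of the standardized knob: dataset preprocessing parallelism is set once on the trainer config rather than hard-coded in each script (values are placeholders):

```python
from trl import DPOConfig

# dataset_num_proc controls how many processes the trainer uses for dataset
# preprocessing steps such as tokenization and prompt extraction.
training_args = DPOConfig(output_dir="out", dataset_num_proc=4)
```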
7ddef5c158 Make use of trust_remote_code consistent (#1806)
Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co>
2024-07-10 18:26:11 +02:00
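A sketch of the consistent usage this commit targets, assuming the flag is surfaced as a script argument rather than hard-coded; the model id and variable names are placeholders:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "some-org/custom-model"  # placeholder for a model that ships custom code
trust_remote_code = False  # exposed as a CLI flag in the examples; off by default

# The flag is passed explicitly everywhere the model or tokenizer is loaded.
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=trust_remote_code)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=trust_remote_code)
```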
4402b36dcf clean examples (#1791)
Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co>
2024-07-04 14:29:25 +02:00
7075cec94d Update HH dataset on helpful only subset (#1613)
* Update HH dataset on helpful only subset

* format
2024-05-02 12:12:12 -04:00
d1df79f83c Add CPOTrainer (#1382)
* add CPOTrainer

* add docs

* fix formatting

* removed precompute_ref_log_probs arg

* remove precompute_ref_log_probs

* typos

* finish cpo trainer doc

* remove redundant lines

* typo

* formatting

* compute chosen nll loss also for enc-dec models

* fix gradient error of inplace operation for enc-dec models

* formatting

* use CPOConfig

* formatting

* use model_init_kwargs from CPOConfig

* comments in example

* fix doc string

* fix typo in docstring

* update year

* fixed typo

* use preference dataset

* fix learning rate

* move dataset_num_proc to configs

* Update cpo paper link from HF: cpo_trainer.mdx

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* update description for CPO: cpo_trainer.mdx

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* remove _prepare_deepspeed for cpo

Because CPO does not need to initialize a reference model

* Add explanation to CPO loss

* format

* fix bug when lengths are given

* add CPOTrainer to README

* fix grammar

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2024-03-22 21:32:45 +01:00
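For orientation, a minimal hedged sketch of wiring up the trainer added in this commit, written with current argument names; model and dataset identifiers are placeholders:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import CPOConfig, CPOTrainer

model_id = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# CPO optimizes directly on preference pairs and, unlike DPO, needs no reference
# model, which is why the commit also removes the _prepare_deepspeed call.
training_args = CPOConfig(output_dir="cpo-out", per_device_train_batch_size=2)
trainer = CPOTrainer(
    model=model,
    args=training_args,
    train_dataset=load_dataset("trl-lib/ultrafeedback_binarized", split="train"),
    processing_class=tokenizer,
)
trainer.train()
```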