Commit Graph

28 Commits

Author SHA1 Message Date
9955ee7eaa 🐳 Docker update + Simplify Jobs doc (#3931)
Co-authored-by: sergiopaniego <sergiopaniegoblanco@gmail.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2025-09-13 18:35:55 -06:00
0c69fd2867 👷 Added Kernels on the Hub x TRL guide (#3969)
Co-authored-by: vb <vaibhavs10@gmail.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-09-04 15:37:02 +02:00
0c91515b58 🧭 HF jobs x TRL guide (#3890)
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
2025-08-26 21:44:29 -07:00
cb95323429 👋 Remove --bf16 value in scripts (#3869) 2025-08-07 12:25:36 -07:00
a043fd74a3 Add uv scripts headers (#3767) 2025-07-25 07:48:40 -07:00
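The uv script headers referenced in the commit above follow the inline script metadata format (PEP 723). A minimal sketch of such a header at the top of an example script; the dependency list is illustrative only:

```python
# /// script
# dependencies = [
#     "trl",
#     "peft",
# ]
# ///
# With this header, `uv run script.py` resolves the listed dependencies
# into an ephemeral environment before executing the file.
```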
ed9b78a5f7 🗳️ Remove logging_steps parameter for simpler setup (#3612) 2025-06-18 13:52:21 +02:00
9df19e8a75 📜 Fix license and copyrights (#3264) 2025-04-08 15:22:58 -07:00
1d23ecc36f ©️ Update copyrights year (#2547)
* happy new year

* fix wandb import sort
2025-01-07 14:53:09 +01:00
460e780265 👯 Standardize model_args (#2442)
* `model_config` -> `model_args`

* sort
2024-12-10 12:51:20 +01:00
6a05feff02 🆔 Add dataset_config to ScriptArguments (#2440)
* dataset_config_name

* Update trl/utils.py [ci skip]

* sort import

* typo [ci skip]

* Trigger CI

* Rename `dataset_config_name` to `dataset_config`
2024-12-10 11:09:26 +01:00
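As a hedged illustration of the rename above, a script using the new field would select a dataset configuration roughly like this; the dataset name and config value are placeholders:

```python
from datasets import load_dataset

dataset_name = "trl-lib/ultrafeedback_binarized"  # placeholder
dataset_config = None  # formerly `dataset_config_name`; None uses the default config

# `name` selects the dataset configuration (subset) when the dataset defines several.
dataset = load_dataset(dataset_name, name=dataset_config, split="train")
```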
9410874787 ©️ Copyrights update (#2454)
* First changes

* Other files

* Finally

* rm comment

* fix nashmd

* Fix example

* Fix example [ci skip]
2024-12-10 10:40:00 +01:00
e155cb8a66 ⛓️💥 Don't use eval_dataset in scripts when no eval strategy (#2270) 2024-10-28 11:40:51 +01:00
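A sketch of the pattern this commit introduces, assuming an SFT example with placeholder model and dataset identifiers: the eval split is only passed when evaluation is actually enabled.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

training_args = SFTConfig(output_dir="out", eval_strategy="no")  # or "steps" / "epoch"
dataset = load_dataset("stanfordnlp/imdb")  # placeholder dataset with train/test splits

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",  # placeholder model id
    args=training_args,
    train_dataset=dataset["train"],
    # Only pass the eval split when evaluation is enabled, so unused data
    # is never preprocessed.
    eval_dataset=dataset["test"] if training_args.eval_strategy != "no" else None,
)
```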
7e394b03e8 🎭 Deprecate [SFT/DPO/Reward]ScriptArguments in favour of ScriptArguments (#2145)
* `DPOScriptArguments` to `ScriptArguments`

* use dataset_train_split

* Use scriptarguments

* dataset names in command lines

* use `ScriptArguments` everywhere

* ignore bias buffer to end

* remove in v0.13

* rm comment

* update test commands

* Update docs/source/rloo_trainer.md

* Update tests/test_rloo_trainer.py

* Added dataset_train_split argument to ppo.py and rloo.py

* update scripts with dataset_train_split
2024-10-14 11:14:58 +02:00
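After this change, the per-trainer argument classes are replaced by a single shared ScriptArguments. A rough sketch of the resulting parser setup, assuming the combination of ScriptArguments, DPOConfig, and ModelConfig used in the DPO example; it reads its values from the command line:

```python
from datasets import load_dataset
from transformers import HfArgumentParser
from trl import DPOConfig, ModelConfig, ScriptArguments

parser = HfArgumentParser((ScriptArguments, DPOConfig, ModelConfig))
script_args, training_args, model_args = parser.parse_args_into_dataclasses()

# The shared ScriptArguments carries dataset_name, dataset_train_split, dataset_test_split.
dataset = load_dataset(script_args.dataset_name)
train_dataset = dataset[script_args.dataset_train_split]
```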
47d08a9626 Rename trainer arg tokenizer to processing_class (#2162) 2024-10-07 09:39:32 +02:00
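A minimal, hedged illustration of the rename above: the object formerly passed as `tokenizer=` is now passed as `processing_class=`. Model and dataset identifiers are placeholders.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="out"),
    train_dataset=load_dataset("trl-lib/ultrafeedback_binarized", split="train"),
    processing_class=tokenizer,  # previously passed as `tokenizer=`
)
```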
d45c86e2a7 Conversational dataset support for CPOTrainer (#2144)
* extract prompt and apply chat template in cpo trainer

* default learning rate

* simplify example

* update doc

* test all formats

* extend extract prompt

* improve doc format

* link in dataset formats

* Update docs/source/cpo_trainer.mdx

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update docs/source/cpo_trainer.mdx

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

---------

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2024-10-04 18:01:02 +02:00
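This change lets the trainer accept preference data in conversational form and apply the chat template itself. A hedged sketch of the two record shapes (the content is illustrative):

```python
# Standard (explicit prompt) preference example: plain strings.
standard_example = {
    "prompt": "What color is the sky?",
    "chosen": "It is blue.",
    "rejected": "It is green.",
}

# Conversational preference example: lists of chat messages; the trainer
# extracts the prompt and applies the tokenizer's chat template.
conversational_example = {
    "prompt": [{"role": "user", "content": "What color is the sky?"}],
    "chosen": [{"role": "assistant", "content": "It is blue."}],
    "rejected": [{"role": "assistant", "content": "It is green."}],
}
```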
c00722ce0a 🃏 Model card for TRL (#2123)
* template and util

* test for online dpo

* template in package_data

* template in manifest

* standardize push_to_hub

* wandb badge and quick start

* bco

* xpo

* simplify `create_model_card`

* cpo

* kto

* dpo

* gkd

* orpo

* style

* nash-md

* alignprop

* bco citation

* citation template

* cpo citation

* ddpo

* fix alignprop

* dpo

* gkd citation

* kto

* online dpo citation

* orpo citation

* citation in utils

* optional citation

* reward

* optional trainer citation

* sft

* remove add_model_tags bco

* Remove unnecessary code for adding model tags

* Fix model tag issue and update URL format

* Remove unused code for adding model tags

* Add citation for XPOTrainer

* Remove unused code in SFTTrainer

* Add model card generation in RLOOTrainer

* Remove unused import and method call in reward_trainer.py

* Add model card generation

* Remove unused code and update error message in ORPOTrainer class

* Add import statements and create model card in IterativeSFTTrainer

* Add dataset name to push_to_hub() call

* Update trainer.push_to_hub() dataset names

* script args

* test

* better doc

* fix tag test

* fix test tag

* Add tags parameter to create_model_card method

* doc

* script args

* Update trl/templates/model_card.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* unittest's `assertIn` instead of `assert`

* Update trl/templates/model_card.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

---------

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2024-09-27 15:23:05 +02:00
9af4734178 ♻️ Standardize script_args (#2130) 2024-09-26 15:23:42 +02:00
32d9d34eb1 Standardize pushing to Hub in examples (#2126) 2024-09-26 10:00:51 +02:00
10c2f63b2a training_args for all TrainingArguments (#2082) 2024-09-19 15:03:47 +02:00
4c0c98d950 Standardize dataset naming (#2081)
* `ds`, `raw_dataset`, etc. -> `dataset`

* Update docs/source/detoxifying_a_lm.mdx
2024-09-19 08:59:28 +02:00
40f05226de Standardizing datasets for testing (#2065)
* zen dataset

* Update dataset test bco

* some tests

* Simple chat template

* bco

* xpo

* kto

* gkd

* trainer_args

* sft

* online dpo

* orpo

* zen script
2024-09-14 22:34:15 +02:00
642c4b1855 Remove debug and sanity_check args (#2055) 2024-09-11 17:56:02 +02:00
f05f63c1ea PartialState().local_main_process_first() when map in examples (#1926)
* `PartialState().local_main_process_first()` when map in examples

* allow load from cache

---------

Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co>
2024-08-14 12:01:03 +02:00
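The context manager named in this commit comes from Accelerate. A sketch of how it might wrap a `map` call so that, in multi-process runs, only the local main process preprocesses the data first and the other ranks reuse the cache; the dataset and map function are placeholders:

```python
from accelerate import PartialState
from datasets import load_dataset

dataset = load_dataset("trl-lib/Capybara", split="train")  # placeholder dataset

# The local main process runs map() first and writes the cache; the remaining
# local processes enter afterwards and load the cached result instead of recomputing.
with PartialState().local_main_process_first():
    dataset = dataset.map(lambda example: {"n_chars": len(str(example))}, num_proc=4)
```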
54f806b6ff Standardize dataset_num_proc usage (#1925)
* uniform dataset_num_proc

* num_proc in shuffle

* Update examples/datasets/anthropic_hh.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update examples/scripts/ppo.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update examples/scripts/ppo.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

---------

Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2024-08-13 15:10:39 +02:00
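A hedged example of the standardized knob: dataset preprocessing parallelism is set once on the trainer config rather than hard-coded in each script (values are placeholders):

```python
from trl import DPOConfig

# dataset_num_proc controls how many processes the trainer uses for dataset
# preprocessing steps such as tokenization and prompt extraction.
training_args = DPOConfig(output_dir="out", dataset_num_proc=4)
```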
7ddef5c158 Make use of trust_remote_code consistent (#1806)
Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co>
2024-07-10 18:26:11 +02:00
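A sketch of the consistent usage this commit targets, assuming the flag is surfaced as a script argument rather than hard-coded; the model id and variable names are placeholders:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "some-org/custom-model"  # placeholder for a model that ships custom code
trust_remote_code = False  # exposed as a CLI flag in the examples; off by default

# The flag is passed explicitly everywhere the model or tokenizer is loaded.
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=trust_remote_code)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=trust_remote_code)
```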
4402b36dcf clean examples (#1791)
Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co>
2024-07-04 14:29:25 +02:00
7075cec94d Update HH dataset on helpful only subset (#1613)
* Update HH dataset on helpful only subset

* format
2024-05-02 12:12:12 -04:00
d1df79f83c Add CPOTrainer (#1382)
* add CPOTrainer

* add docs

* fix formatting

* removed precompute_ref_log_probs arg

* remove precompute_ref_log_probs

* typos

* finish cpo trainer doc

* remove redundant lines

* typo

* formatting

* compute chosen nll loss also for enc-dec models

* fix gradient error of inplace operation for enc-dec models

* formatting

* use CPOConfig

* formatting

* use model_init_kwargs from CPOConfig

* comments in example

* fix doc string

* fix typo in docstring

* update year

* fixed typo

* use preference dataset

* fix learning rate

* move dataset_num_proc to configs

* Update cpo paper link from HF: cpo_trainer.mdx

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* update description for CPO: cpo_trainer.mdx

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* remove _prepare_deepspeed for cpo

Because CPO does not need to initialize a reference model

* Add explanation to CPO loss

* format

* fix bug when lengths are given

* add CPOTrainer to README

* fix grammar

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2024-03-22 21:32:45 +01:00
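For orientation, a minimal hedged sketch of wiring up the trainer added in this commit, written with current argument names; model and dataset identifiers are placeholders:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import CPOConfig, CPOTrainer

model_id = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# CPO optimizes directly on preference pairs and, unlike DPO, needs no reference
# model, which is why the commit also removes the _prepare_deepspeed call.
training_args = CPOConfig(output_dir="cpo-out", per_device_train_batch_size=2)
trainer = CPOTrainer(
    model=model,
    args=training_args,
    train_dataset=load_dataset("trl-lib/ultrafeedback_binarized", split="train"),
    processing_class=tokenizer,
)
trainer.train()
```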