9955ee7eaa
🐳 Docker update + Simplify Jobs doc (#3931)
Co-authored-by: sergiopaniego <sergiopaniegoblanco@gmail.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2025-09-13 18:35:55 -06:00
0c69fd2867
👷 Added Kernels on the Hub x TRL guide (#3969)
Co-authored-by: vb <vaibhavs10@gmail.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-09-04 15:37:02 +02:00
0c91515b58
🧭 HF jobs x TRL guide (#3890)
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
2025-08-26 21:44:29 -07:00
cb95323429
👋 Remove --bf16 value in scripts (#3869)
2025-08-07 12:25:36 -07:00
a043fd74a3
Add uv scripts headers (#3767)
2025-07-25 07:48:40 -07:00
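A uv script header is PEP 723 inline metadata placed in comments at the top of the file; `uv run script.py` reads it and provisions the dependencies before executing. A minimal sketch (the dependency list and Python bound are illustrative, not the exact headers added in #3767):

```python
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "trl",
# ]
# ///
# With the PEP 723 block above, `uv run sft.py` resolves and installs the
# listed packages into an ephemeral environment -- no manual venv setup --
# and then runs the script body below.
print("training script body goes here")
```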
ed9b78a5f7
🗳️ Remove logging_steps parameter for simpler setup (#3612)
2025-06-18 13:52:21 +02:00
9df19e8a75
📜 Fix license and copyrights (#3264)
2025-04-08 15:22:58 -07:00
1d23ecc36f
©️ Update copyrights year (#2547)
* happy new year
* fix wandb import sort
2025-01-07 14:53:09 +01:00
460e780265
👯 Standardize model_args (#2442)
* `model_config` -> `model_args`
* sort
2024-12-10 12:51:20 +01:00
6a05feff02
🆔 Add dataset_config to ScriptArguments (#2440)
* datast_config_name
* Update trl/utils.py [ci skip]
* sort import
* typo [ci skip]
* Trigger CI
* Rename `dataset_config_name` to `dataset_config`
2024-12-10 11:09:26 +01:00
9410874787
©️ Copyrights update (#2454)
* First changes
* Other files
* Finally
* rm comment
* fix nashmd
* Fix example
* Fix example [ci skip]
2024-12-10 10:40:00 +01:00
e155cb8a66
⛓️💥 Don't use eval_dataset in scripts when no eval strategy (#2270)
2024-10-28 11:40:51 +01:00
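The change in #2270 amounts to a small guard: only forward an eval split when evaluation is actually enabled. A hedged sketch (the helper name and dict-based dataset are illustrative; `eval_strategy` mirrors the TrainingArguments field):

```python
def pick_eval_dataset(dataset_dict, eval_strategy, test_split="test"):
    """Return an eval split only when evaluation is enabled.

    Sketch of the #2270 pattern: with eval_strategy == "no", passing an
    eval_dataset to the trainer wastes preprocessing on a dataset that is
    never used, so pass None instead.
    """
    if eval_strategy == "no":
        return None
    return dataset_dict.get(test_split)

# Evaluation disabled -> no eval split is forwarded to the trainer.
assert pick_eval_dataset({"test": ["sample"]}, "no") is None
```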
7e394b03e8
🎭 Deprecate [SFT/DPO/Reward]ScriptArguments in favour of ScriptArguments (#2145)
* `DPOScriptArguments` to `ScriptArguments`
* use dataset_train_split
* Use scriptarguments
* dataset names in command lines
* use `ScriptArguments` everywhere
* ignore biais buffer to end
* remove in v0.13
* rm comment
* update test commands
* Update docs/source/rloo_trainer.md
* Update tests/test_rloo_trainer.py
* Added dataset_train_split argument to ppo.py and rloo.py
* update scripts with dataset_train_split
2024-10-14 11:14:58 +02:00
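The unified ScriptArguments that #2145 converges on (with the `dataset_config` rename from #2440 and the `dataset_train_split` bullets above) can be approximated as a dataclass. This is a sketch, not the actual TRL definition; defaults are illustrative:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ScriptArguments:
    # Shared script-argument class replacing the per-trainer
    # [SFT/DPO/Reward]ScriptArguments variants (field names per the
    # commits above; default values are made up for the example).
    dataset_name: str = "my-org/my-dataset"
    dataset_config: Optional[str] = None   # was dataset_config_name (#2440)
    dataset_train_split: str = "train"     # replaces hard-coded "train"
    dataset_test_split: str = "test"

args = ScriptArguments(dataset_config="default")
```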
47d08a9626
Rename trainer arg tokenizer to processing_class (#2162)
2024-10-07 09:39:32 +02:00
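#2162 renames the trainer keyword `tokenizer` to `processing_class`, since the argument accepts processors as well as tokenizers. A before/after sketch of the call site using a stand-in class (the real one is `transformers.Trainer` and the TRL trainers built on it):

```python
class StubTrainer:
    # Stand-in illustrating only the keyword rename, not trainer behavior.
    def __init__(self, processing_class=None, **kwargs):
        self.processing_class = processing_class

# Before #2162 the same object was passed as `tokenizer=...`;
# after, the keyword names what the argument actually is:
trainer = StubTrainer(processing_class="my-tokenizer-or-processor")
```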
d45c86e2a7
Conversational dataset support for CPOTrainer (#2144)
* extract prompt and apply chat template in cpo trainer
* default learning rate
* simplify example
* update doc
* test all formats
* extend extract prompt
* improve doc format
* link in dataset formats
* Update docs/source/cpo_trainer.mdx
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update docs/source/cpo_trainer.mdx
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
---------
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2024-10-04 18:01:02 +02:00
c00722ce0a
🃏 Model card for TRL (#2123)
* template and util
* test for online dpo
* template in package_data
* template in manifest
* standardize push_to_hub
* wandb badge and quick start
* bco
* xpo
* simplify `create_model_card`
* cpo
* kto
* dpo
* gkd
* orpo
* style
* nash-md
* alignprop
* bco citation
* citation template
* cpo citation
* ddpo
* fix alignprop
* dpo
* gkd citation
* kto
* online dpo citation
* orpo citation
* citation in utils
* optional citation
* reward
* optional trainer citation
* sft
* remove add_model_tags bco
* Remove unnecessary code for adding model tags
* Fix model tag issue and update URL format
* Remove unused code for adding model tags
* Add citation for XPOTrainer
* Remove unused code in SFTTrainer
* Add model card generation in RLOOTrainer
* Remove unused import and method call in reward_trainer.py
* Add model card generation
* Remove unused code and update error message in ORPOTrainer class
* Add import statements and create model card in IterativeSFTTrainer
* Add dataset name to push_to_hub() call
* Update trainer.push_to_hub() dataset names
* script args
* test
* better doc
* fix tag test
* fix test tag
* Add tags parameter to create_model_card method
* doc
* script args
* Update trl/templates/model_card.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* unittest's `assertIn` instead of `assert`
* Update trl/templates/model_card.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
---------
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2024-09-27 15:23:05 +02:00
9af4734178
♻️ Standardize script_args (#2130)
2024-09-26 15:23:42 +02:00
32d9d34eb1
Standardize pushing to Hub in examples (#2126)
2024-09-26 10:00:51 +02:00
10c2f63b2a
training_args for all TrainingArguments (#2082)
2024-09-19 15:03:47 +02:00
4c0c98d950
Standardize dataset naming (#2081)
* `ds`, `raw_dataset` etc -> `dataset`
* Update docs/source/detoxifying_a_lm.mdx
2024-09-19 08:59:28 +02:00
40f05226de
Standardizing datasets for testing (#2065)
* zen dataset
* Update dataset test bco
* some tests
* Simple chat template
* bco
* xpo
* kto
* gkd
* trainer_args
* sft
* online dpo
* orpo
* zen script
2024-09-14 22:34:15 +02:00
642c4b1855
Remove debug and sanity_check args (#2055)
2024-09-11 17:56:02 +02:00
f05f63c1ea
PartialState().local_main_process_first() when map in examples (#1926)
* `PartialState().local_main_process_first()` when map in examples
* allow load from cache
---------
Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co>
2024-08-14 12:01:03 +02:00
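The #1926 pattern lets the local main process run `dataset.map()` first and write the cache; the other ranks wait, then load the mapped dataset from cache instead of recomputing it ("allow load from cache" above). A simplified stand-in for accelerate's `PartialState().local_main_process_first()` (the explicit `is_main_process`/`barrier` arguments are illustrative; the real context manager gets rank information from accelerate):

```python
from contextlib import contextmanager

@contextmanager
def local_main_process_first(is_main_process, barrier):
    # Non-main ranks wait until the main process has finished mapping
    # (and written the dataset cache); the main process maps inside the
    # `with` block and releases everyone else on exit.
    if not is_main_process:
        barrier()
    yield
    if is_main_process:
        barrier()

events = []
with local_main_process_first(True, lambda: events.append("barrier")):
    events.append("map")  # main process maps first, then releases others
```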
54f806b6ff
Standardize dataset_num_proc usage (#1925)
* uniform dataset_num_proc
* num_proc in shuffle
* Update examples/datasets/anthropic_hh.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update examples/scripts/ppo.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update examples/scripts/ppo.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
---------
Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2024-08-13 15:10:39 +02:00
7ddef5c158
Make use of trust_remote_code consistent (#1806)
Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co>
2024-07-10 18:26:11 +02:00
4402b36dcf
clean examples (#1791)
Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co>
2024-07-04 14:29:25 +02:00
7075cec94d
Update HH dataset on helpful only subset (#1613)
* Update HH dataset on helpful only subset
* format
2024-05-02 12:12:12 -04:00
d1df79f83c
Add CPOTrainer (#1382)
* add CPOTrainer
* add docs
* fix formatting
* removed precompute_ref_log_probs arg
* remove precompute_ref_log_probs
* typos
* finish cpo trainer doc
* remove redundant lines
* typo
* formatting
* compute chosen nll loss also for enc-dec models
* fix gradient error of inplace operation for enc-dec models
* formatting
* use CPOConfig
* formatting
* use model_init_kwargs from CPOConfig
* comments in example
* fix doc string
* fix typo in docstring
* update year
* fixed typo
* use preference dataset
* fix learning rate
* move dataset_num_proc to configs
* Update cpo paper link from HF: cpo_trainer.mdx
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* update description for CPO: cpo_trainer.mdx
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* remove _prepare_deepspeed for cpo
Because CPO does not need init for reference model
* Add explanation to CPO loss
* format
* fix bug when lengths are given
* add CPOTrainer to README
* fix grammar
---------
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2024-03-22 21:32:45 +01:00
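As the #1382 bullets note ("remove _prepare_deepspeed for cpo / Because CPO does not need init for reference model"), the CPO objective combines a reference-free preference term with an NLL term on the chosen completion. A scalar sketch of that shape (function name, signature, and coefficients are illustrative, not TRL's implementation):

```python
import math

def cpo_loss(chosen_logps, rejected_logps, chosen_nll, beta=0.1, nll_weight=1.0):
    # Reference-free preference term: -log sigmoid(beta * (logp_w - logp_l)).
    # Unlike DPO, no frozen reference model enters the margin, which is why
    # the reference-model DeepSpeed init could be dropped.
    margin = beta * (chosen_logps - rejected_logps)
    preference = math.log1p(math.exp(-margin))  # == -log sigmoid(margin)
    # Plus the chosen-completion NLL (computed for enc-dec models too,
    # per the bullets above).
    return preference + nll_weight * chosen_nll
```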