9955ee7eaa
🐳 Docker update + Simplify Jobs doc (#3931)
...
Co-authored-by: sergiopaniego <sergiopaniegoblanco@gmail.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2025-09-13 18:35:55 -06:00
a647e5a78a
🗜 Hotfix: avoid passing `quantization_config=None` (#4019)
2025-09-09 14:50:15 -06:00
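A minimal sketch of the pattern this hotfix implies (illustrative; not the actual TRL source): only forward `quantization_config` when it is set, instead of passing an explicit `None` down to `from_pretrained`.

```python
from transformers import AutoModelForCausalLM

def load_model(model_id: str, quantization_config=None):
    kwargs = {}
    # Only include the key when a config is actually provided; an explicit
    # quantization_config=None can trip downstream handling.
    if quantization_config is not None:
        kwargs["quantization_config"] = quantization_config
    return AutoModelForCausalLM.from_pretrained(model_id, **kwargs)
```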
0c69fd2867
👷 Added Kernels on the Hub x TRL guide (#3969)
...
Co-authored-by: vb <vaibhavs10@gmail.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-09-04 15:37:02 +02:00
208e9f7df7
📏 `torch_dtype` to `dtype` everywhere (#4000)
...
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
2025-09-03 15:45:37 -06:00
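The rename tracks the `transformers` deprecation of `torch_dtype` in favor of `dtype`; a sketch, assuming a `transformers` version that accepts the newer spelling (model id illustrative):

```python
import torch
from transformers import AutoModelForCausalLM

# Before: AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
# After, the spelling this PR standardizes on:
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B", dtype=torch.bfloat16)
```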
0c91515b58
🧭 HF jobs x TRL guide (#3890)
...
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
2025-08-26 21:44:29 -07:00
a043fd74a3
Add uv scripts headers (#3767)
2025-07-25 07:48:40 -07:00
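These headers are PEP 723 inline script metadata, which `uv run` resolves into an ephemeral environment; a minimal example of the style (dependency list illustrative):

```python
# /// script
# dependencies = [
#     "trl",
#     "peft",
# ]
# ///
from trl import SFTTrainer  # resolved by `uv run <script>.py` without a manual install
```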
9df19e8a75
📜 Fix license and copyrights (#3264)
2025-04-08 15:22:58 -07:00
1d23ecc36f
©️ Update copyrights year (#2547)
...
* happy new year
* fix wandb import sort
2025-01-07 14:53:09 +01:00
ca850be0a2
🕹️ CLI refactor (#2380)
...
* Refactor main function in dpo.py
* Update setup.py and add cli.py
* Add examples to package data
* style
* Refactor setup.py file
* Add new file t.py
* Move dpo to package
* Update MANIFEST.in and setup.py, refactor trl/cli.py
* Add __init__.py to trl/scripts directory
* Add license header to __init__.py
* File moved instruction
* Add Apache License and update file path
* Move dpo.py to new location
* Refactor CLI and DPO script
* Refactor import structure in scripts package
* env
* rm config from chat arg
* rm old cli
* chat init
* test cli [skip ci]
* Add `dataset_config_name` to `ScriptArguments` (#2440)
* add missing arg
* Add test cases for 'trl sft' and 'trl dpo' commands
* Add sft.py script and update cli.py to include sft command
* Move sft script
* chat
* style [ci skip]
* kto
* rm example config
* first step on doc
* see #2442
* see #2443
* fix chat windows
* ©️ Copyrights update (#2454)
* First changes
* Other files
* Finally
* rm comment
* fix nashmd
* Fix example
* Fix example [ci skip]
* 💬 Fix chat for Windows (#2443)
* fix chat for windows
* add some tests back
* Revert "add some tests back"
This reverts commit 350aef52f53f8cf34fccd7ad0f78a3dd63867e06.
* 🆔 Add `dataset_config` to `ScriptArguments` (#2440)
* dataset_config_name
* Update trl/utils.py [ci skip]
* sort import
* typo [ci skip]
* Trigger CI
* Rename `dataset_config_name` to `dataset_config`
* 🏎 Fix deepspeed preparation of `ref_model` in `OnlineDPOTrainer` (#2417)
* Remove unused deepspeed code
* add model prep back
* add deepspeed even if it doesn't work
* rm old code
* Fix config name
* Remove `make dev` in favor of `pip install -e .[dev]`
* Update script paths and remove old symlink related things
* Fix chat script path [ci skip]
* style
2024-12-13 17:52:23 +01:00
460e780265
👯 Standardize `model_args` (#2442)
...
* `model_config` -> `model_args`
* sort
2024-12-10 12:51:20 +01:00
6a05feff02
🆔 Add `dataset_config` to `ScriptArguments` (#2440)
...
* dataset_config_name
* Update trl/utils.py [ci skip]
* sort import
* typo [ci skip]
* Trigger CI
* Rename `dataset_config_name` to `dataset_config`
2024-12-10 11:09:26 +01:00
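A rough sketch of the resulting field (shape inferred from the rename history in the commit body, not copied from TRL):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ScriptArguments:
    dataset_name: str = field(metadata={"help": "Dataset to load."})
    # Added by #2440, then renamed from `dataset_config_name` to `dataset_config`;
    # forwarded as the configuration name to `datasets.load_dataset`.
    dataset_config: Optional[str] = field(default=None)
```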
9410874787
©️ Copyrights update (#2454)
...
* First changes
* Other files
* Finally
* rm comment
* fix nashmd
* Fix example
* Fix example [ci skip]
2024-12-10 10:40:00 +01:00
16fa13ce72
👮 Deprecate `policy` in favor of `model` in `PPOTrainer` (#2386)
2024-11-26 08:13:10 +01:00
ee3cbe1946
💾 Deprecate `config` in favor of `args` in `PPOTrainer` (#2384)
2024-11-25 14:48:08 +01:00
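Taken together, these two PRs align `PPOTrainer`'s keywords with the other trainers (`config` -> `args`, `policy` -> `model`); a hedged sketch of the migration shim, not the real class:

```python
import warnings

class PPOTrainerSketch:
    """Illustrates the keyword deprecations; the real PPOTrainer takes many more arguments."""

    def __init__(self, args=None, model=None, *, config=None, policy=None):
        if config is not None:  # old keyword, kept temporarily for compatibility
            warnings.warn("`config` is deprecated, use `args` instead", FutureWarning)
            args = config
        if policy is not None:  # old keyword, kept temporarily for compatibility
            warnings.warn("`policy` is deprecated, use `model` instead", FutureWarning)
            model = policy
        self.args, self.model = args, model
```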
1293f37c5f
📉 Add PEFT support for `PPOTrainer` (#2344)
...
* Add PEFT/LoRA support for `PPOTrainer`
* Fix: style
* Fix: typo
* Add ppo.py PEFT example
* Fixed the optional dependencies error
* skip peft test if peft is unavailable
* Update trl/trainer/ppo_trainer.py
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
---------
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2024-11-18 11:54:09 +01:00
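Usage enabled by this PR looks roughly like the following (a sketch; `peft_config` is the parameter the PR adds, remaining trainer arguments elided):

```python
from peft import LoraConfig

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
# trainer = PPOTrainer(..., peft_config=peft_config)  # other args as in the PPO example script
```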
e155cb8a66
⛓️💥 Don't use `eval_dataset` in scripts when no eval strategy (#2270)
2024-10-28 11:40:51 +01:00
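The guard the scripts adopted, roughly (attribute name per `transformers.TrainingArguments`; split name assumed):

```python
def maybe_eval_dataset(dataset, training_args):
    # Only hand the trainer an eval split when evaluation is enabled; otherwise
    # return None so the unused split is never tokenized or collated.
    return dataset["test"] if training_args.eval_strategy != "no" else None
```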
7e394b03e8
🎭 Deprecate `[SFT/DPO/Reward]ScriptArguments` in favour of `ScriptArguments` (#2145)
...
* `DPOScriptArguments` to `ScriptArguments`
* use dataset_train_split
* Use scriptarguments
* dataset names in command lines
* use `ScriptArguments` everywhere
* ignore bias buffer to end
* remove in v0.13
* rm comment
* update test commands
* Update docs/source/rloo_trainer.md
* Update tests/test_rloo_trainer.py
* Added dataset_train_split argument to ppo.py and rloo.py
* update scripts with dataset_train_split
2024-10-14 11:14:58 +02:00
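After this PR the example scripts share a single argument class; a sketch of the resulting pattern (parser pairing as in the TRL examples, details assumed):

```python
from transformers import HfArgumentParser
from trl import ScriptArguments  # the shared class the scripts converge on

# Before: each script defined its own DPOScriptArguments / SFTScriptArguments / ...
parser = HfArgumentParser(ScriptArguments)
(script_args,) = parser.parse_args_into_dataclasses()
print(script_args.dataset_train_split)  # split field referenced in the commit body
```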
70036bf87f
🕊️ Migration `PPOv2` -> `PPO` (#2174)
...
* delete old ppo
* rename ppov2 files
* PPOv2 -> PPO
* rm old doc
* rename ppo doc file
* rm old test
* rename test
* re-add v2 with deprecation
* style
* start update customization
* Lion
* Finish update customization
* remove ppo_multi_adapter
* remove ppo example
* update some doc
* rm test no peft
* rm hello world
* processing class
* Update docs/source/detoxifying_a_lm.mdx
Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com>
* Update trl/trainer/ppov2_config.py
Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com>
* Update docs/source/customization.mdx
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update docs/source/detoxifying_a_lm.mdx
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* ppo to example overview
* drop lion
* remove "Use 8-bit optimizer"
* Update docs/source/customization.mdx
* Update docs/source/customization.mdx
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* it applies to all trainers
---------
Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2024-10-11 17:28:39 +02:00
47d08a9626
Rename trainer arg `tokenizer` to `processing_class` (#2162)
2024-10-07 09:39:32 +02:00
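The new name generalizes the argument beyond tokenizers (e.g. to multimodal processors); call sites change roughly like this (sketch; trainer arguments elided):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")  # model id illustrative
# Before: SFTTrainer(model=model, tokenizer=tokenizer, ...)
# After:  SFTTrainer(model=model, processing_class=tokenizer, ...)
```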
c00722ce0a
🃏 Model card for TRL (#2123)
...
* template and util
* test for online dpo
* template in package_data
* template in manifest
* standardize push_to_hub
* wandb badge and quick start
* bco
* xpo
* simplify `create_model_card`
* cpo
* kto
* dpo
* gkd
* orpo
* style
* nash-md
* alignprop
* bco citation
* citation template
* cpo citation
* ddpo
* fix alignprop
* dpo
* gkd citation
* kto
* online dpo citation
* orpo citation
* citation in utils
* optional citation
* reward
* optional trainer citation
* sft
* remove add_model_tags bco
* Remove unnecessary code for adding model tags
* Fix model tag issue and update URL format
* Remove unused code for adding model tags
* Add citation for XPOTrainer
* Remove unused code in SFTTrainer
* Add model card generation in RLOOTrainer
* Remove unused import and method call in reward_trainer.py
* Add model card generation
* Remove unused code and update error message in ORPOTrainer class
* Add import statements and create model card in IterativeSFTTrainer
* Add dataset name to push_to_hub() call
* Update trainer.push_to_hub() dataset names
* script args
* test
* better doc
* fix tag test
* fix test tag
* Add tags parameter to create_model_card method
* doc
* script args
* Update trl/templates/model_card.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* unittest's `assertIn` instead of `assert`
* Update trl/templates/model_card.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
---------
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2024-09-27 15:23:05 +02:00
32d9d34eb1
Standardize pushing to Hub in examples (#2126)
2024-09-26 10:00:51 +02:00
10c2f63b2a
`training_args` for all `TrainingArguments` (#2082)
2024-09-19 15:03:47 +02:00
4c0c98d950
Standardize dataset naming (#2081)
...
* `ds`, `raw_dataset` etc -> `dataset`
* Update docs/source/detoxifying_a_lm.mdx
2024-09-19 08:59:28 +02:00
40f05226de
Standardizing datasets for testing (#2065)
...
* zen dataset
* Update dataset test bco
* some tests
* Simple chat template
* bco
* xpo
* kto
* gkd
* trainer_args
* sft
* online dpo
* orpo
* zen script
2024-09-14 22:34:15 +02:00
4c92ba5769
©️ Copyrights (#2063)
...
* copyrights
* fail if missing
2024-09-13 14:18:47 +02:00
9a6061fc2f
Clean up DPO example (#2043)
...
* Clean up DPO example
* Fix bs
* Remove reentrant
* Fix tests
* Nuke sanity checks
* Switch dataset
* Remove sanity check from XPO
2024-09-11 17:45:00 +02:00
a20e822737
Deprecate `PPOTrainer` (#2016)
...
* Promote `PPOv2Trainer` and `PPOv2Config` to top-level import
* Deprecate `PPOTrainer` and `PPOConfig`
* FutureWarning
* Update trl/trainer/ppo_config.py
2024-09-10 19:04:29 +02:00
2ee0b62cdb
Change `non_eos_penalty` to `missing_eos_penalty` to be consistent across `OnPolicy` trainers (#2033)
...
* Subtract a penalty from OnPolicy Trainers if output does not contain an EOS token
* Caught a few other problems
* Updated the documentation for RLOO trainer and PPOv2Trainer
* Corrected the default type and value for missing_eos_penalty
* Made RLOO Trainer consistent with Online DPO and PPOv2
* Removed --non_eos_penalty from all documentation
* Made missing_eos_penalty examples positive (because we subtract).
* Caught two more incorrect examples
* Removed unnecessary whitespace to make ruff happy
* Update trl/trainer/utils.py
---------
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2024-09-10 14:40:23 +02:00
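Per the bullets above, the option is a positive value subtracted from the score of any completion that never emits EOS; a minimal sketch of that logic (tensor names assumed):

```python
import torch

def apply_missing_eos_penalty(scores, completion_ids, eos_token_id, missing_eos_penalty=1.0):
    # True where the completion contains at least one EOS token.
    contains_eos = (completion_ids == eos_token_id).any(dim=-1)
    scores = scores.clone()
    # Subtract the (positive) penalty from completions missing EOS.
    scores[~contains_eos] -= missing_eos_penalty
    return scores
```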
f05f63c1ea
`PartialState().local_main_process_first()` when `map` in examples (#1926)
...
* `PartialState().local_main_process_first()` when map in examples
* allow load from cache
---------
Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co>
2024-08-14 12:01:03 +02:00
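The pattern this standardizes: on each node, the local main process runs `map` first (writing the cache) and the remaining ranks then load the cached result instead of re-tokenizing; a sketch, with an assumed example dataset:

```python
from accelerate import PartialState
from datasets import load_dataset

dataset = load_dataset("trl-lib/tldr", split="train")  # dataset choice illustrative

with PartialState().local_main_process_first():
    # Other ranks wait here, then reuse the cache instead of re-mapping.
    dataset = dataset.map(lambda ex: {"n_chars": len(ex["prompt"])}, num_proc=4)
```

The `num_proc` argument shown here is the knob the following `dataset_num_proc` PR makes uniform across examples.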
54f806b6ff
Standardize `dataset_num_proc` usage (#1925)
...
* uniform dataset_num_proc
* num_proc in shuffle
* Update examples/datasets/anthropic_hh.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update examples/scripts/ppo.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update examples/scripts/ppo.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
---------
Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2024-08-13 15:10:39 +02:00
7ddef5c158
Make use of `trust_remote_code` consistent (#1806)
...
Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co>
2024-07-10 18:26:11 +02:00
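Consistency here means one flag threaded through every Hub-loading call rather than scattered hard-coded values; a sketch:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def load(model_id: str, trust_remote_code: bool = False):
    # A single script-level flag is forwarded to every from_pretrained call.
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=trust_remote_code)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=trust_remote_code)
    return model, tokenizer
```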
34d273f227
Support `num_train_epochs` (#1743)
...
* add a test case for num_train_epochs
* fix ci
* quick change
* disable push to hub
* debug windows ci
* try another fix
* skip subprocess tests on windows
2024-06-20 13:16:43 -04:00
e7cb597230
Fix PPOv2 test case (#1661)
...
* Fix PPOv2 / RLOO refactor's stuff
* update terminology to use stop token
2024-05-23 11:37:16 -04:00
13454d2f4b
PPO / Reinforce Trainers (#1540)
...
* Add ppov2 trainer
* make eos trick optional, remove unused args
* quick fix
* precommit
* update debugging script
* fix out of bound `drop_last=True`; use built-in scheduler
* Add PPO examples
* push changes
* quick change
* quick change
* various bug fixes
* remove unnecessary grad accumulation setting
* push new changes
* fix DS3 model saving
* update ppo.py
* refactor
* quick change
* refactor
* update ppo trainer
* refactor
* quick test
* add ds2/ds3 7 processes config
* add vllm trainer
* quick change
* experiment with reward normalization
* push changes
* quick push
* push changes
* push various changes
* refactor to use ModelConfig
* quick change
* refactor
* refactor
* Simplify DS logic
* quick update
* remove unnecessary files
* precommit
* deepspeed fix; handle edge case when eos_token_id = 0
* add PPO tldr example
* add TL;DR example
* fix undefined var
* utilize all samples in rloo
* quick setting
* remove the unnecessary `value_model`
* use exact_div
* allow saving the deepspeed model
* refactor
* remove dead code
* Use some shared utilities
* add some end-to-end test cases
* add PPOv2 docs and RLOO docs / tests
* update docs
* quick push
* fix ci
* fix type annotation for ci
* quick update
* update trainer docs
2024-05-22 08:31:10 -04:00