Commit Graph

46 Commits

SHA1 Message Date
9df19e8a75 📜 Fix license and copyrights (#3264) 2025-04-08 15:22:58 -07:00
b6bcafb8bb 🏃 Fix and make CI faster (#3160) 2025-04-08 06:12:08 -07:00
1d23ecc36f ©️ Update copyrights year (#2547)
* happy new year

* fix wandb import sort
2025-01-07 14:53:09 +01:00
efc687db62 🛠️ Update tests and fix PPO (#2463)
* [bugfix] critic not updated

* Update ppo_trainer.py

* Update ppo_trainer.py

* add failing test

* test both policy and critic

* formatting

* fix tests

* formatting

* Update tests/test_ppo_trainer.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* fix test

---------

Co-authored-by: NINGBENZHE <53843873+NINGBENZHE@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co>
2024-12-12 12:53:32 +01:00
9410874787 ©️ Copyrights update (#2454)
* First changes

* Other files

* Finally

* rm comment

* fix nashmd

* Fix example

* Fix example [ci skip]
2024-12-10 10:40:00 +01:00
453db5cd79 🤏 New models for tests (#2287)
* first commit

* uncomment

* other tests adaptations

* Remove unused variable in test_setup_chat_format

* Remove unused import statement

* style

* Add Bart model

* Update BCOTrainerTester class in test_bco_trainer.py

* Update model IDs and tokenizers in test files

* Add new models and processors

* Update model IDs in test files

* Fix formatting issue in test_dataset_formatting.py

* Refactor dataset formatting in test_dataset_formatting.py

* Fix dataset sequence length in SFTTrainerTester

* Remove tokenizer

* Remove print statement

* Add reward_model_path and sft_model_path to PPO trainer

* Fix tokenizer padding issue

* Add chat template for testing purposes in PaliGemma model

* Update PaliGemma model and chat template

* Increase learning rate to speed up test

* Update model names in run_dpo.sh and run_sft.sh scripts

* Update model and dataset names

* Fix formatting issue in test_dataset_formatting.py

* Fix formatting issue in test_dataset_formatting.py

* Remove unused chat template

* Update model generation script

* additional models

* Update model references in test files

* Remove unused imports in test_online_dpo_trainer.py

* Add is_llm_blender_available import and update reward_tokenizer

* Refactor test_online_dpo_trainer.py: Move skipped test case decorator

* remove models without chat templates

* Update model names in scripts and tests

* Update model_id in test_modeling_value_head.py

* Update model versions in test files

* Fix formatting issue in test_dataset_formatting.py

* Update embedding model ID in BCOTrainerTester

* Update test_online_dpo_trainer.py with reward model changes

* Update expected formatted text in test_dataset_formatting.py

* Add reward_tokenizer to TestOnlineDPOTrainer

* fix tests

* Add SIMPLE_CHAT_TEMPLATE to T5 tokenizer

* Fix dummy_text format in test_rloo_trainer.py

* Skip outdated test for chatML data collator

* Add new vision language models

* Commented out unused model IDs in test_vdpo_trainer

* Update model and vision configurations in generate_tiny_models.py and test_dpo_trainer.py

* Update model and tokenizer references

* Don't push if it already exists

* Add comment explaining test skip

* Fix model_exists function call and add new models

* Update LlavaForConditionalGeneration model and processor

* `qgallouedec` -> `trl-internal-testing`
2024-11-25 16:31:56 +01:00
1293f37c5f 📉 Add PEFT support for PPOTrainer (#2344)
* Add peft/lora support for `PPOTrainer`

* Fix: style

* Fix: typo

* Add ppo.py PEFT example

* Fixed the optional dependencies error

* skip peft test if peft is unavailable

* Update trl/trainer/ppo_trainer.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2024-11-18 11:54:09 +01:00
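
A hedged sketch of the PEFT-backed PPO setup this PR enables, assuming the legacy `PPOTrainer` API of this era (`AutoModelForCausalLMWithValueHead`, `PPOConfig`); the model ID and hyperparameters are placeholders, not values from the PR.

```python
# Illustrative only: LoRA adapters wrapped by the value-head model, then handed to PPOTrainer.
from peft import LoraConfig
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

model_id = "gpt2"  # placeholder
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

# The value-head wrapper applies the LoRA adapters internally when peft_config is given.
model = AutoModelForCausalLMWithValueHead.from_pretrained(model_id, peft_config=lora_config)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

config = PPOConfig(batch_size=8, mini_batch_size=2)
# With PEFT, no separate ref_model is needed: disabling the adapters recovers the frozen base model.
ppo_trainer = PPOTrainer(config, model, ref_model=None, tokenizer=tokenizer)
```
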
7e394b03e8 🎭 Deprecate [SFT/DPO/Reward]ScriptArguments in favour of ScriptArguments (#2145)
* `DPOScriptArguments` to `ScriptArguments`

* use dataset_train_split

* Use scriptarguments

* dataset names in command lines

* use `ScriptArguments` everywhere

* ignore bias buffer to end

* remove in v0.13

* rm comment

* update test commands

* Update docs/source/rloo_trainer.md

* Update tests/test_rloo_trainer.py

* Added dataset_train_split argument to ppo.py and rloo.py

* update scripts with dataset_train_split
2024-10-14 11:14:58 +02:00
70036bf87f 🕊️ Migration PPOv2 -> PPO (#2174)
* delete old ppo

* rename ppov2 files

* PPOv2 -> PPO

* rm old doc

* rename ppo doc file

* rm old test

* rename test

* re-add v2 with deprecation

* style

* start update customization

* Lion

* Finish update customization

* remove ppo_multi_adaptater

* remove ppo example

* update some doc

* rm test no peft

* rm hello world

* processing class

* Update docs/source/detoxifying_a_lm.mdx

Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com>

* Update trl/trainer/ppov2_config.py

Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com>

* Update docs/source/customization.mdx

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update docs/source/detoxifying_a_lm.mdx

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* ppo to example overview

* drop lion

* remove "Use 8-bit optimizer"

* Update docs/source/customization.mdx

* Update docs/source/customization.mdx

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* it applies to all trainers

---------

Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2024-10-11 17:28:39 +02:00
07f0e687cb Use transformers utilities when possible (#2064)
* use transformers' availability functions

* require from transformers

* rm file

* fix no peft

* fix import

* don't alter `_peft_available`

* fix require_diffusers

* style

* transformers>=4.40 and add back `is_liger_kernel_available`
2024-09-16 15:56:49 +02:00
d47220f299 make cuda-only tests device-agnostic (#2044)
* update code

* update

* fix style

---------

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2024-09-13 14:23:12 +02:00
cbcaa46cd3 Various args and test fix (#1909)
* report to none

* simplify AlignPropTrainerTester

* rm unused marker

* Don't share setup in dpo trainer

* style

* don't share setup in test rich

* fix setup and classmethod

* fix args for sft

* test_trainer_args

* various arg fix

* report to none and vsdt simplification

* drop generate_during_eval

* fix run_name

* style

* drop setUpClass

* style

* new ref values for ppo trainer tester

* update ref val

---------

Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co>
2024-08-09 10:07:58 +02:00
c9d56366ed rm token (#1852) 2024-07-18 18:28:49 +02:00
a2adfb836a ref_model -> model_ref (#1835)
Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co>
2024-07-15 18:50:29 +02:00
3bbe7e0407 Fixed ref model not used in PPO generation (#1534) 2024-04-17 07:22:56 -07:00
66078c7c01 CI: Fix CI on main (#1422)
* fix CI on main

* final fix
2024-03-13 13:54:22 +01:00
9bc478ecbb pre-commit: replace linters + formatters with Ruff; fix some issues (#1300)
* pre-commit: replace linters + formatters with Ruff

* Don't use bare except

* Clean up `noqa`s

* Enable Ruff UP; apply auto-fixes

* Enable Ruff B; apply fixes

* Enable Ruff T with exceptions

* Enable Ruff C (complexity); autofix

* Upgrade Ruff to 0.2.0
2024-02-15 04:37:41 +01:00
ae8431bd50 Codemod Unittest assertions to bare asserts (#1301)
* Remove stray commas from test data

* Codemod Unittest assertions to bare asserts

* Make `assertAlmostEqual` tests more idiomatic

* DRY some test strings
2024-02-01 23:49:03 +01:00
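
An illustrative before/after of the kind of codemod this commit describes; the test itself is hypothetical, not taken from the repository.

```python
import unittest

import pytest


# Before: unittest-style assertions
class RewardTestCase(unittest.TestCase):
    def test_reward_sum(self):
        rewards = [0.1, 0.2, 0.3]
        self.assertEqual(len(rewards), 3)
        self.assertAlmostEqual(sum(rewards), 0.6, places=5)


# After: bare asserts; pytest reports the failing expression directly,
# and pytest.approx replaces assertAlmostEqual more idiomatically.
class TestReward:
    def test_reward_sum(self):
        rewards = [0.1, 0.2, 0.3]
        assert len(rewards) == 3
        assert sum(rewards) == pytest.approx(0.6)
```
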
a236c5750f Fix reported KL in PPO trainer (#1180)
* Fix reported KL in PPO trainer

previously this was always reporting the estimated KL, even when using `kl_penalty = 'full'` (or `abs`, etc).
Now we return the actual KL calculated in `compute_rewards()`, and report that.

* fix test
2024-01-09 06:48:25 +01:00
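
A rough sketch of the behaviour described above, assuming per-token log-probs from the policy and reference model; `kl_penalty_fn` stands in for whichever penalty variant is configured (`'kl'`, `'abs'`, `'full'`, ...), and the code is illustrative rather than the trainer's implementation.

```python
import torch


def report_kl(logprobs, ref_logprobs, kl_penalty_fn):
    """Report the same KL that is used for the reward penalty, not a fixed estimator.

    Before the fix, the logged KL was always the estimator (logprobs - ref_logprobs),
    even when the penalty itself used `kl_penalty='full'` or `'abs'`. Reporting the
    penalty actually computed in compute_rewards() keeps the two consistent.
    """
    kl = kl_penalty_fn(logprobs, ref_logprobs)  # per-token KL, shape (batch, seq)
    return kl.sum(dim=1).mean()                 # mean sequence-level KL across the batch


# Example with the simple per-token estimator
logprobs = torch.randn(4, 16)
ref_logprobs = torch.randn(4, 16)
print(report_kl(logprobs, ref_logprobs, lambda lp, rlp: lp - rlp))
```
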
94fa4b022b Make CI happy (#1080)
* Update test_ppo_trainer.py

* Update test_ppo_trainer.py

* Update test_ppo_trainer.py
2023-12-11 16:52:17 +01:00
9d09b3e107 TextEnvironments (#424)
* WIP skeleton

* minimal working poc

* cleanup

* rename variables

* quick typo fix

* add v1 masking (#429)

* add v1 masking

* working v1

* adapt from suggestion

* avoid warning "Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation."

* fix masking

- mask the responses from API call only

* quality

* address comments

* Update trl/environment/base.py

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* adapt a bit

* wip on tokenization/masking in textenv

* small fixes

* update viz

* add example

* print debug text and pass masks

* style

* format and move tensor to device

* update example

* update example

* This seems to work

* fix masking

* fix rich output to console

---------

Co-authored-by: Costa Huang <costa.huang@outlook.com>
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Co-authored-by: leandro <leandro.vonwerra@spoud.io>

* Add masking (#461)

* add v1 masking

* working v1

* adapt from suggestion

* avoid warning "Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation."

* fix masking

- mask the responses from API call only

* quality

* address comments

* Update trl/environment/base.py

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* adapt a bit

* wip on tokenization/masking in textenv

* small fixes

* update viz

* add example

* print debug text and pass masks

* style

* format and move tensor to device

* update example

* update example

* This seems to work

* fix masking

* fix rich output to console

* fix batched generation

* improve stopping criteria

* improve error handling in tool call

---------

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Costa Huang <costa.huang@outlook.com>

* fix unknown tool

* fix rewards and increase bs

* remove unused script

* ugly WIP fix

* do not return modified obj for in-place operations

* do not return modified obj for in-place operations

* clean up stopping criterium

* push updates

* push update

* format, add docs

* rename file

* add kwargs to reward fn

* simplify example

* simplify example

* bug fix

* add a trivia example

* pre-commit

* max tool response length

* fix regex for multi-line

* refactor tool exceptions

* fix exceptions in tool

* add docs

* fix style

* make rich optional

* add docstrings

* add tests

* add TextEnv tests (WIP)

* update triviaqa code

* update docs

* refactor text env

* update tests (WIP)

* add end2end test

* update docs

* upload tool demo

* refactor

* customizable system prompt

* add text env docs

* update index and toc

* fix `TextHistory` show methods

* add max length

* fix style

* fix typo

* refactor to kwargs in init and tasks to queries

* kwargs for reward docs

* Update examples/triviaqa.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update examples/tool_demo.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update docs/source/learning_tools.mdx

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update docs/source/learning_tools.mdx

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update docs/source/learning_tools.mdx

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update docs/source/text_environments.md

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update examples/triviaqa.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update examples/triviaqa.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* move to tool folder

* remove assets

* remove tool demo

* move rich import test to import utils

* add copyright

* fixes for masks in ppo trainer

* add text env api docs

* make precommit + add ppo test with mask

* move examples and add python

* fix style

* update triviaqa example

* add more docs

* update docs

* Update docs/source/learning_tools.mdx

* Apply suggestions from code review

* precommit

---------

Co-authored-by: Costa Huang <costa.huang@outlook.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: leandro von werra <leandro@hf.co>
2023-08-30 11:44:06 +02:00
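
A hedged usage sketch of the `TextEnvironment` API introduced here, based on the shape shown in the TRL text-environment docs; the tool, prompt, and reward function are placeholders, and the exact return tuple of `env.run` should be checked against the installed version.

```python
from transformers import AutoTokenizer, load_tool
from trl import AutoModelForCausalLMWithValueHead, TextEnvironment

model_id = "gpt2"  # placeholder
model = AutoModelForCausalLMWithValueHead.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token


def exact_match_reward(responses, answers):
    # Toy reward: 1.0 if the gold answer appears in the generated text, else 0.0.
    return [float(ans in resp) for resp, ans in zip(responses, answers)]


env = TextEnvironment(
    model=model,
    tokenizer=tokenizer,
    tools={"SimpleCalculatorTool": load_tool("ybelkada/simple-calculator")},  # placeholder tool
    reward_fn=exact_match_reward,
    prompt="Answer the question using the calculator tool.\n",
    max_turns=2,
)

# run() rolls out the model, calls tools on request segments, and masks tool outputs
# so only model-generated tokens are optimized by PPO.
queries, responses, masks, rewards, histories = env.run(
    ["What is 13 + 29?"], answers=["42"]
)
```
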
6dd50b45d8 Add checks on backward batch size (#651)
* Add checks on backward batch size

* add test case

* update test case

* Update citation
2023-08-17 10:35:44 +02:00
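
A sketch of the kind of consistency check this commit adds, using the relationship backward_batch_size = mini_batch_size × gradient_accumulation_steps; parameter names follow the PPO config of that period, but treat the code as illustrative.

```python
def check_backward_batch_size(batch_size, mini_batch_size, gradient_accumulation_steps):
    # One optimizer step consumes mini_batch_size * gradient_accumulation_steps samples,
    # so that product must fit inside the rollout batch collected per PPO step.
    backward_batch_size = mini_batch_size * gradient_accumulation_steps
    if backward_batch_size > batch_size:
        raise ValueError(
            f"backward_batch_size ({backward_batch_size}) must not exceed batch_size ({batch_size}); "
            "reduce mini_batch_size or gradient_accumulation_steps."
        )
    return backward_batch_size


check_backward_batch_size(batch_size=128, mini_batch_size=16, gradient_accumulation_steps=4)  # ok: 64 <= 128
```
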
a00ab445ba refactor grad accum (#546)
* refactor grad accum

* quick fix

* use correct place to step optim

* push changes

* cleanup and fix division by zero in `masked_var`

* revert back changes

* use unbiased var

* deal with division by zero

* add test case

* calculate advantage only once

* format

* add warning

* add more warnings

* quick fix

* remove unhelpful warning

* fix test cases

* fix test cases

* bump version given the breaking change

* black

* refactor

* update test cases

* error out

* push changes

* remove exact div

* add comments
2023-08-01 09:00:41 -04:00
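
A rough reconstruction of the masked statistics this refactor touches (unbiased variance plus the division-by-zero guard); assume `values` and `mask` are same-shaped tensors with 1s on valid tokens, and that this is a sketch rather than the library's code.

```python
import torch


def masked_mean(values, mask):
    return (values * mask).sum() / mask.sum()


def masked_var(values, mask, unbiased=True):
    mean = masked_mean(values, mask)
    variance = masked_mean((values - mean) ** 2, mask)
    if unbiased:
        n = mask.sum()
        if n <= 1:
            # With 0 or 1 valid tokens the Bessel correction n / (n - 1) would divide by zero.
            raise ValueError("Need at least two unmasked values for an unbiased variance estimate.")
        variance = variance * n / (n - 1)
    return variance


values = torch.tensor([1.0, 2.0, 3.0, 100.0])
mask = torch.tensor([1.0, 1.0, 1.0, 0.0])  # last value ignored
print(masked_var(values, mask))  # tensor(1.)
```
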
31658b4263 Computes the KL penalty using the entire distribution (#541)
* adds full log probs

* Adds tests, comments

* precommit

* bug all -> full

* adds option description to sentiment analysis script, fixes a few bugs
2023-07-27 12:08:24 +02:00
2b531b9223 Adds some options to stabilize the KL penalty (#486)
* adds options for the kl penalty

* style

* adds kl penalty to trl sentiment example args

* ppo_config -> config

* fix tests (equal -> allclose)

* style

* add a random seed option

* updates kl penalty description

---------

Co-authored-by: Costa Huang <costa.huang@outlook.com>
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
2023-07-05 11:23:10 +02:00
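
A hedged sketch of the per-token KL penalty variants discussed by this commit and the full-distribution one above: `"kl"` and `"abs"` operate on the sampled token's log-probs, while `"full"` uses the entire vocabulary distribution. The option names mirror the strings in the commits, but the function is illustrative.

```python
import torch
import torch.nn.functional as F


def kl_penalty(logprob, ref_logprob, kind="kl"):
    """Per-token KL penalty between policy and reference model.

    For "kl" and "abs": log-probs of the sampled tokens, shape (batch, seq).
    For "full": full log-distributions over the vocabulary, shape (batch, seq, vocab).
    """
    if kind == "kl":    # plain estimator; individual terms can be negative
        return logprob - ref_logprob
    if kind == "abs":   # absolute value stabilizes the penalty against negative estimates
        return (logprob - ref_logprob).abs()
    if kind == "full":  # exact KL over the whole distribution, always non-negative
        return F.kl_div(ref_logprob, logprob, log_target=True, reduction="none").sum(-1)
    raise ValueError(f"Unknown kl_penalty option: {kind}")


lp = torch.log_softmax(torch.randn(2, 5, 10), dim=-1)
ref_lp = torch.log_softmax(torch.randn(2, 5, 10), dim=-1)
print(kl_penalty(lp, ref_lp, kind="full").shape)  # torch.Size([2, 5])
```
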
9679d87012 Multi adapter RL (MARL) - a single model for RM & Value Head (#373)
* fix doc

* adapt from suggestions

* working v1 multiple adapters

* style

* style && quality

* oops

* docs

* add tests and docs

* add RM script

* Apply suggestions from code review

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update docs/source/0_abstraction_rl.mdx

* Apply suggestions from code review

* Update docs/source/0_abstraction_rl.mdx

* add 4bit

* replace with `reward_adapter`

* explain break

* simple comment

* fix llama tokenizer

* fixes

* fixes

* rename

* quality

* rm unneeded file

* add disclaimer

---------

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2023-06-22 11:19:45 +02:00
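
A hedged sketch of the single-model, multi-adapter setup this PR describes: one base model carries the trainable policy LoRA adapter, the value head, and a reward-model adapter used only for scoring. The `reward_adapter` argument and `compute_reward_score` call follow the TRL docs of that period; all IDs are placeholders.

```python
from peft import LoraConfig
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead

base_model_id = "huggyllama/llama-7b"             # placeholder policy base
rm_adapter_id = "trl-lib/llama-7b-hh-rm-adapter"  # placeholder reward-model adapter

lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

# One model: frozen base + trainable policy LoRA + value head + reward adapter for scoring.
model = AutoModelForCausalLMWithValueHead.from_pretrained(
    base_model_id,
    peft_config=lora_config,
    reward_adapter=rm_adapter_id,
    load_in_4bit=True,  # optional, matching the "add 4bit" bullet above
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# Scoring a completion with the reward adapter instead of a separate reward model:
inputs = tokenizer("The movie was great!", return_tensors="pt")
reward = model.compute_reward_score(**inputs)
```
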
61af5f26b6 Fix correct gradient accumulation (#407)
* add correct grad acc

* add some tests but they fail

* test should pass

* style

* fix
2023-06-14 08:43:35 -04:00
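
A generic sketch of the gradient-accumulation pattern this fix is about: gradients are accumulated over several mini-batches and the loss is scaled so the update matches a single large batch (illustrative, not the trainer's actual code).

```python
import torch

model = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
accumulation_steps = 4

data = [(torch.randn(2, 8), torch.randn(2, 1)) for _ in range(8)]

optimizer.zero_grad()
for step, (x, y) in enumerate(data, start=1):
    loss = torch.nn.functional.mse_loss(model(x), y)
    # Scale so the accumulated gradient equals the gradient of the full effective batch.
    (loss / accumulation_steps).backward()
    if step % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```
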
1f29725381 fix broken tests (#318) 2023-04-25 13:57:40 +02:00
131e5cdd10 add functionality to push best models to the hub during training (#275)
* add functionality to push best models to the hub during training

* fix indentation

* Update tests/test_ppo_trainer.py

Co-authored-by: Nathan Lambert <nathan@huggingface.co>

* Update trl/trainer/ppo_trainer.py

Co-authored-by: Nathan Lambert <nathan@huggingface.co>

* Update trl/trainer/ppo_trainer.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* fix style

---------

Co-authored-by: Nathan Lambert <nathan@huggingface.co>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-04-10 11:32:53 -07:00
9c3e9e43d0 Batched generation (#228)
* add `_generate_batch`

* fix style

* omit tensor conversion

* no multiple pad by default

* add test

* stylez

* update docstring

* encoder/decoder check

* input shape safety

* moar style

---------

Co-authored-by: leandro von werra <leandro@hf.co>
2023-03-21 16:48:34 +01:00
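
A generic sketch in the spirit of the batched-generation helper added here: queries are chunked, left-padded to a common length, generated in one call, and the prompt/padding is stripped afterwards. This is illustrative code, not the trainer's implementation.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"  # decoder-only models need left padding for batched generation
model = AutoModelForCausalLM.from_pretrained("gpt2")


def generate_batched(queries, batch_size=4, **gen_kwargs):
    responses = []
    for i in range(0, len(queries), batch_size):
        chunk = queries[i : i + batch_size]
        enc = tokenizer(chunk, return_tensors="pt", padding=True)
        with torch.no_grad():
            out = model.generate(**enc, pad_token_id=tokenizer.pad_token_id, **gen_kwargs)
        # Drop the (padded) prompt part; keep only newly generated tokens.
        new_tokens = out[:, enc["input_ids"].shape[1]:]
        responses.extend(tokenizer.batch_decode(new_tokens, skip_special_tokens=True))
    return responses


print(generate_batched(["Hello, my name is", "The weather today"], max_new_tokens=10))
```
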
6b88bba62b [test] attempt to fix CI test for PT 2.0 (#225)
* attempt to fix CI test

* attempt to fix CI to PT 2.0

* fix 3.7 issue

* fix

* make quality

* try

* Update tests/test_ppo_trainer.py
2023-03-17 10:42:38 +01:00
7940683014 [core] fix DP issue (#222)
* fix DP issue

* fix

* oops

* Empty-Commit

* skip test
2023-03-16 08:43:12 +01:00
03d9844730 Let's support naive Pipeline Parallelism (#210)
* add fixes in to support PP

* add same logic for enc-dec

* add more checks

* fix 20b issues

* clean up

* update scripts

* dp safety checker

* added multi gpu tests

* fix order

* change

* fix script
2023-03-15 08:28:52 +01:00
679f29d408 peft integration (#163)
* adds a hacky peft example

* fixes bug due to missing "prepare_model_for_training"

* Formatting

* adds peft to requirements

* Update trl/trainer/ppo_trainer.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* gpt neo runs

* changes requested on the PR

* style

* updates to prepare_model_for_int8_training PEFT PR https://github.com/huggingface/peft/pull/105

* updates to prepare_model_for_int8_training PEFT PR https://github.com/huggingface/peft/pull/105

* adds missing 8-bit attribute to modeling base

* adds lr to example script

* adds missing train to trainer

* disables caching temporarily while I debug something

* debugging issues with unstable training

* Fix peft + int8 (#170)

* add fix

* another fix

* Auto stash before merge of "peft-example" and "origin/peft-example"

* adds peft model types to modeling base

* reduces memory usage using adapters and no ref model.

* adds support for EleutherAI/gpt-neox-20b

* example for peft finetune of cm model

* removes hacky research code

* fixing the rebase and some typos

* style

* style2

* adds gradient checkpointing to base model

* cleans up comments

* moves config and other pretrained_model properties to __init__

* make style

* added tests

* change dependency

* Update .github/workflows/tests.yml

* fix test

* fix style and failing tests

* make quality

* revert change

* rm unneeded change

* revert changes

* rm changes

* rm changes

* rm uneeded change

* Update trl/models/modeling_base.py

* revert uneeded changes

* make style

* adapt suggestions

* fix tests

* attempt to fix

* fix

* fix

* add no peft test

* revert

* remove unneded check

* more tests

* fix logic

* add `save_pretrained` support

* fix quality

* clean up

* clean up

* stronger test

* refactor comments

* make style

* attempt to add non-peft tests

* remove test runner

* format

* fix test

* move `train` on top

* fix peft import

* make quality

* fixes typo

* adds peft example to docs

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: younesbelakda <younesbelkada@gmail.com>
2023-03-07 15:08:21 +01:00
88550226ab [core] Add max_grad_norm support (#177)
* add `max_grad_norm` support

* Update tests/test_ppo_trainer.py
2023-02-28 10:52:11 +01:00
f1300ec811 add minibatching (#153)
* add minibatching

* all the fixes I missed

* more fixes

* add dedicated variable for mini batch size

* style

* minor fixes

* fix rewards

* unbiased variance estimation

* mask values/returns

* moar fixes

* style

* change structure and add moar tests

* Apply suggestions from code review

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* deprecate `forward_batch_size`

* remove out of date warning about batching s2s and left padding models

* make style

* fixed failed merge

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-02-23 15:24:20 +01:00
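
A sketch of the batching structure this commit introduces: the PPO rollout batch is split into mini-batches for the optimization epochs, replacing the old `forward_batch_size` knob. The field names follow the config, but the loop is illustrative.

```python
import torch

batch_size = 8        # rollout batch collected per PPO step
mini_batch_size = 2   # size of each optimization chunk
ppo_epochs = 4        # passes over the rollout batch

rollout = {"advantages": torch.randn(batch_size, 16), "returns": torch.randn(batch_size, 16)}

for _ in range(ppo_epochs):
    permutation = torch.randperm(batch_size)  # reshuffle each epoch
    for start in range(0, batch_size, mini_batch_size):
        idx = permutation[start : start + mini_batch_size]
        mini_batch = {k: v[idx] for k, v in rollout.items()}
        # ... compute the PPO loss on mini_batch and take an optimizer step ...
```
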
38579b6c5a Delete 2023-01-24 11:03:35 +01:00
ffe5f4f2ce [core] improve API (#97)
* improve API

- add kwargs check on `PPOTrainer`
- add tests

* make all args kwargs
2023-01-24 11:00:42 +01:00
efb2e7563e [API] LR scheduler support (#96)
* lr scheduler support

- add lr scheduler support
- add tests

* Update trl/trainer/ppo_trainer.py

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* fix test

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
2023-01-23 15:56:59 +01:00
8c9a39b79b Encoder-Decoder models support (#93)
* v1

* add example script

* still not working

* update trainer

* update script

* fix example and clean code

* fix style

* fix style

* fix typo

* add post_init method

* add testing suite for seq2seq

- refactor testing suite
- push tiny models
- add testing suite for `seq2seq` models

* attempt to fix CI mem issue

* attempt to fix CI mem issue

* forward contrib credits from enc-dec PR

* correct script

* correct `fbs`

* address comments

* Apply suggestions from code review

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* Update examples/scripts/ppo-sentiment-t5-small.py

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

Co-authored-by: leandro <leandro.vonwerra@spoud.io>
Co-authored-by: SSamDav <SSamDav@users.noreply.github.com>
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
2023-01-23 11:00:39 +01:00
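
A small sketch of the seq2seq counterpart added by this PR, assuming the `AutoModelForSeq2SeqLMWithValueHead` wrapper works analogously to the causal-LM one; the model name is a placeholder.

```python
from transformers import AutoTokenizer
from trl import AutoModelForSeq2SeqLMWithValueHead, PPOConfig, PPOTrainer

model_id = "t5-small"  # placeholder encoder-decoder model
model = AutoModelForSeq2SeqLMWithValueHead.from_pretrained(model_id)
ref_model = AutoModelForSeq2SeqLMWithValueHead.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# The trainer detects encoder-decoder models and adapts generation and log-prob gathering.
ppo_trainer = PPOTrainer(PPOConfig(batch_size=4, mini_batch_size=2), model, ref_model, tokenizer)
```
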
e954fa00e5 [core] remove wandb dependency (#92)
* remove `wandb` dependency

* make the logging framework agnostic

* `tensorboard` support

* modify func name

* fix default value
2023-01-19 15:17:01 +01:00
4f5c16fafd [core] Push v_head when using AutoModelForCausalLMWithValueHead (#86)
* v1

- working script
- added tests
- possible to load `v_head`
- possible to use `transformers` too

* fix trainer test

* add push_to_hub compatibility

* add `push_to_hub` tests

* few updates

- update based on comments
- add more tests
- update docs

* Update docs/source/quickstart.mdx

* clearer doc

* support sharded models

* `save_pretrained` support for sharded case
2023-01-19 14:45:18 +01:00
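
A small sketch of what this enables: saving, pushing, and reloading the value head together with the base weights, so a resumed PPO run keeps the trained `v_head`. The model name and path are placeholders.

```python
from trl import AutoModelForCausalLMWithValueHead

model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")  # placeholder

# save_pretrained() now serializes the v_head alongside the transformer weights,
# and push_to_hub() uploads both, so from_pretrained() restores the full wrapper.
model.save_pretrained("./ppo-checkpoint")
reloaded = AutoModelForCausalLMWithValueHead.from_pretrained("./ppo-checkpoint")
```
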
7a4780add0 [API] Make dataset attribute optional (#85)
* make `dataset` attribute optional

- add a warning for `batch_size` issue
- added tests

* update readme
2023-01-17 12:01:33 +01:00
6c5f2785a6 [PPOTrainer] Support generic optimizers (#78)
* v1

- add generic support
- add bnb example

* adapt LR

* add tests

* Update trl/trainer/ppo_trainer.py

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
2023-01-05 15:39:24 +01:00
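
A hedged sketch of plugging a custom optimizer into `PPOTrainer`, e.g. a bitsandbytes 8-bit Adam as in the PR's example; the `optimizer` keyword follows the trainer signature of that period, and the model ID and hyperparameters are placeholders.

```python
import bitsandbytes as bnb
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")  # placeholder
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

config = PPOConfig(batch_size=4, mini_batch_size=2, learning_rate=1.41e-5)
# Any torch-style optimizer over the model's parameters can be passed in.
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=config.learning_rate)

ppo_trainer = PPOTrainer(config, model, ref_model=None, tokenizer=tokenizer, optimizer=optimizer)
```
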
d6fe301226 [core] refactor step method (#76)
* refactor `step` method

- add safety checker + manual device assignment

* cleanup

* isort

* more generic

* make style

* Update trl/trainer/ppo_trainer.py

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* make style + add tests

* more tests

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
2023-01-05 15:38:44 +01:00
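
An illustrative sketch of the `step` API this refactor hardens: queries, responses, and scalar rewards are passed as lists of tensors, and the safety checks validate them before the PPO update. Treat the exact shapes and keyword arguments as assumptions about the API of that period.

```python
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")  # placeholder
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

ppo_trainer = PPOTrainer(
    PPOConfig(batch_size=2, mini_batch_size=1), model, ref_model=None, tokenizer=tokenizer
)

queries = [
    tokenizer("The movie was", return_tensors="pt").input_ids.squeeze(0),
    tokenizer("I think this film", return_tensors="pt").input_ids.squeeze(0),
]
# generate() returns only the newly generated tokens when return_prompt=False.
responses = [
    ppo_trainer.generate(q, max_new_tokens=8, return_prompt=False).squeeze(0) for q in queries
]
rewards = [torch.tensor(1.0), torch.tensor(-0.5)]  # one scalar reward per sample

stats = ppo_trainer.step(queries, responses, rewards)  # runs the PPO update, returns logging stats
```
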
4ec925552e [PPOTrainer] make the reference model optional (#67)
* v1

* make quality + refactor tests

* make quality

* Update trl/trainer/ppo_trainer.py

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* final grain of salt

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
2022-12-30 12:40:42 +01:00