17 Commits

Author SHA1 Message Date
9df19e8a75 📜 Fix license and copyrights (#3264) 2025-04-08 15:22:58 -07:00
c0854c32c9 🌌 Fix logits computation in trainer prediction step (#2969)
* Fix logits computation in DPO trainer prediction step

* fix compute_metrics for bco and test

* same for cpo

* same for dpo

* for kto

* and finally orpo

* Apply style fixes

---------

Co-authored-by: kyungdae-jo <kyungdae.jo@navercorp.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-02-27 17:09:11 +01:00
5877786b5a 🪄 Minor comment style modif (#2582) 2025-01-17 11:12:00 +01:00
1d23ecc36f ©️ Update copyrights year (#2547)
* happy new year

* fix wandb import sort
2025-01-07 14:53:09 +01:00
9410874787 ©️ Copyrights update (#2454)
* First changes

* Other files

* Finally

* rm comment

* fix nashmd

* Fix example

* Fix example [ci skip]
2024-12-10 10:40:00 +01:00
453db5cd79 🤏 New models for tests (#2287)
* first commit

* uncomment

* other tests adaptations

* Remove unused variable in test_setup_chat_format

* Remove unused import statement

* style

* Add Bart model

* Update BCOTrainerTester class in test_bco_trainer.py

* Update model IDs and tokenizers in test files

* Add new models and processors

* Update model IDs in test files

* Fix formatting issue in test_dataset_formatting.py

* Refactor dataset formatting in test_dataset_formatting.py

* Fix dataset sequence length in SFTTrainerTester

* Remove tokenizer

* Remove print statement

* Add reward_model_path and sft_model_path to PPO trainer

* Fix tokenizer padding issue

* Add chat template for testing purposes in PaliGemma model

* Update PaliGemma model and chat template

* Increase learning rate to speed up test

* Update model names in run_dpo.sh and run_sft.sh scripts

* Update model and dataset names

* Fix formatting issue in test_dataset_formatting.py

* Fix formatting issue in test_dataset_formatting.py

* Remove unused chat template

* Update model generation script

* additional models

* Update model references in test files

* Remove unused imports in test_online_dpo_trainer.py

* Add is_llm_blender_available import and update reward_tokenizer

* Refactor test_online_dpo_trainer.py: Move skipped test case decorator

* remove models without chat templates

* Update model names in scripts and tests

* Update model_id in test_modeling_value_head.py

* Update model versions in test files

* Fix formatting issue in test_dataset_formatting.py

* Update embedding model ID in BCOTrainerTester

* Update test_online_dpo_trainer.py with reward model changes

* Update expected formatted text in test_dataset_formatting.py

* Add reward_tokenizer to TestOnlineDPOTrainer

* fix tests

* Add SIMPLE_CHAT_TEMPLATE to T5 tokenizer

* Fix dummy_text format in test_rloo_trainer.py

* Skip outdated test for chatML data collator

* Add new vision language models

* Commented out unused model IDs in test_vdpo_trainer

* Update model and vision configurations in generate_tiny_models.py and test_dpo_trainer.py

* Update model and tokenizer references

* Don't push if it already exists

* Add comment explaining test skip

* Fix model_exists function call and add new models

* Update LlavaForConditionalGeneration model and processor

* `qgallouedec` -> `trl-internal-testing`
2024-11-25 16:31:56 +01:00
73c3970c1f 🙅 Ensure dependency optionality (#2301)
* Add conditional check for LLMBlender availability in test_judges.py

* Fix import issues and update test requirements

* Remove unused imports

* Add require_peft decorator to test cases

* Fix import_utils module to use correct package name for llm_blender
2024-10-31 22:37:49 +01:00
24fb32733f 🔧 Use standard unittest assertion methods (#2283)
* WIP: Partial unit test update

* Update unittest format

* Update tests/slow/test_sft_slow.py comment

* Refactor unit tests: replace pytest.raises with self.assertRaises

* Fix: Restore accidentally deleted 'ref_model' parameter in DPOTrainer

* Re-run pre-commit

* fix: Incorrectly replacing non-TestCase assert

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2024-10-31 15:10:43 +01:00
47d08a9626 Rename trainer arg tokenizer to processing_class (#2162) 2024-10-07 09:39:32 +02:00
d45c86e2a7 Conversational dataset support for CPOTrainer (#2144)
* extract prompt and apply chat template in cpo trainer

* default learning rate

* simplify example

* update doc

* test all formats

* extend extract prompt

* improve doc format

* link in dataset formats

* Update docs/source/cpo_trainer.mdx

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update docs/source/cpo_trainer.mdx

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

---------

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2024-10-04 18:01:02 +02:00
07f0e687cb Use transformers utilities when possible (#2064)
* use transformers' availability functions

* require from transformers

* rm file

* fix no peft

* fix import

* don't alter _peft_available

* fix require_diffusers

* style

* transformers>=4.40 and add back `is_liger_kernel_available`
2024-09-16 15:56:49 +02:00
40f05226de Standardizing datasets for testing (#2065)
* zen dataset

* Update dataset test bco

* some tests

* Simple chat template

* bco

* xpo

* kto

* gkd

* trainer_args

* sft

* online dpo

* orpo

* zen script
2024-09-14 22:34:15 +02:00
cbcaa46cd3 Various args and test fix (#1909)
* report to none

* simplify AlignPropTrainerTester

* rm unused marker

* Don't share setup in dpo trainer

* style

* don't share setup in test rich

* fix setup and classmethod

* fix args for sft

* test_trainer_args

* various arg fix

* report to none and vsdt simplifi

* drop generate_during_eval

* fix run_name

* style

* drop setUpClass

* style

* new ref values for ppo trainer tester

* update ref val

---------

Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co>
2024-08-09 10:07:58 +02:00
6e1652bc5e Add CPO-SimPO method (#1760)
* enable cpo-simpo

* highlight SimPO and CPO-SimPO

* add test for cpo_alpha

* formatting

* Update docs/source/cpo_trainer.mdx

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2024-06-23 18:54:30 +02:00
b8b972fde1 Add a variant of CPO, SimPO (#1703)
* add a variant of cpo: simpo

* correct cpo-simpo loss

* avoid 0 int error in logging

* add simpo description

* Update trl/trainer/cpo_trainer.py

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>

* fix formatting

* add test for simpo

* Update docs/source/cpo_trainer.mdx

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>

* add a docstring for simpogamma

* move simpo description to the above docstring

* change simpo description in the doc

* formatting

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2024-06-06 17:06:47 -04:00
a02513c3b7 Apply deprecated evaluation_strategy (#1559)
* Deprecate

* Update tests/test_dpo_trainer.py

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2024-05-23 12:48:00 +02:00
d1df79f83c Add CPOTrainer (#1382)
* add CPOTrainer

* add docs

* fix formatting

* removed precompute_ref_log_probs arg

* remove precompute_ref_log_probs

* typos

* finish cpo trainer doc

* remove redundant lines

* typo

* formatting

* compute chosen nll loss also for enc-dec models

* fix gradient error of inplace operation for enc-dec models

* formatting

* use CPOConfig

* formatting

* use model_init_kwargs from CPOConfig

* comments in example

* fix doc string

* fix typo in docstring

* update year

* fixed typo

* use preference dataset

* fix learning rate

* move dataset_num_proc to configs

* Update cpo paper link from HF: cpo_trainer.mdx

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* update description for CPO: cpo_trainer.mdx

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* remove _prepare_deepspeed for cpo

Because CPO does not need init for reference model

* Add explanation to CPO loss

* format

* fix bug when lengths are given

* add CPOTrainer to README

* fix grammar

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2024-03-22 21:32:45 +01:00