frozenleaves/trl

mirror of https://github.com/huggingface/trl.git synced 2025-10-20 18:43:52 +08:00

Author	SHA1	Message	Date
Quentin Gallouédec	9df19e8a75	📜 Fix license and copyrights (#3264)	2025-04-08 15:22:58 -07:00
Quentin Gallouédec	5877786b5a	🪄 Minor comment style modif (#2582)	2025-01-17 11:12:00 +01:00
Quentin Gallouédec	1d23ecc36f	©️ Update copyrights year (#2547) * happy new year * fix wandb import sort	2025-01-07 14:53:09 +01:00
Gaetan LOPEZ LATOUCHE	179ba53671	🐾 Process-supervised RM Trainer (#2127) * initial skeleton * tokenize fn * adding bos and eos to tokenization fn * prmtrainer * fixing small typo in tokenize * typo in input_ids and labels construction * numpy dimension * introduce the stepwise reward trainer * update markdown files * let user decide post step separator in config * doc post_step_separator * do not add post step_tokens to last step of the reasoning process * renaming prm to stepwisereward * formatting * fix tokenize kwargs * adapt test to the new post_token args * adding example script * fix small typo * add create_model_card and renaming * fixing booleans * Adding the new stepwise_preference instead of placeholders for datasets * formatting * Update docs/source/_toctree.yml Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * Update examples/scripts/stepwise_reward_modeling.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * Update trl/trainer/stepwise_reward_trainer.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * Update trl/trainer/stepwise_reward_trainer.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * update push to hub Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * step_separator can't be None Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * fix suggested typos * add citation * reformat doc * reordering init * push to hub prm800k * changing dataset in example * change dataset format to align with the sky is blue example * fix tokenization column names * fix num labels in openai example * add support for conversational dataset * remove training whitespace * replace tokenizer with processing class * Update docs/source/dataset_formats.mdx Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * remove openai_prm800k * Update trl/trainer/stepwise_reward_trainer.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * Update trl/trainer/stepwise_reward_trainer.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * Update docs/source/stepwise_reward_trainer.mdx Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update docs/source/stepwise_reward_trainer.mdx Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * renaming Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * renaming Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * minor renamings in docs * using prm800k instead of openai_prm800k * update num labels to 2 following the new format * changing doc examples to math examples * change reference to dataset_formats.mdx * changing dataset config in test * remove conversational dataset support * remove conv dataset support * fix bos token * fix scriptarguments in example * completion to completions * remove valuerror for step_separator inside steps * run precommit * remove conv dataset support Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * renaming zen dataset * remove unused printing * unknown label column * introduce the train on last step arg * _tokenize support train_on_last_step * incorporate train_on_last_step to tests * formatting * remove comments in trainer * Refactor `tokenize_row` * Update max_completion_length parameter in StepwiseRewardConfig * Collator * Update comment * Update type hint * fix table * Remove collator * don't need pad token id * add error back * max length args * use tokenizer arg * Update doc * label -> labels * fixing tokenization issues in tokenize row * correct labels for token classification * adding max_length to tokenize_row * reformat tests * adding tests for tokenize row * fixing typos in comments * update doc Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com> * Add math_shepherd.py script for dataset processing * split the dataset * formatting * same evaluation method for the two training methods * adding filtering to example script * formatting * Add features to avoid casting labels to bool in dataset tokenization * Update docs/source/stepwise_reward_trainer.mdx [ci skip] * Add learning_rate parameter to StepwiseRewardConfig class * update doc * Remove unused setup_chat_format function * Fix warning message in stepwise_reward_modeling.py * Update logging steps in stepwise_reward_trainer.mdx * little doc change [ci skip] * Fix copyrights * fix space after copyrights * Update dataset loading in stepwise_reward_modeling.py * refine compute_accuracy and proper test * fix tests * style * renamings * renaming in init * doc renaming * fix sorting and tag * experiemental [ci skip] * trigger CI * other doc fix --------- Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com> Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co>	2024-12-13 15:56:10 +01:00
Quentin Gallouédec	9410874787	©️ Copyrights update (#2454) * First changes * Other files * Finally * rm comment * fix nashmd * Fix example * Fix example [ci skip]	2024-12-10 10:40:00 +01:00
Quentin Gallouédec	453db5cd79	🤏 New models for tests (#2287) * first commit * uncomment * other tests adaptations * Remove unused variable in test_setup_chat_format * Remove unused import statement * style * Add Bart model * Update BCOTrainerTester class in test_bco_trainer.py * Update model IDs and tokenizers in test files * Add new models and processors * Update model IDs in test files * Fix formatting issue in test_dataset_formatting.py * Refactor dataset formatting in test_dataset_formatting.py * Fix dataset sequence length in SFTTrainerTester * Remove tokenizer * Remove print statement * Add reward_model_path and sft_model_path to PPO trainer * Fix tokenizer padding issue * Add chat template for testing purposes in PaliGemma model * Update PaliGemma model and chat template * Increase learning rate to speed up test * Update model names in run_dpo.sh and run_sft.sh scripts * Update model and dataset names * Fix formatting issue in test_dataset_formatting.py * Fix formatting issue in test_dataset_formatting.py * Remove unused chat template * Update model generation script * additional models * Update model references in test files * Remove unused imports in test_online_dpo_trainer.py * Add is_llm_blender_available import and update reward_tokenizer * Refactor test_online_dpo_trainer.py: Move skipped test case decorator * remove models without chat templates * Update model names in scripts and tests * Update model_id in test_modeling_value_head.py * Update model versions in test files * Fix formatting issue in test_dataset_formatting.py * Update embedding model ID in BCOTrainerTester * Update test_online_dpo_trainer.py with reward model changes * Update expected formatted text in test_dataset_formatting.py * Add reward_tokenizer to TestOnlineDPOTrainer * fix tests * Add SIMPLE_CHAT_TEMPLATE to T5 tokenizer * Fix dummy_text format in test_rloo_trainer.py * Skip outdated test for chatML data collator * Add new vision language models * Commented out unused model IDs in test_vdpo_trainer * Update model and vision configurations in generate_tiny_models.py and test_dpo_trainer.py * Update model and tokenizer references * Don't push if it already exists * Add comment explaining test skip * Fix model_exists function call and add new models * Update LlavaForConditionalGeneration model and processor * `qgallouedec` -> `trl-internal-testing`	2024-11-25 16:31:56 +01:00
Clay	24fb32733f	🔧 Use standard unittest assertion methods (#2283) * WIP: Partial unit test update * Update unittest format * Update tests/slow/test_sft_slow.py comment * Refactor unit tests: replace pytest.raises with self.assertRaises * Fix: Restore accidentally deleted 'ref_model' parameter in DPOTrainer * Re-run pre-commit * fix: Incorrectly replacing non-TestCase assert --------- Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2024-10-31 15:10:43 +01:00
Quentin Gallouédec	47d08a9626	Rename trainer arg `tokenizer` to `processing_class` (#2162)	2024-10-07 09:39:32 +02:00
lewtun	cc23b511e4	[`RewardTrainer`] Tokenize inputs within trainer (#2102) * Pretokenize in reward modelling * Fix README example * Apply suggestions from code review Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * Move chat template formatting inside trainer * Refactor tests * Fix README * Disable wandb * Update readme * add comment `remove_unused_columns` * Update trl/trainer/reward_config.py * doc * implicit* * explicit --------- Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co>	2024-09-24 13:03:32 +02:00
Quentin Gallouédec	07f0e687cb	Use `transformers` utilities when possible (#2064) * use transformers' availability functions * require from transformers * rm file * fix no peft * fix import * don't alter _peft_available * fix require_diffusers * style * transformers>=4.40 and add back `is_liger_kernel_available`	2024-09-16 15:56:49 +02:00
Quentin Gallouédec	cbcaa46cd3	Various args and test fix (#1909) * report to none * simplify AlignPropTrainerTester * rm unused marker * Don't share setup in dpo trainer * style * don't share setup in test rich * fix setup and classmethod * fix args for sft * test_trainer_args * various arg fix * report to none and vsdt simplifi * drop generate_during_eval * fix run_name * style * drop setUpClass * style * new ref values for ppo trainer tester * update ref val --------- Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co>	2024-08-09 10:07:58 +02:00
Quentin Gallouédec	332062372d	Drop `setUpClass` in reward tester (#1895) * drop setUp class in reward tester * report to none * style --------- Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co>	2024-08-05 16:01:43 +02:00
Zach Mueller	a02513c3b7	Apply deprecated `evaluation_strategy` (#1559) * Deprecate * Update tests/test_dpo_trainer.py --------- Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>	2024-05-23 12:48:00 +02:00
Kashif Rasul	d10f7663b0	[peft] Update test_reward_trainer.py to fix tests (#1471) * [peft] Update test_reward_trainer.py Since we are requiring peft >= 0.4.0 * formatting	2024-03-22 19:12:54 +01:00
Younes Belkada	1e77d8aeb2	[`core` / `xxxTrainer`] Automatic tagging (#1329) * automatic tagging * add comments * fix tests * fix	2024-02-15 14:47:32 +01:00
Aarni Koskela	9bc478ecbb	pre-commit: replace linters + formatters with Ruff; fix some issues (#1300) * pre-commit: replace linters + formatters with Ruff * Don't use bare except * Clean up `noqa`s * Enable Ruff UP; apply auto-fixes * Enable Ruff B; apply fixes * Enable Ruff T with exceptions * Enable Ruff C (complexity); autofix * Upgrade Ruff to 0.2.0	2024-02-15 04:37:41 +01:00
Aarni Koskela	ae8431bd50	Codemod Unittest assertions to bare asserts (#1301) * Remove stray commas from test data * Codemod Unittest assertions to bare asserts * Make `assertAlmostEqual` tests more idiomatic * DRY some test strings	2024-02-01 23:49:03 +01:00
Jan Vincent Hoffbauer	08cfc4179b	Add margin to RM training (#719) * Start adding margin to RM training * Fix typo and cleanup * Fix incompatibilities when not using margin * Format using 'make precommit' * Add documentation and test for reward trainer * Run 'make precommit' * Update docs/source/reward_trainer.mdx Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Fix missed merge conflict in reward trainer docs --------- Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2023-09-20 10:18:38 +02:00
lewtun	d484dc2a93	Refactor RewardTrainer hyperparameters into dedicated dataclass (#726) * Refactor RewardTrainer hyperparameters into dedicated dataclass * Revert * Add doc string * Fix warning * Handle backwards compat * Fix tests * Add docs * Refactor to RewardConfig * Fix case conditions * Fix	2023-09-05 09:05:42 +02:00
Younes Belkada	843c14574f	fix CI RM (#468)	2023-06-26 14:30:06 +02:00
Younes Belkada	b4bb12992e	Update test_reward_trainer.py (#421)	2023-06-09 15:52:41 +02:00
Younes Belkada	fadffc22bc	Update test_reward_trainer.py (#410)	2023-06-07 12:22:22 +02:00
Tom Aarsen	376d152d3f	Resolve broken evaluation/prediction for RewardTrainer (#404) * Implement evaluation/prediction for RewardTrainer * Stick with unittest assertions * Perform prediction forward calls without gradient * Remove Literal to preserve Python 3.7 support I recognize that I can also import from typing_extensions with a try-except, but that is a bit overkill for this I feel. * Remove eval_steps=1 to prevent flaky test on CI The flaky test is caused by a division by zero when dividing by the runtime. This is done on the transformers side, so it's not a TRL issue. In practice, this won't happen - it only happens because both the model and dataset are tiny.	2023-06-06 16:49:30 +02:00
Younes Belkada	3cfe194e34	[`core`] Officially Support Reward Modeling (#303) * v1 - add working version - add all possible tests - add docs * add some contents * clean up * fixes * patch test for now * fix test * clean up * fix * this time fix * Update docs/source/trainer.mdx Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com> * fixe * update * final changes * oops * Update docs/source/reward_trainer.mdx Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update docs/source/reward_trainer.mdx Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update docs/source/reward_trainer.mdx Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * switch to chosen / rejected * fixes * add example * add accuracy metric * pass PEFT config * refactor compute metrics --------- Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com> Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2023-04-26 11:51:56 +02:00

24 Commits