Commit Graph

21 Commits

SHA1 Message Date
9df19e8a75 📜 Fix license and copyrights (#3264) 2025-04-08 15:22:58 -07:00
1d23ecc36f ©️ Update copyright year (#2547)
* happy new year

* fix wandb import sort
2025-01-07 14:53:09 +01:00
ca850be0a2 🕹️ CLI refactor (#2380)
* Refactor main function in dpo.py

* Update setup.py and add cli.py

* Add examples to package data

* style

* Refactor setup.py file

* Add new file t.py

* Move dpo to package

* Update MANIFEST.in and setup.py, refactor trl/cli.py

* Add __init__.py to trl/scripts directory

* Add license header to __init__.py

* File moved instruction

* Add Apache License and update file path

* Move dpo.py to new location

* Refactor CLI and DPO script

* Refactor import structure in scripts package

* env

* rm config from chat arg

* rm old cli

* chat init

* test cli [skip ci]

* Add `dataset_config_name` to `ScriptArguments` (#2440)

* add missing arg

* Add test cases for 'trl sft' and 'trl dpo' commands

* Add sft.py script and update cli.py to include sft command

* Move sft script

* chat

* style [ci skip]

* kto

* rm example config

* first step on doc

* see #2442

* see #2443

* fix chat windows

* ©️ Copyrights update (#2454)

* First changes

* Other files

* Finally

* rm comment

* fix nashmd

* Fix example

* Fix example [ci skip]

* 💬 Fix chat for windows (#2443)

* fix chat for windows

* add some tests back

* Revert "add some tests back"

This reverts commit 350aef52f53f8cf34fccd7ad0f78a3dd63867e06.

* 🆔 Add `dataset_config` to `ScriptArguments` (#2440)

* dataset_config_name

* Update trl/utils.py [ci skip]

* sort import

* typo [ci skip]

* Trigger CI

* Rename `dataset_config_name` to `dataset_config`

* 🏎 Fix deepspeed preparation of `ref_model` in `OnlineDPOTrainer` (#2417)

* Remove unused deepspeed code

* add model prep back

* add deepspeed even if it doesn't work

* rm old code

* Fix config name

* Remove `make dev` in favor of `pip install -e .[dev]`

* Update script paths and remove old symlink related things

* Fix chat script path [ci skip]

* style
2024-12-13 17:52:23 +01:00
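
Note: a hypothetical sketch of the subcommand dispatch this refactor sets up. Scripts move from `examples/` into the package (`trl/scripts/`) and a single console entry point routes to them; the `main` entry points named below are assumptions, not the actual `trl/cli.py` code.

```python
import sys

from trl.scripts import chat, dpo, kto, sft  # scripts now live inside the package

COMMANDS = {"chat": chat.main, "dpo": dpo.main, "kto": kto.main, "sft": sft.main}

def main():
    # e.g. `trl sft --model_name_or_path Qwen/Qwen2-0.5B ...`
    command = sys.argv[1]
    sys.argv = sys.argv[1:]  # let the selected script's own parser see the rest
    COMMANDS[command]()
```
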
460e780265 👯 Standardize model_args (#2442)
* `model_config` -> `model_args`

* sort
2024-12-10 12:51:20 +01:00
6a05feff02 🆔 Add dataset_config to ScriptArguments (#2440)
* dataset_config_name

* Update trl/utils.py [ci skip]

* sort import

* typo [ci skip]

* Trigger CI

* Rename `dataset_config_name` to `dataset_config`
2024-12-10 11:09:26 +01:00
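
Note: a short illustration of the new field, assuming the dataclass is used the way the example scripts use it; the dataset name is illustrative.

```python
from datasets import load_dataset

from trl import ScriptArguments

# dataset_config selects a sub-configuration of a Hub dataset (e.g. "main" vs
# "socratic" for gsm8k); previously the scripts had no way to pass this through.
script_args = ScriptArguments(dataset_name="openai/gsm8k", dataset_config="main")
dataset = load_dataset(script_args.dataset_name, name=script_args.dataset_config)
```
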
9410874787 ©️ Copyrights update (#2454)
* First changes

* Other files

* Finally

* rm comment

* fix nashmd

* Fix example

* Fix example [ci skip]
2024-12-10 10:40:00 +01:00
7e394b03e8 🎭 Deprecate [SFT/DPO/Reward]ScriptArguments in favour of ScriptArguments (#2145)
* `DPOScriptArguments` to `ScriptArguments`

* use dataset_train_split

* Use scriptarguments

* dataset names in command lines

* use `ScriptArguments` everywhere

* ignore bias buffer to end

* remove in v0.13

* rm comment

* update test commands

* Update docs/source/rloo_trainer.md

* Update tests/test_rloo_trainer.py

* Added dataset_train_split argument to ppo.py and rloo.py

* update scripts with dataset_train_split
2024-10-14 11:14:58 +02:00
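
Note: after this change every example script parses the same trio of dataclasses. A sketch of the pattern, assuming the usual `HfArgumentParser` flow from the scripts:

```python
from transformers import HfArgumentParser

from trl import ModelConfig, ScriptArguments, SFTConfig

# One shared ScriptArguments replaces SFTScriptArguments / DPOScriptArguments /
# RewardScriptArguments; the training split now comes from dataset_train_split.
parser = HfArgumentParser((ScriptArguments, SFTConfig, ModelConfig))
script_args, training_args, model_args = parser.parse_args_into_dataclasses()

train_split = script_args.dataset_train_split  # defaults to "train"
```
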
47d08a9626 Rename trainer arg tokenizer to processing_class (#2162) 2024-10-07 09:39:32 +02:00
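Note: in user code the rename looks roughly like this; everything except the renamed argument is elided, and `model`, `dataset`, and `tokenizer` are assumed to be loaded already.

```python
from trl import SFTTrainer

# Before #2162: SFTTrainer(model=model, tokenizer=tokenizer, ...)
# After: the argument accepts tokenizers, image processors, feature extractors,
# or multimodal processors alike, hence the broader name.
trainer = SFTTrainer(model=model, train_dataset=dataset, processing_class=tokenizer)
```
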
c00722ce0a 🃏 Model card for TRL (#2123)
* template and util

* test for online dpo

* template in package_data

* template in manifest

* standardize push_to_hub

* wandb badge and quick start

* bco

* xpo

* simplify `create_model_card`

* cpo

* kto

* dpo

* gkd

* orpo

* style

* nash-md

* alignprop

* bco citation

* citation template

* cpo citation

* ddpo

* fix alignprop

* dpo

* gkd citation

* kto

* online dpo citation

* orpo citation

* citation in utils

* optional citation

* reward

* optional trainer citation

* sft

* remove add_model_tags bco

* Remove unnecessary code for adding model tags

* Fix model tag issue and update URL format

* Remove unused code for adding model tags

* Add citation for XPOTrainer

* Remove unused code in SFTTrainer

* Add model card generation in RLOOTrainer

* Remove unused import and method call in reward_trainer.py

* Add model card generation

* Remove unused code and update error message in ORPOTrainer class

* Add import statements and create model card in IterativeSFTTrainer

* Add dataset name to push_to_hub() call

* Update trainer.push_to_hub() dataset names

* script args

* test

* better doc

* fix tag test

* fix test tag

* Add tags parameter to create_model_card method

* doc

* script args

* Update trl/templates/model_card.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* unittest's `assertIn` instead of `assert`

* Update trl/templates/model_card.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

---------

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2024-09-27 15:23:05 +02:00
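
Note: a hedged sketch of the user-facing effect, assuming `trainer` is any TRL trainer instance; the `create_model_card` argument names follow the bullets above but may differ in detail.

```python
# Pushing now records the dataset in an auto-generated model card built from the
# shared template in trl/templates/model_card.md.
trainer.push_to_hub(dataset_name="trl-lib/tldr")

# The card can also be generated directly, with optional extra tags:
trainer.create_model_card(model_name="my-model", dataset_name="trl-lib/tldr", tags=["sft"])
```
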
32d9d34eb1 Standardize pushing to Hub in examples (#2126) 2024-09-26 10:00:51 +02:00
6859e048da Fix PPO/RLOO examples (#2100) 2024-09-23 11:49:36 +02:00
10c2f63b2a training_args for all TrainingArguments (#2082) 2024-09-19 15:03:47 +02:00
40f05226de Standardizing datasets for testing (#2065)
* zen dataset

* Update dataset test bco

* some tests

* Simple chat template

* bco

* xpo

* kto

* gkd

* trainer_args

* sft

* online dpo

* orpo

* zen script
2024-09-14 22:34:15 +02:00
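
Note: the tests now draw on one small shared fixture dataset with one config per data format; the config name below is illustrative of the scheme, not guaranteed.

```python
from datasets import load_dataset

# A tiny standardized dataset replaces the assorted per-test fixtures used before.
dataset = load_dataset("trl-internal-testing/zen", "standard_preference", split="train")
```
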
4c92ba5769 ©️ Copyrights (#2063)
* copyrights

* fail if missing
2024-09-13 14:18:47 +02:00
2ee0b62cdb Change non_eos_penalty to missing_eos_penalty to be consistent across OnPolicy trainers (#2033)
* Subtract a penalty from OnPolicy Trainers if output does not contain an EOS token

* Caught a few other problems

* Updated the documentation for RLOO trainer and PPOv2Trainer

* Corrected the default type and value for missing_eos_penalty

* Made RLOO Trainer consistent with Online DPO and PPOv2

* Removed --non_eos_penalty from all documentation

* Made missing_eos_penalty examples positive (because we subtract).

* Caught two more incorrect examples

* Removed unnecessary whitespace to make ruff happy

* Update trl/trainer/utils.py

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2024-09-10 14:40:23 +02:00
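
Note: a runnable toy sketch of the renamed behaviour with illustrative values; variable names are assumptions, but the subtractive semantics match the bullets above.

```python
import torch

eos_token_id, missing_eos_penalty = 2, 1.0          # illustrative values
completions = torch.tensor([[5, 7, 2], [5, 7, 9]])  # second row never emits EOS
scores = torch.tensor([0.8, 0.9])

# A completion that never contains the EOS token has the (positive) penalty
# subtracted from its score, which is why the examples had to be made positive.
contain_eos = torch.any(completions == eos_token_id, dim=-1)
scores[~contain_eos] -= missing_eos_penalty          # scores -> [0.8, -0.1]
```
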
f05f63c1ea PartialState().local_main_process_first() when map in examples (#1926)
* `PartialState().local_main_process_first()` when map in examples

* allow load from cache

---------

Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co>
2024-08-14 12:01:03 +02:00
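
Note: the pattern the examples adopt, sketched here assuming `dataset`, `tokenize_fn`, and `training_args` come from the surrounding script.

```python
from accelerate import PartialState

# In multi-GPU runs, only the local main process executes the map first; the other
# ranks wait, then load the cached result instead of re-tokenizing on every rank.
with PartialState().local_main_process_first():
    dataset = dataset.map(tokenize_fn, num_proc=training_args.dataset_num_proc)
```
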
54f806b6ff Standardize dataset_num_proc usage (#1925)
* uniform dataset_num_proc

* num_proc in shuffle

* Update examples/datasets/anthropic_hh.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update examples/scripts/ppo.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update examples/scripts/ppo.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

---------

Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2024-08-13 15:10:39 +02:00
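
Note: a minimal illustration of the standardized option, assuming the config classes expose `dataset_num_proc` as the bullets suggest.

```python
from trl import DPOConfig

# dataset_num_proc flows from the config into every dataset.map call, instead of
# each script hard-coding its own num_proc.
training_args = DPOConfig(output_dir="dpo-out", dataset_num_proc=4)
```
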
7ddef5c158 Make use of trust_remote_code consistent (#1806)
Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co>
2024-07-10 18:26:11 +02:00
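
Note: a sketch of the consistent usage, assuming the flag threads from `ModelConfig` into `from_pretrained`; the model name is illustrative.

```python
from transformers import AutoModelForCausalLM

from trl import ModelConfig

# trust_remote_code defaults to False, so code shipped inside a model repo only
# runs on explicit opt-in, and the same flag is honored everywhere.
model_args = ModelConfig(model_name_or_path="my-org/custom-arch", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_args.model_name_or_path,
    trust_remote_code=model_args.trust_remote_code,
)
```
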
34d273f227 Support num_train_epochs (#1743)
* add a test case for num_train_epochs

* fix ci

* quick change

* disable push to hub

* debug windows ci

* try another fix

* skip subprocess tests on windows
2024-06-20 13:16:43 -04:00
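
Note: a minimal sketch; the config classes inherit from `transformers.TrainingArguments`, so epoch-based scheduling can replace a hand-computed `max_steps`.

```python
from trl import RLOOConfig

# Train for whole epochs instead of a fixed step count.
training_args = RLOOConfig(output_dir="rloo-out", num_train_epochs=1)
```
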
e7cb597230 Fix ppov2 test case (#1661)
* Fix leftover issues from the PPOv2 / RLOO refactor

* update terminology to use stop token
2024-05-23 11:37:16 -04:00
13454d2f4b PPO / Reinforce Trainers (#1540)
* Add ppov2 trainer

* make eos trick optional, remove unused args

* quick fix

* precommit

* update debugging script

* fix out of bound `drop_last=True`; use built-in scheduler

* Add PPO examples

* push changes

* quick change

* quick change

* various bug fixes

* remove unnecessary grad accumulation setting

* push new changes

* fix DS3 model saving

* update ppo.py

* refactor

* quick change

* refactor

* update ppo trainer

* refactor

* quick test

* add ds2/ds3 7-process configs

* add vllm trainer

* quick change

* experiment with reward normalization

* push changes

* quick push

* push changes

* push various changes

* refactor to use ModelConfig

* quick change

* refactor

* refactor

* Simplify DS logic

* quick update

* remove unnecessary files

* precommit

* deepspeed fix; handle edge case when eos_token_id = 0

* add PPO tldr example

* add TL;DR example

* fix undefined var

* utilize all samples in rloo

* quick setting

* remove the unnecessary `value_model`

* use exact_div

* allow saving the deepspeed model

* refactor

* remove dead code

* Use some shared utilities

* add some end-to-end test cases

* add PPOv2 docs and RLOO docs / tests

* update docs

* quick push

* fix ci

* fix type annotation for ci

* quick update

* update trainer docs
2024-05-22 08:31:10 -04:00
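
Note: a heavily hedged sketch of the new trainer's shape, assuming models, tokenizer, and dataset are already loaded; argument names follow this revision's docs and may have changed since (`tokenizer` was later renamed to `processing_class` in #2162).

```python
from trl import PPOv2Config, PPOv2Trainer

trainer = PPOv2Trainer(
    config=PPOv2Config(output_dir="ppov2-out"),
    tokenizer=tokenizer,        # renamed to processing_class in later releases
    policy=policy,              # model being trained
    ref_policy=ref_policy,      # frozen reference for the KL term
    reward_model=reward_model,  # scores sampled completions
    value_model=value_model,    # separate value model, since PPO needs a baseline
    train_dataset=train_dataset,
)
trainer.train()
```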