frozenleaves/trl - trl - Gitea: Git for Me

Mirror of https://github.com/huggingface/trl.git; synced 2025-10-21 02:53:59 +08:00

Author	SHA1	Message	Date
lewtun	6859e048da	Fix PPO/RLOO examples (#2100)	2024-09-23 11:49:36 +02:00
Quentin Gallouédec	10c2f63b2a	`training_args` for all `TrainingArguments` (#2082)	2024-09-19 15:03:47 +02:00
Quentin Gallouédec	40f05226de	Standardizing datasets for testing (#2065) * zen dataset * Update dataset test bco * some tests * Simple chat template * bco * xpo * kto * gkd * trainer_args * sft * online dpo * orpo * zen script	2024-09-14 22:34:15 +02:00
Quentin Gallouédec	4c92ba5769	©️ Copyrights (#2063) * copyrights * fail if missing	2024-09-13 14:18:47 +02:00
Rylan Schaeffer	2ee0b62cdb	Change `non_eos_penalty` to `missing_eos_penalty` to be consistent across `OnPolicy` trainers (#2033) * Subtract a penalty from OnPolicy Trainers if output does not contain an EOS token * Caught a few other problems * Updated the documentation for RLOO trainer and PPOv2Trainer * Corrected the default type and value for missing_eos_penalty * Made RLOO Trainer consistent with Online DPO and PPOv2 * Removed --non_eos_penalty from all documentation * Made missing_eos_penalty examples positive (because we subtract). * Caught two more incorrect examples * Removed unnecessary whitespace to make ruff happy * Update trl/trainer/utils.py --------- Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>	2024-09-10 14:40:23 +02:00
Quentin Gallouédec	f05f63c1ea	`PartialState().local_main_process_first()` when `map` in examples (#1926) * `PartialState().local_main_process_first()` when map in examples * allow load from cache --------- Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co>	2024-08-14 12:01:03 +02:00
Quentin Gallouédec	54f806b6ff	Standardize `dataset_num_proc` usage (#1925) * uniform dataset_num_proc * num_proc in shuffle * Update examples/datasets/anthropic_hh.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update examples/scripts/ppo.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update examples/scripts/ppo.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> --------- Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co> Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2024-08-13 15:10:39 +02:00
Quentin Gallouédec	7ddef5c158	Make use of `trust_remote_code` consistent (#1806) Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co>	2024-07-10 18:26:11 +02:00
Costa Huang	34d273f227	Support num_train_epochs (#1743) * add a test case for num_train_epochs * fix ci * quick change * disable push to hub * debug windows ci * try another fix * skip subprocess tests on windows	2024-06-20 13:16:43 -04:00
Costa Huang	e7cb597230	Fix ppov2 test case (#1661) * Fix PPOv2 / RLOO refactor's stuff * update terminology to use stop token	2024-05-23 11:37:16 -04:00
Costa Huang	13454d2f4b	PPO / Reinforce Trainers (#1540) * Add ppov2 trainer * make eos trick optional, remove unused args * quick fix * precommit * update debugging script * fix out of bound `drop_last=True`; use built-in scheduler * Add PPO examples * push changes * quick change * quick change * various bug fixes * remove unnecessary grad accumulation setting * push new changes * fix DS3 model saving * update ppo.py * refactor * quick change * refactor * update ppo trainer * refactor * quick test * add ds2 /ds3 7 processes config * add vllm trainer * quick change * experiment with reward normalization * push changes * quick push * push changes * push various changes * refactor to use ModelConfig * quick change * refactor * refactor * Simplify DS logic * quick update * remove unnecessary files * precommit * deepspeed fix; handle edge case when eos_token_id = 0 * add PPO tldr example * add TL;DR example * fix undefined var * utilize all samples in rloo * quick setting * remove the unnecessary `value_model` * use exact_div * allow saving the deepspeed model * refactor * remove dead code * Use some shared utilities * add some end-to-end test cases * add PPOv2 docs and RLOO docs / tests * update docs * quikc push * fix ci * fix type annotation for ci * quick update * update trainer docs	2024-05-22 08:31:10 -04:00

11 Commits