frozenleaves/trl

mirror of https://github.com/huggingface/trl.git synced 2025-10-20 18:43:52 +08:00

Author	SHA1	Message	Date
Quentin Gallouédec	23a635ed61	Release: v0.16 (#3137)	2025-03-22 14:03:54 -07:00
Quentin Gallouédec	ca850be0a2	🕹️ CLI refactor (#2380) * Refactor main function in dpo.py * Update setup.py and add cli.py * Add examples to package data * style * Refactor setup.py file * Add new file t.py * Move dpo to package * Update MANIFEST.in and setup.py, refactor trl/cli.py * Add __init__.py to trl/scripts directory * Add license header to __init__.py * File moved instruction * Add Apache License and update file path * Move dpo.py to new location * Refactor CLI and DPO script * Refactor import structure in scripts package * env * rm config from chat arg * rm old cli * chat init * test cli [skip ci] * Add `datast_config_name` to `ScriptArguments` (#2440) * add missing arg * Add test cases for 'trl sft' and 'trl dpo' commands * Add sft.py script and update cli.py to include sft command * Move sft script * chat * style [ci skip] * kto * rm example config * first step on doc * see #2442 * see #2443 * fix chat windows * ©️ Copyrights update (#2454) * First changes * Other files * Finally * rm comment * fix nashmd * Fix example * Fix example [ci skip] * 💬 Fix chat for windows (#2443) * fix chat for windows * add some tests back * Revert "add some tests back" This reverts commit 350aef52f53f8cf34fccd7ad0f78a3dd63867e06. * 🆔 Add `datast_config` to `ScriptArguments` (#2440) * datast_config_name * Update trl/utils.py [ci skip] * sort import * typo [ci skip] * Trigger CI * Rename `dataset_config_name` to `dataset_config` * 🏎 Fix deepspeed preparation of `ref_model` in `OnlineDPOTrainer` (#2417) * Remove unused deepspeed code * add model prep back * add deepspeed even if it doesn't work * rm old code * Fix config name * Remove `make dev` in favor of `pip install -e .[dev]` * Update script paths and remove old symlink related things * Fix chat script path [ci skip] * style	2024-12-13 17:52:23 +01:00
lewtun	92eea1f239	Clean up README and remove openrlbenchmark dependency (#2085) * Clean up README * Add Kashif and Quentin * Refactor * Apply suggestions from code review Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * Add citation * Omit benchmarks from dev install * Remove openrlbenchmark * Apply suggestions from code review Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> --------- Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2024-09-23 09:21:41 +02:00
lewtun	7ff6206510	Ignore chat files (#1486) * Ignore chat files * Update .gitignore Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com> * Update .gitignore --------- Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>	2024-03-27 10:44:23 +01:00
Costa Huang	e4f9a483d9	Refactor and benchmark (#662) * refactor and benchmark * update code * Add accelerate logging * logs * quick fix * update config * precommit * modify training example * fix multi-gpu all_reduce error `Tensors must be CUDA and dense` * support more models and benchmark * update * add changes * upload benchmark * precommit * add tyro as a dependency * add tyro * pre-commit * precommit * weird... * lol typo * precommit * sigh * push changes * Update benchmark/README.md Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com> * Add experiments * upload image to tag specific folder * add openrlbenchmark documentation * rename * remove unused field * precommit * push changes --------- Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>	2023-09-13 10:24:18 -04:00
Edward Beeching	9ff151006c	updates gitignore for wandb files	2023-01-04 10:59:44 +01:00
Leandro von Werra	52910d3bf1	Dynamic input sizes (#35) * change ppo input from tensor to list of tensors for varying shapes * update readme example with new input type * update docs * add listification of tensors need for new API * replace nans in tensors for wandb compatibility * add `listify_batch` helper function for backwards compatibility * update sentiment example with new api * update docs * update library * ignore wandb artifacts * update requirements * run experiment * replace respond to batch with generate * add experiment * update docs * fix action * fix action	2022-05-15 18:16:25 +02:00
Leandro von Werra	5ca5b61e52	Initial commit	2020-03-27 11:54:56 +01:00

8 Commits