23a635ed61
Release: v0.16 (#3137)
2025-03-22 14:03:54 -07:00
ca850be0a2
🕹️ CLI refactor (#2380)
* Refactor main function in dpo.py
* Update setup.py and add cli.py
* Add examples to package data
* style
* Refactor setup.py file
* Add new file t.py
* Move dpo to package
* Update MANIFEST.in and setup.py, refactor trl/cli.py
* Add __init__.py to trl/scripts directory
* Add license header to __init__.py
* File moved instruction
* Add Apache License and update file path
* Move dpo.py to new location
* Refactor CLI and DPO script
* Refactor import structure in scripts package
* env
* rm config from chat arg
* rm old cli
* chat init
* test cli [skip ci]
* Add `datast_config_name` to `ScriptArguments` (#2440)
* add missing arg
* Add test cases for 'trl sft' and 'trl dpo' commands
* Add sft.py script and update cli.py to include sft command
* Move sft script
* chat
* style [ci skip]
* kto
* rm example config
* first step on doc
* see #2442
* see #2443
* fix chat windows
* ©️ Copyrights update (#2454)
* First changes
* Other files
* Finally
* rm comment
* fix nashmd
* Fix example
* Fix example [ci skip]
* 💬 Fix chat for windows (#2443)
* fix chat for windows
* add some tests back
* Revert "add some tests back"
This reverts commit 350aef52f53f8cf34fccd7ad0f78a3dd63867e06.
* 🆔 Add `datast_config` to `ScriptArguments` (#2440)
* datast_config_name
* Update trl/utils.py [ci skip]
* sort import
* typo [ci skip]
* Trigger CI
* Rename `dataset_config_name` to `dataset_config`
* 🏎 Fix deepspeed preparation of `ref_model` in `OnlineDPOTrainer` (#2417)
* Remove unused deepspeed code
* add model prep back
* add deepspeed even if it doesn't work
* rm old code
* Fix config name
* Remove `make dev` in favor of `pip install -e .[dev]`
* Update script paths and remove old symlink related things
* Fix chat script path [ci skip]
* style
2024-12-13 17:52:23 +01:00
92eea1f239
Clean up README and remove openrlbenchmark dependency (#2085)
* Clean up README
* Add Kashif and Quentin
* Refactor
* Apply suggestions from code review
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Add citation
* Omit benchmarks from dev install
* Remove openrlbenchmark
* Apply suggestions from code review
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
---------
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2024-09-23 09:21:41 +02:00
7ff6206510
Ignore chat files (#1486)
* Ignore chat files
* Update .gitignore
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* Update .gitignore
---------
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
2024-03-27 10:44:23 +01:00
e4f9a483d9
Refactor and benchmark (#662)
* refactor and benchmark
* update code
* Add accelerate logging
* logs
* quick fix
* update config
* precommit
* modify training example
* fix multi-gpu all_reduce error `Tensors must be CUDA and dense`
* support more models and benchmark
* update
* add changes
* upload benchmark
* precommit
* add tyro as a dependency
* add tyro
* pre-commit
* precommit
* weird...
* lol typo
* precommit
* sigh
* push changes
* Update benchmark/README.md
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* Add experiments
* upload image to tag specific folder
* add openrlbenchmark documentation
* rename
* remove unused field
* precommit
* push changes
---------
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
2023-09-13 10:24:18 -04:00
9ff151006c
updates gitignore for wandb files
2023-01-04 10:59:44 +01:00
52910d3bf1
Dynamic input sizes (#35)
* change ppo input from tensor to list of tensors for varying shapes
* update readme example with new input type
* update docs
* add listification of tensors need for new API
* replace nans in tensors for wandb compatibility
* add `listify_batch` helper function for backwards compatibility
* update sentiment example with new api
* update docs
* update library
* ignore wandb artifacts
* update requirements
* run experiment
* replace respond to batch with generate
* add experiment
* update docs
* fix action
* fix action
2022-05-15 18:16:25 +02:00
5ca5b61e52
Initial commit
2020-03-27 11:54:56 +01:00