Commit Graph

8 Commits

Author SHA1 Message Date
23a635ed61 Release: v0.16 (#3137) 2025-03-22 14:03:54 -07:00
ca850be0a2 🕹️ CLI refactor (#2380)
* Refactor main function in dpo.py

* Update setup.py and add cli.py

* Add examples to package data

* style

* Refactor setup.py file

* Add new file t.py

* Move dpo to package

* Update MANIFEST.in and setup.py, refactor trl/cli.py

* Add __init__.py to trl/scripts directory

* Add license header to __init__.py

* File moved instruction

* Add Apache License and update file path

* Move dpo.py to new location

* Refactor CLI and DPO script

* Refactor import structure in scripts package

* env

* rm config from chat arg

* rm old cli

* chat init

* test cli [skip ci]

* Add `datast_config_name` to `ScriptArguments` (#2440)

* add missing arg

* Add test cases for 'trl sft' and 'trl dpo' commands

* Add sft.py script and update cli.py to include sft command

* Move sft script

* chat

* style [ci skip]

* kto

* rm example config

* first step on doc

* see #2442

* see #2443

* fix chat windows

* ©️ Copyrights update (#2454)

* First changes

* Other files

* Finally

* rm comment

* fix nashmd

* Fix example

* Fix example [ci skip]

* 💬 Fix chat for windows (#2443)

* fix chat for windows

* add some tests back

* Revert "add some tests back"

This reverts commit 350aef52f53f8cf34fccd7ad0f78a3dd63867e06.

* 🆔 Add `datast_config` to `ScriptArguments` (#2440)

* datast_config_name

* Update trl/utils.py [ci skip]

* sort import

* typo [ci skip]

* Trigger CI

* Rename `dataset_config_name` to `dataset_config`

* 🏎 Fix deepspeed preparation of `ref_model` in `OnlineDPOTrainer` (#2417)

* Remove unused deepspeed code

* add model prep back

* add deepspeed even if it doesn't work

* rm old code

* Fix config name

* Remove `make dev` in favor of `pip install -e .[dev]`

* Update script paths and remove old symlink related things

* Fix chat script path [ci skip]

* style
2024-12-13 17:52:23 +01:00
92eea1f239 Clean up README and remove openrlbenchmark dependency (#2085)
* Clean up README

* Add Kashif and Quentin

* Refactor

* Apply suggestions from code review

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Add citation

* Omit benchmarks from dev install

* Remove openrlbenchmark

* Apply suggestions from code review

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2024-09-23 09:21:41 +02:00
7ff6206510 Ignore chat files (#1486)
* Ignore chat files

* Update .gitignore

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* Update .gitignore

---------

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
2024-03-27 10:44:23 +01:00
e4f9a483d9 Refactor and benchmark (#662)
* refactor and benchmark

* update code

* Add accelerate logging

* logs

* quick fix

* update config

* precommit

* modify training example

* fix multi-gpu all_reduce error `Tensors must be CUDA and dense`

* support more models and benchmark

* update

* add changes

* upload benchmark

* precommit

* add tyro as a dependency

* add tyro

* pre-commit

* precommit

* weird...

* lol typo

* precommit

* sigh

* push changes

* Update benchmark/README.md

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* Add experiments

* upload image to tag specific folder

* add openrlbenchmark documentation

* rename

* remove unused field

* precommit

* push changes

---------

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
2023-09-13 10:24:18 -04:00
9ff151006c updates gitignore for wandb files 2023-01-04 10:59:44 +01:00
52910d3bf1 Dynamic input sizes (#35)
* change ppo input from tensor to list of tensors for varying shapes

* update readme example with new input type

* update docs

* add listification of tensors need for new API

* replace nans in tensors for wandb compatibility

* add `listify_batch` helper function for backwards compatibility

* update sentiment example with new api

* update docs

* update library

* ignore wandb artifacts

* update requirements

* run experiment

* replace respond to batch with generate

* add experiment

* update docs

* fix action

* fix action
2022-05-15 18:16:25 +02:00
5ca5b61e52 Initial commit 2020-03-27 11:54:56 +01:00