6 Commits

Author SHA1 Message Date
9a71e67be9 Remove tyro (#1176)
* refactor

* Remove tyro in `ppo.py`

* quick update

* update default args

* quick push

* precommit

* refactor

* quick change

* remove tyro

* quick change

* precommit

* quick change

* fix hello_world

* remove docstring differences

* add `module load cuda/12.1`

* push changes

* precommit

* make dpo runnable

* fix circular import

* quick fix

* refactor

* quick update

* path change

* update plots

* fix docs

* quick change

* Update trl/trainer/model_config.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update trl/trainer/model_config.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update trl/trainer/utils.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update examples/scripts/dpo.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* address comments. use attn_implementation

* precommit

* remove duplicate code

* update peft.py

* fix test no op dep

* Update trl/trainer/utils.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* precommit

* add docs

---------

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2024-01-26 07:51:15 -08:00
cbc6c9bb3e [core / DDP] Fix RM trainer + DDP + quantization + propagate gradient_checkpointing_kwargs in SFT & DPO (#912)
* make use of forward hooks

* correctly delete attributes

* fix RM DDP issues

* revert unneeded changes

* more fixes

* fix diff

* fix

* propagate to SFT

* Update examples/scripts/reward_modeling.py

* propagate the fix on DPO trainer

* add to example scripts

* trigger CI
2023-10-31 18:50:17 +01:00
ec9e76623e [Feature] Enable Intel XPU support (#839)
* enable xpu support

* fix bug

* review commits

* fix style

* add xpu decorator

* refactor review commit

* fix test

* review commit

* fix test

* Update benchmark.yml (#856)

* Standardise example scripts (#842)

* Standardise example scripts

* fix plotting script

* Rename run_xxx to xxx

* Fix doc

---------

Co-authored-by: Costa Huang <costa.huang@outlook.com>

* Fix version check in import_utils.py (#853)

* don't use get_peft_model if model is already peft (#857)

* merge conflict

* add xpu decorator

* resolve

* resolves

* upstream

* refactor and precommit

* fix new tests

* add device mapping for xpu

---------

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
Co-authored-by: Costa Huang <costa.huang@outlook.com>
Co-authored-by: Adam Pauls <adpauls@gmail.com>
Co-authored-by: abhishek thakur <1183441+abhishekkrthakur@users.noreply.github.com>
2023-10-31 10:15:35 +01:00
0a5aee7d99 [reward_modeling] Cleaning example script (#882)
* remove load in repeated multiple times & truncation

* trigger CI
2023-10-19 16:00:20 +02:00
f91fb2bda2 remove duplicate key in reward_modeling.py (#890) 2023-10-18 23:45:18 +02:00
ddd318865b Standardise example scripts (#842)
* Standardise example scripts

* fix plotting script

* Rename run_xxx to xxx

* Fix doc

---------

Co-authored-by: Costa Huang <costa.huang@outlook.com>
2023-10-11 17:28:15 +02:00