|
9df19e8a75
|
📜 Fix license and copyrights (#3264)
|
2025-04-08 15:22:58 -07:00 |
|
|
1d23ecc36f
|
©️ Update copyrights year (#2547)
* happy new year
* fix wandb import sort
|
2025-01-07 14:53:09 +01:00 |
|
|
52d213173f
|
🚜 Use field in dataclasses (#2494)
* in hh-rlhf-helpful-base
* delete tokenize ds
* dataset scripts
* alignprop
* judge tldr
* ddpo
* zen
* sft video
* literal to choices
* chat
* script args
* alignprop
* bco
* better help format
* cpo
* ddpo
* whether or not -> whether
* dpo
* don't set the possible values
* `Optional[...]` to ... or `None`
* xpo
* gkd
* kto
* nash
* online dpo
* Fix typo in learning rate help message
* orpo
* more ... or `None`
* model config
* ppo
* prm
* reward
* rloo
* sft
* online policy config
* make style
|
2025-01-06 18:29:09 +01:00 |
|
|
9410874787
|
©️ Copyrights update (#2454)
* First changes
* Other files
* Finally
* rm comment
* fix nashmd
* Fix example
* Fix example [ci skip]
|
2024-12-10 10:40:00 +01:00 |
|
|
43df3a485a
|
🧳 Move zen generation script and fix tests (#2393)
* Move zen
* step -> stepwise_supervision
* Fix train_test_split shuffle issue
* Fix tests
* Update tests/test_sft_trainer.py
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
* Fix typo in key name
---------
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
|
2024-11-26 14:08:06 +01:00 |
|