9955ee7eaa
🐳 Docker update + Simplify Jobs doc (#3931)
Co-authored-by: sergiopaniego <sergiopaniegoblanco@gmail.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2025-09-13 18:35:55 -06:00

e8b8499f1f
Remove redundant 'None' from docstrings (#4058)
2025-09-11 08:16:34 +02:00

251c0488c8
📦 Wrapping the main execution code to avoid multi-processing issues from vLLM (#3932)
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
2025-08-21 12:45:13 -07:00

a043fd74a3
Add uv scripts headers (#3767)
2025-07-25 07:48:40 -07:00

9df19e8a75
📜 Fix license and copyrights (#3264)
2025-04-08 15:22:58 -07:00

8453017622
🧼 Upgrade ruff (#2938)
2025-02-23 17:33:50 +01:00

1d23ecc36f
©️ Update copyrights year (#2547)
* happy new year
* fix wandb import sort
2025-01-07 14:53:09 +01:00

52d213173f
🚜 Use field in dataclasses (#2494)
* in hh-rlhf-helpful-base
* delete tokenize ds
* dataset scripts
* alignprop
* judge tldr
* ddpo
* zen
* sft video
* literal to choices
* chat
* script args
* alignprop
* bco
* better help format
* cpo
* ddpo
* whether or not -> whether
* dpo
* dont set the possible values
* `Optional[...]` to ... or `None`
* xpo
* gkd
* kto
* nash
* online dpo
* Fix typo in learning rate help message
* orpo
* more ... or `None`
* model config
* ppo
* prm
* reward
* rloo
* sft
* online policy config
* make style
2025-01-06 18:29:09 +01:00

9410874787
©️ Copyrights update (#2454)
* First changes
* Other files
* Finally
* rm comment
* fix nashmd
* Fix example
* Fix example [ci skip]
2024-12-10 10:40:00 +01:00

9af4734178
♻️ Standardize script_args (#2130)
2024-09-26 15:23:42 +02:00

4c0c98d950
Standardize dataset naming (#2081)
* `ds`, `raw_dataset` etc -> `dataset`
* Update docs/source/detoxifying_a_lm.mdx
2024-09-19 08:59:28 +02:00

4c92ba5769
©️ Copyrights (#2063)
* copyrights
* fail if missing
2024-09-13 14:18:47 +02:00

31b93876a7
📝 Document dataset format (#2020)
* first piece of doc
* improve readability
* some data utils and doc
* simplify prompt-only
* format
* fix path data utils
* fix example format
* simplify
* tests
* prompt-completion
* update anthropic hh
* update dataset script
* implicit prompt
* additional content
* `maybe_reformat_dpo_to_kto` -> `unpair_preference_dataset`
* Preference dataset with implicit prompt
* unpair preference dataset tests
* documentation
* ...
* doc
* changes applied to dpo example
* better doc and better log error
* a bit more doc
* improve doc
* converting
* some subsections
* converting section
* further refinements
* tldr
* tldr preference
* rename
* lm-human-preferences-sentiment
* `imdb` to `stanfordnlp/imdb`
* Add script for LM human preferences descriptiveness
* Remove sentiment_descriptiveness.py script
* style
* example judge tldr with new dataset
* Style
* Dataset conversion for TRL compatibility
* further refinements
* trainers in doc
* top level for functions
* stanfordnlp/imdb
* downgrade transformers
* temp reduction of tests
* next commit
* next commit
* additional content
* proper tick format
* specify the assistant start token
* improve
* lower case
* Update titles in _toctree.yml and data_utils.mdx
* revert make change
* correct dataset ids
* expand a bit dataset formats
* skip gated repo tests
* data utilities in API
* Update docs/source/dataset_formats.mdx
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update docs/source/dataset_formats.mdx
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update docs/source/dataset_formats.mdx
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update docs/source/dataset_formats.mdx
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* tiny internal testing for chat template testing
* specify type/format
* exclude sft trainer in doc
* Update trl/trainer/utils.py
* XPO in the doc
---------
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2024-09-11 20:11:25 +02:00

8bd2ab82f4
Refactor judges (#1856)
* BaseJudge -> BasePairwiseJudge
* hf judge asyncio
* refactor judges
* doc
* doc
* doc
* member judge
* :inherited-members:
* :inherited-members:
* doc
* give up
* judge tldr with judge class
* fix rank in multithread
* format
* improve doc
* update doc
* typo doc
* doc online dpo
* Update judge_tldr.py
---------
Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co>
2024-07-28 14:06:19 +02:00