frozenleaves/trl - trl - Gitea: Git for Me

mirror of https://github.com/huggingface/trl.git synced 2025-10-20 18:43:52 +08:00

Author	SHA1	Message	Date
Quentin Gallouédec	9df19e8a75	📜 Fix license and copyrights (#3264)	2025-04-08 15:22:58 -07:00
Quentin Gallouédec	1d23ecc36f	©️ Update copyrights year (#2547) * happy new year * fix wandb import sort	2025-01-07 14:53:09 +01:00
Quentin Gallouédec	9410874787	©️ Copyrights update (#2454) * First changes * Other files * Finally * rm comment * fix nashmd * Fix example * Fix example [ci skip]	2024-12-10 10:40:00 +01:00
Quentin Gallouédec	ee3cbe1946	💾 Deprecate `config` in favor of `args` in `PPOTrainer` (#2384)	2024-11-25 14:48:08 +01:00
Quentin Gallouédec	9af4734178	♻️ Standardize `script_args` (#2130)	2024-09-26 15:23:42 +02:00
Quentin Gallouédec	ac071d6225	Drop canonical dataset namespaces (#2048) * drop canonical * Delete ultrafeedback_prompt_only.py dataset script * reduce dif in best_of_n * try to revert best_of_n to make github happy * anyway...	2024-09-10 12:12:00 +02:00
Aarni Koskela	9bc478ecbb	pre-commit: replace linters + formatters with Ruff; fix some issues (#1300) * pre-commit: replace linters + formatters with Ruff * Don't use bare except * Clean up `noqa`s * Enable Ruff UP; apply auto-fixes * Enable Ruff B; apply fixes * Enable Ruff T with exceptions * Enable Ruff C (complexity); autofix * Upgrade Ruff to 0.2.0	2024-02-15 04:37:41 +01:00
Younes Belkada	79b90e19ba	a workaround for failing log_stats (#708)	2023-08-30 12:23:57 +02:00
Leandro von Werra	9d09b3e107	TextEnvironments (#424) * WIP skeleton * minimal working poc * cleanup * rename variables * quick typo fix * add v1 masking (#429) * add v1 masking * working v1 * adapt from suggestion * avoid warning `Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.` * fix masking - mask the responses from API call only * quality * address comments * Update trl/environment/base.py Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com> * adapt a bit * wip on tokenization/masking in textenv * small fixes * update viz * add example * print debug text and pass masks * style * format and move tensor to device * update example * update example * This seems to work * fix masking * fix rich output to console --------- Co-authored-by: Costa Huang <costa.huang@outlook.com> Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com> Co-authored-by: leandro <leandro.vonwerra@spoud.io> * Add masking (#461) * add v1 masking * working v1 * adapt from suggestion * avoid warning `Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.` * fix masking - mask the responses from API call only * quality * address comments * Update trl/environment/base.py Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com> * adapt a bit * wip on tokenization/masking in textenv * small fixes * update viz * add example * print debug text and pass masks * style * format and move tensor to device * update example * update example * This seems to work * fix masking * fix rich output to console * fix batched generation * improve stopping criteria * improve error handling in tool call --------- Co-authored-by: younesbelkada <younesbelkada@gmail.com> Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Co-authored-by: Costa Huang <costa.huang@outlook.com> * fix uknown tool * fix rewards and increase bs * remove unused script * ugly WIP fix * do not return modified obj for in-place operations * do not return modified obj for in-place operations * clean up stopping criterium * push updates * push update * format, add docs * rename file * add kwargs to reward fn * simplify example * simplify example * bug fix * add a trivia example * pre-commit * max tool response length * fix regex for multi-line * refactor tool exceptions * fix exceptions in tool * add docs * fix style * make rich optional * add docstrings * add tests * add TextEnv tests (WIP) * update triviaqa code * update docs * refactor text env * update tests (WIP) * add end2end test * update docs * upload tool demo * refactor * customizable system prompt * add text env docs * update index and toc * fix `TextHistory` show methods * add max length * fix style * fix typo * refactor to kwargs in init and tasks to queries * kwargs for reward docs * Update examples/triviaqa.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Update examples/tool_demo.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Update docs/source/learning_tools.mdx Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Update docs/source/learning_tools.mdx Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Update docs/source/learning_tools.mdx Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Update docs/source/text_environments.md Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Update examples/triviaqa.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Update examples/triviaqa.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * move to tool folder * remove assets * remove tool demo * move rich import test to import utils * add copyright * fixes for masks in ppo trainer * add text env api docs * make precommit + add ppo test with mask * move examples and add python * fix style * update triviaqa example * add more docs * update docs * Update docs/source/learning_tools.mdx * Apply suggestions from code review * precommit --------- Co-authored-by: Costa Huang <costa.huang@outlook.com> Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Co-authored-by: younesbelkada <younesbelkada@gmail.com> Co-authored-by: leandro von werra <leandro@hf.co>	2023-08-30 11:44:06 +02:00

9 Commits