9df19e8a75
📜 Fix license and copyrights ( #3264 )
2025-04-08 15:22:58 -07:00
1d23ecc36f
©️ Update copyrights year ( #2547 )
...
* happy new year
* fix wandb import sort
2025-01-07 14:53:09 +01:00
9410874787
©️ Copyrights update ( #2454 )
...
* First changes
* Other files
* Finally
* rm comment
* fix nashmd
* Fix example
* Fix example [ci skip]
2024-12-10 10:40:00 +01:00
ee3cbe1946
💾 Deprecate `config` in favor of `args` in `PPOTrainer` ( #2384 )
2024-11-25 14:48:08 +01:00
9af4734178
♻️ Standardize `script_args` ( #2130 )
2024-09-26 15:23:42 +02:00
ac071d6225
Drop canonical dataset namespaces ( #2048 )
...
* drop canonical
* Delete ultrafeedback_prompt_only.py dataset script
* reduce diff in best_of_n
* try to revert best_of_n to make github happy
* anyway...
2024-09-10 12:12:00 +02:00
9bc478ecbb
pre-commit: replace linters + formatters with Ruff; fix some issues ( #1300 )
...
* pre-commit: replace linters + formatters with Ruff
* Don't use bare except
* Clean up `noqa`s
* Enable Ruff UP; apply auto-fixes
* Enable Ruff B; apply fixes
* Enable Ruff T with exceptions
* Enable Ruff C (complexity); autofix
* Upgrade Ruff to 0.2.0
2024-02-15 04:37:41 +01:00
79b90e19ba
a workaround for failing log_stats ( #708 )
2023-08-30 12:23:57 +02:00
9d09b3e107
TextEnvironments ( #424 )
...
* WIP skeleton
* minimal working poc
* cleanup
* rename variables
* quick typo fix
* add v1 masking (#429 )
* add v1 masking
* working v1
* adapt from suggestion
* avoid warning `Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.`
* fix masking
- mask the responses from API call only
* quality
* address comments
* Update trl/environment/base.py
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com >
* adapt a bit
* wip on tokenization/masking in textenv
* small fixes
* update viz
* add example
* print debug text and pass masks
* style
* format and move tensor to device
* update example
* update example
* This seems to work
* fix masking
* fix rich output to console
---------
Co-authored-by: Costa Huang <costa.huang@outlook.com >
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com >
Co-authored-by: leandro <leandro.vonwerra@spoud.io >
* Add masking (#461 )
* add v1 masking
* working v1
* adapt from suggestion
* avoid warning `Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.`
* fix masking
- mask the responses from API call only
* quality
* address comments
* Update trl/environment/base.py
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com >
* adapt a bit
* wip on tokenization/masking in textenv
* small fixes
* update viz
* add example
* print debug text and pass masks
* style
* format and move tensor to device
* update example
* update example
* This seems to work
* fix masking
* fix rich output to console
* fix batched generation
* improve stopping criteria
* improve error handling in tool call
---------
Co-authored-by: younesbelkada <younesbelkada@gmail.com >
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com >
Co-authored-by: Costa Huang <costa.huang@outlook.com >
* fix unknown tool
* fix rewards and increase bs
* remove unused script
* ugly WIP fix
* do not return modified obj for in-place operations
* do not return modified obj for in-place operations
* clean up stopping criterion
* push updates
* push update
* format, add docs
* rename file
* add kwargs to reward fn
* simplify example
* simplify example
* bug fix
* add a trivia example
* pre-commit
* max tool response length
* fix regex for multi-line
* refactor tool exceptions
* fix exceptions in tool
* add docs
* fix style
* make rich optional
* add docstrings
* add tests
* add TextEnv tests (WIP)
* update triviaqa code
* update docs
* refactor text env
* update tests (WIP)
* add end2end test
* update docs
* upload tool demo
* refactor
* customizable system prompt
* add text env docs
* update index and toc
* fix `TextHistory` show methods
* add max length
* fix style
* fix typo
* refactor to kwargs in init and tasks to queries
* kwargs for reward docs
* Update examples/triviaqa.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com >
* Update examples/tool_demo.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com >
* Update docs/source/learning_tools.mdx
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com >
* Update docs/source/learning_tools.mdx
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com >
* Update docs/source/learning_tools.mdx
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com >
* Update docs/source/text_environments.md
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com >
* Update examples/triviaqa.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com >
* Update examples/triviaqa.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com >
* move to tool folder
* remove assets
* remove tool demo
* move rich import test to import utils
* add copyright
* fixes for masks in ppo trainer
* add text env api docs
* make precommit + add ppo test with mask
* move examples and add python
* fix style
* update triviaqa example
* add more docs
* update docs
* Update docs/source/learning_tools.mdx
* Apply suggestions from code review
* precommit
---------
Co-authored-by: Costa Huang <costa.huang@outlook.com >
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com >
Co-authored-by: younesbelkada <younesbelkada@gmail.com >
Co-authored-by: leandro von werra <leandro@hf.co >
2023-08-30 11:44:06 +02:00