9df19e8a75
📜 Fix license and copyrights ( #3264 )
2025-04-08 15:22:58 -07:00
1d23ecc36f
©️ Update copyrights year ( #2547 )
...
* happy new year
* fix wandb import sort
2025-01-07 14:53:09 +01:00
460e780265
👯 Standardize model_args ( #2442 )
...
* `model_config` -> `model_args`
* sort
2024-12-10 12:51:20 +01:00
6a05feff02
🆔 Add dataset_config to ScriptArguments ( #2440 )
...
* dataset_config_name
* Update trl/utils.py [ci skip]
* sort import
* typo [ci skip]
* Trigger CI
* Rename `dataset_config_name` to `dataset_config`
2024-12-10 11:09:26 +01:00
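In practice the renamed option is what a script passes to `load_dataset` as the configuration name. A minimal sketch, assuming TRL's shared `ScriptArguments` dataclass exposes `dataset_name` and `dataset_config` (the exact script wiring may differ):

```python
from datasets import load_dataset
from transformers import HfArgumentParser

from trl import ScriptArguments  # assumed import path

parser = HfArgumentParser(ScriptArguments)
(script_args,) = parser.parse_args_into_dataclasses()

# `name=` selects the dataset configuration (subset) on the Hub
dataset = load_dataset(script_args.dataset_name, name=script_args.dataset_config)
```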
9410874787
©️ Copyrights update ( #2454 )
...
* First changes
* Other files
* Finally
* rm comment
* fix nashmd
* Fix example
* Fix example [ci skip]
2024-12-10 10:40:00 +01:00
d6a8f2c2f6
⚠️ Add warning guidelines and update codebase to follow best practices ( #2350 )
...
* Add guidelines for working with warnings in the codebase
* Remove unnecessary warnings and improve code initialization
* Fix warnings and improve accuracy calculation
* Add rich library dependency for text formatting
* Update LoRA weight loading warning message
* Fix logging and import issues in AlignPropConfig
* Fix warnings and improve code readability
* Remove unused import statements
* Refactor CPOTrainer class in cpo_trainer.py
* Remove unnecessary warnings and raise ValueError for missing model
* Fix warnings and improve code consistency
* Update CONTRIBUTING.md to clarify the purpose of warnings
* Fix string formatting in DataCollatorForCompletionOnlyLM class
* Update SimPO loss parameters in CPOTrainer
* Fix warnings and remove unnecessary code in ConstantLengthDataset class
* Clarify warning guidelines
* Rewrite the entire section
* Fix capitalization in CONTRIBUTING.md
* Fix formatting in CONTRIBUTING.md
2024-11-29 16:07:38 +01:00
b2696578ce
🍬 Use any reward model for online methods ( #2276 )
...
* Refactor reward processing in OnlineDPOTrainer
* Refactor completion decoding and reward processing
* remove strip
* remove warning
* Add reward_tokenizer to training script
* Add reward_tokenizer and reward_processing_class to OnlineDPOTrainer test
* propagate to xpo and nash
* style
* reduce memory requirement with inference_mode
* fix tests
* pairrm judge llmblender
* setUpClass(cls)
* Add setUpClass method to TestJudges class
* truncation left for reward tokenizer
* don't log completions without eval dataset
* only eval when possible
2024-10-28 16:21:40 +01:00
e155cb8a66
⛓️ 💥 Don't use eval_dataset in scripts when no eval strategy ( #2270 )
2024-10-28 11:40:51 +01:00
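The change boils down to only building an eval split when evaluation is actually enabled. A minimal sketch of the pattern (the helper name is illustrative, not taken from the scripts):

```python
def get_eval_dataset(dataset, training_args):
    """Return the eval split, or None when evaluation is disabled."""
    # older transformers releases call this field `evaluation_strategy`
    if training_args.eval_strategy == "no":
        return None
    return dataset["test"]
```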
c2bb1eed14
Add torch_dtype to model kwargs in reward modeling example ( #2266 )
...
Update model_kwargs to include torch_dtype.
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com >
2024-10-24 20:12:26 +02:00
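A hedged sketch of passing `torch_dtype` through the model kwargs; the model name and dtype below are placeholders, not necessarily the values used in the example script:

```python
import torch
from transformers import AutoModelForSequenceClassification

model_kwargs = dict(
    torch_dtype=torch.bfloat16,  # load weights in bf16 instead of the fp32 default
    use_cache=False,             # no generation cache needed for reward modeling
)
model = AutoModelForSequenceClassification.from_pretrained(
    "Qwen/Qwen2-0.5B-Instruct", num_labels=1, **model_kwargs
)
```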
7e394b03e8
🎭 Deprecate [SFT/DPO/Reward]ScriptArguments in favour of ScriptArguments ( #2145 )
...
* `DPOScriptArguments` to `ScriptArguments`
* use dataset_train_split
* Use scriptarguments
* dataset names in command lines
* use `ScriptArguments` everywhere
* ignore bias buffer to end
* remove in v0.13
* rm comment
* update test commands
* Update docs/source/rloo_trainer.md
* Update tests/test_rloo_trainer.py
* Added dataset_train_split argument to ppo.py and rloo.py
* update scripts with dataset_train_split
2024-10-14 11:14:58 +02:00
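After this change the example scripts share a single argument dataclass. A rough sketch of the unified pattern, assuming `ScriptArguments` carries `dataset_name`, `dataset_train_split` and `dataset_test_split` (DPO is an arbitrary choice here):

```python
from datasets import load_dataset
from transformers import HfArgumentParser

from trl import DPOConfig, ScriptArguments

parser = HfArgumentParser((ScriptArguments, DPOConfig))
script_args, training_args = parser.parse_args_into_dataclasses()

dataset = load_dataset(script_args.dataset_name)
train_dataset = dataset[script_args.dataset_train_split]
```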
47d08a9626
Rename trainer arg tokenizer to processing_class ( #2162 )
2024-10-07 09:39:32 +02:00
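For trainer call sites the rename is mechanical: the tokenizer (or processor) is now passed as `processing_class`. A small sketch under that assumption, with a toy dataset so it stands alone:

```python
from datasets import Dataset
from transformers import AutoTokenizer

from trl import SFTConfig, SFTTrainer

model_id = "Qwen/Qwen2-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
train_dataset = Dataset.from_dict({"text": ["an example training document"]})

trainer = SFTTrainer(
    model=model_id,
    args=SFTConfig(output_dir="sft-out"),
    train_dataset=train_dataset,
    processing_class=tokenizer,  # formerly `tokenizer=tokenizer`
)
```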
c00722ce0a
🃏 Model card for TRL ( #2123 )
...
* template and util
* test for online dpo
* template in package_data
* template in manifest
* standardize push_to_hub
* wandb badge and quick start
* bco
* xpo
* simplify `create_model_card`
* cpo
* kto
* dpo
* gkd
* orpo
* style
* nash-md
* alignprop
* bco citation
* citation template
* cpo citation
* ddpo
* fix alignprop
* dpo
* gkd citation
* kto
* online dpo citation
* orpo citation
* citation in utils
* optional citation
* reward
* optional trainer citation
* sft
* remove add_model_tags bco
* Remove unnecessary code for adding model tags
* Fix model tag issue and update URL format
* Remove unused code for adding model tags
* Add citation for XPOTrainer
* Remove unused code in SFTTrainer
* Add model card generation in RLOOTrainer
* Remove unused import and method call in reward_trainer.py
* Add model card generation
* Remove unused code and update error message in ORPOTrainer class
* Add import statements and create model card in IterativeSFTTrainer
* Add dataset name to push_to_hub() call
* Update trainer.push_to_hub() dataset names
* script args
* test
* better doc
* fix tag test
* fix test tag
* Add tags parameter to create_model_card method
* doc
* script args
* Update trl/templates/model_card.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com >
* unittest's `assertIn` instead of `assert`
* Update trl/templates/model_card.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com >
---------
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com >
2024-09-27 15:23:05 +02:00
9af4734178
♻️ Standardize script_args ( #2130 )
2024-09-26 15:23:42 +02:00
32d9d34eb1
Standardize pushing to Hub in examples ( #2126 )
2024-09-26 10:00:51 +02:00
cc23b511e4
[RewardTrainer] Tokenize inputs within trainer ( #2102 )
...
* Pretokenize in reward modelling
* Fix README example
* Apply suggestions from code review
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com >
* Move chat template formatting inside trainer
* Refactor tests
* Fix README
* Disable wandb
* Update readme
* add comment `remove_unused_columns`
* Update trl/trainer/reward_config.py
* doc
* implicit*
* explicit
---------
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com >
Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co >
2024-09-24 13:03:32 +02:00
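The practical effect is that a preference dataset with raw "chosen"/"rejected" text can be handed to the trainer directly; tokenization and chat-template formatting happen inside it. A hedged sketch of that usage (model and data are toy placeholders, shown with the later `processing_class` keyword from #2162; at the time of this commit the argument was still `tokenizer=`):

```python
from datasets import Dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

from trl import RewardConfig, RewardTrainer

model_id = "Qwen/Qwen2-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=1)

# raw text pairs; the trainer tokenizes them internally
train_dataset = Dataset.from_dict(
    {
        "chosen": ["The capital of France is Paris."],
        "rejected": ["The capital of France is Berlin."],
    }
)

trainer = RewardTrainer(
    model=model,
    args=RewardConfig(output_dir="reward-out", max_length=512),
    processing_class=tokenizer,
    train_dataset=train_dataset,
)
```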
10c2f63b2a
training_args for all TrainingArguments ( #2082 )
2024-09-19 15:03:47 +02:00
4c0c98d950
Standardize dataset naming ( #2081 )
...
* `ds`, `raw_dataset` etc -> `dataset`
* Update docs/source/detoxifying_a_lm.mdx
2024-09-19 08:59:28 +02:00
3412f513f2
Refactor reward modelling script to work with chat models ( #2026 )
...
* Make Qwen2 work
* Make it work
* Refactor
* Add doc
* Add dataset
* Fix
* Quality
2024-09-06 13:12:38 +02:00
850ddcf598
[pre-commit] update pre-commit yaml ( #2002 )
...
* update pre-commit yaml
* fix test
* use element_type
2024-09-02 19:15:25 +02:00
f05f63c1ea
PartialState().local_main_process_first() when map in examples ( #1926 )
...
* `PartialState().local_main_process_first()` when map in examples
* allow load from cache
---------
Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co >
2024-08-14 12:01:03 +02:00
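The pattern itself is small: in distributed runs the local main process runs `Dataset.map` first, and the remaining ranks then read the cached result instead of re-tokenizing. A sketch, with `dataset` and `preprocess_fn` assumed to exist:

```python
from accelerate import PartialState

# only the local main process maps first; other ranks wait, then load from cache
with PartialState().local_main_process_first():
    dataset = dataset.map(preprocess_fn, num_proc=4)
```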
54f806b6ff
Standardize dataset_num_proc usage ( #1925 )
...
* uniform dataset_num_proc
* num_proc in shuffle
* Update examples/datasets/anthropic_hh.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com >
* Update examples/scripts/ppo.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com >
* Update examples/scripts/ppo.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com >
---------
Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co >
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com >
2024-08-13 15:10:39 +02:00
7ddef5c158
Make use of trust_remote_code consistent ( #1806 )
...
Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co >
2024-07-10 18:26:11 +02:00
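The consistent pattern amounts to threading a single `trust_remote_code` flag through every `from_pretrained` call instead of hard-coding it per script. A small self-contained sketch (the helper name is illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer


def load_model_and_tokenizer(model_name: str, trust_remote_code: bool = False):
    """Load a causal LM and its tokenizer, honouring one trust_remote_code flag."""
    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=trust_remote_code)
    model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=trust_remote_code)
    return model, tokenizer
```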
a02513c3b7
Apply deprecated evaluation_strategy ( #1559 )
...
* Deprecate
* Update tests/test_dpo_trainer.py
---------
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com >
2024-05-23 12:48:00 +02:00
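Concretely, transformers deprecated `evaluation_strategy` in favour of `eval_strategy`, so configs and tests switch to the new name. A minimal sketch of the updated spelling:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="out",
    eval_strategy="steps",  # formerly `evaluation_strategy="steps"`
    eval_steps=100,
)
```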
8799952876
visualize rm prediction ( #1636 )
...
* visualize rm prediction
* quick update
* quick check
* quick fix
* update eval steps
2024-05-10 09:32:20 -04:00
22b4f548f4
fix RM script ( #1393 )
2024-03-07 08:49:52 +01:00
9bc478ecbb
pre-commit: replace linters + formatters with Ruff; fix some issues ( #1300 )
...
* pre-commit: replace linters + formatters with Ruff
* Don't use bare except
* Clean up `noqa`s
* Enable Ruff UP; apply auto-fixes
* Enable Ruff B; apply fixes
* Enable Ruff T with exceptions
* Enable Ruff C (complexity); autofix
* Upgrade Ruff to 0.2.0
2024-02-15 04:37:41 +01:00
9a71e67be9
Remove tyro ( #1176 )
...
* refactor
* Remove tyro in `ppo.py`
* quick update
* update default args
* quick push
* precommit
* refactor
* quick change
* remove tyro
* quick change
* precommit
* quick change
* fix hello_world
* remove docstring differences
* add `module load cuda/12.1`
* push changes
* precommit
* make dpo runnable
* fix circular import
* quick fix
* refactor
* quick update
* path change
* update plots
* fix docs
* quick change
* Update trl/trainer/model_config.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com >
* Update trl/trainer/model_config.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com >
* Update trl/trainer/utils.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com >
* Update examples/scripts/dpo.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com >
* address comments. use attn_implementation
* precommit
* remove duplicate code
* update peft.py
* fix test no op dep
* Update trl/trainer/utils.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com >
* Apply suggestions from code review
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com >
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com >
* precommit
* add docs
---------
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com >
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com >
2024-01-26 07:51:15 -08:00
cbc6c9bb3e
[core / DDP] Fix RM trainer + DDP + quantization + propagate gradient_checkpointing_kwargs in SFT & DPO ( #912 )
...
* make use of forward hooks
* correctly delete attributes
* fix RM DDP issues
* revert unneeded changes
* more fixes
* fix diff
* fix
* propagate to SFT
* Update examples/scripts/reward_modeling.py
* propagate the fix on DPO trainer
* add to example scripts
* trigger CI
2023-10-31 18:50:17 +01:00
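For the `gradient_checkpointing_kwargs` part of this fix, the option now flows from the config down to the model. A hedged sketch using today's `SFTConfig` (the original fix predates it and used plain `TrainingArguments`); values are illustrative:

```python
from trl import SFTConfig

training_args = SFTConfig(
    output_dir="sft-out",
    gradient_checkpointing=True,
    # forwarded to model.gradient_checkpointing_enable
    gradient_checkpointing_kwargs={"use_reentrant": False},
)
```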
ec9e76623e
[Feature] Enable Intel XPU support ( #839 )
...
* enable xpu support
* fix bug
* review commits
* fix style
* add xpu decorator
* refactor review commit
* fix test
* review commit
* fix test
* Update benchmark.yml (#856)
* Standardise example scripts (#842)
* Standardise example scripts
* fix plotting script
* Rename run_xxx to xxx
* Fix doc
---------
Co-authored-by: Costa Huang <costa.huang@outlook.com >
* Fix version check in import_utils.py (#853)
* don't use get_peft_model if model is already peft (#857)
* merge conflict
* add xpu decorator
* resolve
* resolves
* upstream
* refactor and precommit
* fix new tests
* add device mapping for xpu
---------
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com >
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com >
Co-authored-by: Costa Huang <costa.huang@outlook.com >
Co-authored-by: Adam Pauls <adpauls@gmail.com >
Co-authored-by: abhishek thakur <1183441+abhishekkrthakur@users.noreply.github.com >
2023-10-31 10:15:35 +01:00
0a5aee7d99
[reward_modeling] Cleaning example script ( #882 )
...
* remove load in repeated multiple times & truncation
* trigger CI
2023-10-19 16:00:20 +02:00
f91fb2bda2
remove duplicate key in reward_modeling.py ( #890 )
2023-10-18 23:45:18 +02:00
ddd318865b
Standardise example scripts ( #842 )
...
* Standardise example scripts
* fix plotting script
* Rename run_xxx to xxx
* Fix doc
---------
Co-authored-by: Costa Huang <costa.huang@outlook.com >
2023-10-11 17:28:15 +02:00