9df19e8a75
📜 Fix license and copyrights ( #3264 )
2025-04-08 15:22:58 -07:00
1d23ecc36f
©️ Update copyrights year ( #2547 )
...
* happy new year
* fix wandb import sort
2025-01-07 14:53:09 +01:00
8c49ea39ec
🏚 Remove unused components ( #2480 )
2024-12-19 19:29:39 +01:00
9410874787
©️ Copyrights update ( #2454 )
...
* First changes
* Other files
* Finally
* rm comment
* fix nashmd
* Fix example
* Fix example [ci skip]
2024-12-10 10:40:00 +01:00
c10cc8995b
🗝️ Update type hints ( #2399 )
...
* New type hint structure
* Update type hints
* Delete wrong file
* Remove dict import
2024-11-26 20:37:27 +01:00
fb1b48fdbe
Remove max_length from RewardDataCollatorWithPadding ( #2119 )
2024-09-26 09:59:12 +02:00
4c92ba5769
©️ Copyrights ( #2063 )
...
* copyrights
* fail if missing
2024-09-13 14:18:47 +02:00
54f806b6ff
Standardize dataset_num_proc usage ( #1925 )
...
* uniform dataset_num_proc
* num_proc in shuffle
* Update examples/datasets/anthropic_hh.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update examples/scripts/ppo.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update examples/scripts/ppo.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
---------
Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2024-08-13 15:10:39 +02:00
c8cef79e6c
arXiv to HF Papers ( #1870 )
2024-07-24 21:06:57 +02:00
3c0a10b1ae
fix dataset load error ( #1670 )
...
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
2024-05-27 14:52:20 +02:00
a02513c3b7
Apply deprecated evaluation_strategy ( #1559 )
...
* Deprecate
* Update tests/test_dpo_trainer.py
---------
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2024-05-23 12:48:00 +02:00
4219cbfedc
Fix the pad_token_id error ( #1394 )
...
* Fix the pad_token_id error
Signed-off-by: yuanwu <yuan.wu@intel.com>
* Add the load_in_8bit argument in rl_training.py
Signed-off-by: yuanwu <yuan.wu@intel.com>
* Reformat the patch
Signed-off-by: yuanwu <yuan.wu@intel.com>
* Fix the failed check
Signed-off-by: yuanwu <yuan.wu@intel.com>
---------
Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-03-05 02:18:42 +01:00
2a2676e7ec
set seed in sft/dpo/reward_modeling to make results reproducible ( #1357 )
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2024-02-23 11:12:45 +01:00
9bc478ecbb
pre-commit: replace linters + formatters with Ruff; fix some issues ( #1300 )
...
* pre-commit: replace linters + formatters with Ruff
* Don't use bare except
* Clean up `noqa`s
* Enable Ruff UP; apply auto-fixes
* Enable Ruff B; apply fixes
* Enable Ruff T with exceptions
* Enable Ruff C (complexity); autofix
* Upgrade Ruff to 0.2.0
2024-02-15 04:37:41 +01:00
6614b8aa6b
Minor fixes to some comments in some examples. ( #1156 )
2023-12-29 14:12:05 +01:00
950ee2187d
clear up the parameters of supervised_finetuning.py ( #1126 )
...
no_gradient_checkpointing is always false
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2023-12-22 17:00:28 +01:00
e7961e45f1
Remove duplicate data loading in rl_training.py ( #1020 )
...
The dataset was loaded twice, but at line 149 (new) we do
`ds = train_dataset.map` anyway
2023-11-23 12:25:07 +01:00
9e9f024399
Fix a bunch of outdated references to examples/ ( #977 )
2023-11-10 11:29:21 +01:00
6826d592ae
Clarify docstrings, help messages, assert messages in merge_peft_adapter.py ( #838 )
...
An assertion was also corrected to the intended test condition
2023-10-06 11:04:58 +02:00
d78d917880
Add comment to explain how the sentiment pipeline is used to run the … ( #555 )
...
* Add comment to explain how the sentiment pipeline is used to run the reward model in the StackLLaMA example
* Apply 'make precommit'
2023-07-24 18:09:45 +02:00
5c7bfbc8d9
[examples] Big refactor of examples and documentation ( #509 )
...
* added sfttrainer and rmtrainer example scripts.
* added few lines in the documentation.
* moved notebooks.
* delete `examples/summarization`
* remove from docs as well
* refactor sentiment tuning
* more refactoring.
* updated docs for multi-adapter RL.
* add research projects folder
* more refactor
* refactor docs.
* refactor structure
* add correct scripts all over the place
* final touches
* final touches
* updated documentation from feedback.
2023-07-14 12:00:56 +02:00