* first commit
* uncomment
* other test adaptations
* Remove unused variable in test_setup_chat_format
* Remove unused import statement
* style
* Add Bart model
* Update BCOTrainerTester class in test_bco_trainer.py
* Update model IDs and tokenizers in test files
* Add new models and processors
* Update model IDs in test files
* Fix formatting issue in test_dataset_formatting.py
* Refactor dataset formatting in test_dataset_formatting.py
* Fix dataset sequence length in SFTTrainerTester
* Remove tokenizer
* Remove print statement
* Add reward_model_path and sft_model_path to PPO trainer
* Fix tokenizer padding issue
* Add chat template for testing purposes in PaliGemma model
* Update PaliGemma model and chat template
* Increase learning rate to speed up test
* Update model names in run_dpo.sh and run_sft.sh scripts
* Update model and dataset names
* Fix formatting issue in test_dataset_formatting.py
* Fix formatting issue in test_dataset_formatting.py
* Remove unused chat template
* Update model generation script
* additional models
* Update model references in test files
* Remove unused imports in test_online_dpo_trainer.py
* Add is_llm_blender_available import and update reward_tokenizer
* Refactor test_online_dpo_trainer.py: Move skipped test case decorator
* remove models without chat templates
* Update model names in scripts and tests
* Update model_id in test_modeling_value_head.py
* Update model versions in test files
* Fix formatting issue in test_dataset_formatting.py
* Update embedding model ID in BCOTrainerTester
* Update test_online_dpo_trainer.py with reward model changes
* Update expected formatted text in test_dataset_formatting.py
* Add reward_tokenizer to TestOnlineDPOTrainer
* fix tests
* Add SIMPLE_CHAT_TEMPLATE to T5 tokenizer
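For context, `SIMPLE_CHAT_TEMPLATE` is a minimal Jinja template; below is a hedged sketch of wiring such a template into a tokenizer that ships without one. The template string here is illustrative and the constant in `trl` may differ:

```python
from transformers import AutoTokenizer

# Minimal Jinja chat template, in the spirit of trl's SIMPLE_CHAT_TEMPLATE;
# the exact string used in the test suite may differ.
SIMPLE_CHAT_TEMPLATE = (
    "{% for message in messages %}"
    "{{ message['role'] + ': ' + message['content'] + '\n' }}"
    "{% endfor %}"
)

tokenizer = AutoTokenizer.from_pretrained("t5-small")
tokenizer.chat_template = SIMPLE_CHAT_TEMPLATE  # T5 has no built-in chat template
text = tokenizer.apply_chat_template(
    [{"role": "user", "content": "hello"}], tokenize=False
)
print(text)  # "user: hello\n"
```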
* Fix dummy_text format in test_rloo_trainer.py
* Skip outdated test for chatML data collator
* Add new vision language models
* Commented out unused model IDs in test_vdpo_trainer
* Update model and vision configurations in generate_tiny_models.py and test_dpo_trainer.py
* Update model and tokenizer references
* Don't push if it already exists
* Add comment explaining test skip
* Fix model_exists function call and add new models
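A hedged sketch of the push guard described in the two commits above; the actual helper in the model generation script may differ, and the repo id is illustrative. `repo_exists` from `huggingface_hub` is one way to implement it:

```python
from huggingface_hub import HfApi

def model_exists(repo_id: str) -> bool:
    """Return True if the model repo is already on the Hub."""
    return HfApi().repo_exists(repo_id)

repo_id = "trl-internal-testing/tiny-random-LlamaForCausalLM"  # illustrative
if not model_exists(repo_id):
    print(f"{repo_id} missing from the Hub; push it")
```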
* Update LlavaForConditionalGeneration model and processor
* `qgallouedec` -> `trl-internal-testing`
* `DPOScriptArguments` to `ScriptArguments`
* use dataset_train_split
* Use scriptarguments
* dataset names in command lines
* use `ScriptArguments` everywhere
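A sketch of the pattern these `ScriptArguments` commits converge on, assuming trl's shared dataclass with `dataset_name` and `dataset_train_split` fields:

```python
from datasets import load_dataset
from transformers import HfArgumentParser
from trl import ScriptArguments

# Every example script parses the same shared dataclass and selects its
# split by name instead of hard-coding "train".
parser = HfArgumentParser(ScriptArguments)
(script_args,) = parser.parse_args_into_dataclasses()
dataset = load_dataset(script_args.dataset_name, split=script_args.dataset_train_split)
```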
* ignore bias buffer to end
* remove in v0.13
* rm comment
* update test commands
* Update docs/source/rloo_trainer.md
* Update tests/test_rloo_trainer.py
* Added dataset_train_split argument to ppo.py and rloo.py
* update scripts with dataset_train_split
* Remove stray commas from test data
* Codemod Unittest assertions to bare asserts
* Make `assertAlmostEqual` tests more idiomatic
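As an illustration of the assertion codemod (the values are made up; `pytest.approx` is the idiomatic stand-in for `assertAlmostEqual`):

```python
import pytest

# unittest style (before the codemod):
#   self.assertEqual(pad_token, "<pad>")
#   self.assertAlmostEqual(abs_loss_diff, 0.0, places=4)

# bare-assert style (after the codemod):
pad_token = "<pad>"
abs_loss_diff = 1e-6
assert pad_token == "<pad>"
assert abs_loss_diff == pytest.approx(0.0, abs=1e-4)  # replaces assertAlmostEqual
```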
* DRY some test strings
* Fix reported KL in PPO trainer
Previously this always reported the estimated KL, even when using `kl_penalty = 'full'` (or `abs`, etc.).
Now we return the actual KL calculated in `compute_rewards()` and report that.
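For reference, a minimal sketch of the per-token penalty variants involved, in the spirit of the trainer's `_kl_penalty`; the fix is to log whichever of these `compute_rewards()` actually applied, rather than always logging the first estimator. Exact code may differ:

```python
import torch.nn.functional as F

def kl_penalty(logprob, ref_logprob, kl_penalty="kl"):
    """Per-token KL penalty variants, sketched after the PPO trainer."""
    if kl_penalty == "kl":
        return logprob - ref_logprob                  # cheap estimator of the KL
    if kl_penalty == "abs":
        return (logprob - ref_logprob).abs()
    if kl_penalty == "mse":
        return 0.5 * (logprob - ref_logprob).square()
    if kl_penalty == "full":
        # full KL over the vocabulary; inputs are full log-distributions here
        return F.kl_div(ref_logprob, logprob, log_target=True, reduction="none").sum(-1)
    raise NotImplementedError
```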
* fix test
* refactor grad accum
* quick fix
* use correct place to step optim
* push changes
* cleanup and fix division by zero in `masked_var`
* revert back changes
* use unbiased var
* deal with division by zero
* add test case
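A sketch of the guarded, unbiased masked variance these commits converge on; details may differ from the shipped `masked_var`:

```python
import torch

def masked_mean(values: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    return (values * mask).sum() / mask.sum()

def masked_var(values: torch.Tensor, mask: torch.Tensor, unbiased: bool = True) -> torch.Tensor:
    """Variance over the masked entries, with the division-by-zero guard."""
    mean = masked_mean(values, mask)
    variance = masked_mean((values - mean) ** 2, mask)
    if unbiased:
        n = mask.sum()
        if n <= 1:
            # Bessel's correction divides by n - 1, so it needs n > 1.
            raise ValueError("need more than one masked element for an unbiased estimate")
        variance = variance * n / (n - 1)
    return variance

x = torch.tensor([1.0, 2.0, 3.0, 100.0])
m = torch.tensor([1.0, 1.0, 1.0, 0.0])  # ignore the padded last element
print(masked_var(x, m))  # tensor(1.)
```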
* calculate advantage only once
* format
* add warning
* add more warnings
* quick fix
* remove unhelpful warning
* fix test cases
* fix test cases
* bump version given the breaking change
* black
* refactor
* update test cases
* error out
* push changes
* remove exact div
* add comments
* add fixes to support PP
* add same logic for enc-dec
* add more checks
* fix 20b issues
* clean up
* update scripts
* dp safety checker
* added multi gpu tests
* fix order
* change
* fix script
* adds a hacky peft example
* fixes bug due to missing "prepare_model_for_training"
* Formatting
* adds peft to requirements
* Update trl/trainer/ppo_trainer.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* gpt neo runs
* changes requested on the PR
* style
* updates to prepare_model_for_int8_training PEFT PR https://github.com/huggingface/peft/pull/105
* updates to prepare_model_for_int8_training PEFT PR https://github.com/huggingface/peft/pull/105
* adds missing 8-bit attribute to modeling base
* adds lr to example script
* adds missing train to trainer
* disables caching temporarily while I debug something
* debugging issues with unstable training
* Fix peft + int8 (#170)
* add fix
* another fix
* Auto stash before merge of "peft-example" and "origin/peft-example"
* adds peft model types to modeling base
* reduces memory usage using adapters and no ref model.
* adds support for EleutherAI/gpt-neox-20b
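Putting the adapter commits together, a hedged sketch of the memory-saving setup. The argument names follow the PPO API of that era and the LoRA hyperparameters are illustrative; the exact wiring may differ:

```python
from peft import LoraConfig
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

# Load the policy in 8-bit with a LoRA adapter. With ref_model=None, the
# trainer can recover reference logits by disabling the adapter, so no
# second full copy of the 20B base model is kept in memory.
model_id = "EleutherAI/gpt-neox-20b"
lora_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")
model = AutoModelForCausalLMWithValueHead.from_pretrained(
    model_id, peft_config=lora_config, load_in_8bit=True
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
ppo_trainer = PPOTrainer(PPOConfig(), model, ref_model=None, tokenizer=tokenizer)
```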
* example for peft finetune of cm model
* removes hacky research code
* fixing the rebase and some typos
* style
* style2
* adds gradient checkpointing to base model
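A minimal sketch of what enabling checkpointing on the wrapped base model looks like, assuming the standard `transformers` toggle; the exact wiring in the release may differ:

```python
from trl import AutoModelForCausalLMWithValueHead

model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")
# Trade compute for memory: recompute activations during backward instead
# of storing them. `pretrained_model` is the wrapped transformers model.
model.pretrained_model.gradient_checkpointing_enable()
```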
* cleans up comments
* moves config and other pretrained_model properties to __init__
* make style
* added tests
* change dependency
* Update .github/workflows/tests.yml
* fix test
* fix style and failing tests
* make quality
* revert change
* rm unneeded change
* revert changes
* rm changes
* rm changes
* rm unneeded change
* Update trl/models/modeling_base.py
* revert unneeded changes
* make style
* adapt suggestions
* fix tests
* attempt to fix
* fix
* fix
* add no peft test
* revert
* remove unneeded check
* more tests
* fix logic
* add `save_pretrained` support
* fix quality
* clean up
* clean up
* stronger test
* refactor comments
* make style
* attempt to add non-peft tests
* remove test runner
* format
* fix test
* move `train` on top
* fix peft import
* make quality
* fixes typo
* adds peft example to docs
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: younesbelakda <younesbelkada@gmail.com>
* v1
- working script
- added tests
- possible to load `v_head`
- possible to use `transformers` too
* fix trainer test
* add push_to_hub compatibility
* add `push_to_hub` tests
* few updates
- update based on comments
- add more tests
- update docs
* Update docs/source/quickstart.mdx
* clearer doc
* support sharded models
* `save_pretrained` support for sharded case
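Taken together, a hedged round-trip sketch of the save/load/push surface described in this block; the local path and hub repo name are hypothetical:

```python
from trl import AutoModelForCausalLMWithValueHead

# `save_pretrained` writes the base weights plus the `v_head`, and
# `from_pretrained` restores both, including when the checkpoint is
# sharded across several weight files.
model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")
model.save_pretrained("gpt2-with-value-head")
reloaded = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2-with-value-head")
# model.push_to_hub("my-user/gpt2-with-value-head")  # repo name is hypothetical
```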