* function calling training support for SFTTrainer
* adding tool support to data_utils
* adding test for function calling tokenizer
* reverting changes to sfttrainer and config, added maybe_apply_chat_template (see the sketch after this list)
* arg for maybe_apply_chat_template docstring
* Doc sectioning
* minor test modification
* minor doc modification
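
A minimal usage sketch for the tool support described above, assuming the helper keeps a `maybe_apply_chat_template(example, tokenizer, tools=...)` signature; the model ID and the `get_current_temperature` tool are arbitrary choices made here only for illustration:

```python
from transformers import AutoTokenizer
from trl import maybe_apply_chat_template


def get_current_temperature(location: str) -> float:
    """Hypothetical tool: get the current temperature at a location.

    Args:
        location: The city to query.
    """
    return 21.0


tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")

# A conversational example: the prompt is a list of chat messages.
example = {"prompt": [{"role": "user", "content": "What is the temperature in Paris?"}]}

# Conversational examples are rendered with the tokenizer's chat template
# (including the tool schema); non-conversational ones pass through unchanged.
formatted = maybe_apply_chat_template(example, tokenizer, tools=[get_current_temperature])
print(formatted["prompt"])
```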
---------
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co>
* first commit
* uncomment
* other test adaptations
* Remove unused variable in test_setup_chat_format
* Remove unused import statement
* style
* Add Bart model
* Update BCOTrainerTester class in test_bco_trainer.py
* Update model IDs and tokenizers in test files
* Add new models and processors
* Update model IDs in test files
* Fix formatting issue in test_dataset_formatting.py
* Refactor dataset formatting in test_dataset_formatting.py
* Fix dataset sequence length in SFTTrainerTester
* Remove tokenizer
* Remove print statement
* Add reward_model_path and sft_model_path to PPO trainer
* Fix tokenizer padding issue
* Add chat template for testing purposes in PaliGemma model
* Update PaliGemma model and chat template
* Increase learning rate to speed up test
* Update model names in run_dpo.sh and run_sft.sh scripts
* Update model and dataset names
* Fix formatting issue in test_dataset_formatting.py
* Fix formatting issue in test_dataset_formatting.py
* Remove unused chat template
* Update model generation script
* additional models
* Update model references in test files
* Remove unused imports in test_online_dpo_trainer.py
* Add is_llm_blender_available import and update reward_tokenizer
* Refactor test_online_dpo_trainer.py: Move skipped test case decorator
* remove models without chat templates
* Update model names in scripts and tests
* Update model_id in test_modeling_value_head.py
* Update model versions in test files
* Fix formatting issue in test_dataset_formatting.py
* Update embedding model ID in BCOTrainerTester
* Update test_online_dpo_trainer.py with reward model changes
* Update expected formatted text in test_dataset_formatting.py
* Add reward_tokenizer to TestOnlineDPOTrainer
* fix tests
* Add SIMPLE_CHAT_TEMPLATE to T5 tokenizer
* Fix dummy_text format in test_rloo_trainer.py
* Skip outdated test for ChatML data collator
* Add new vision language models
* Commented out unused model IDs in test_vdpo_trainer
* Update model and vision configurations in generate_tiny_models.py and test_dpo_trainer.py
* Update model and tokenizer references
* Don't push if it already exists
* Add comment explaining test skip
* Fix model_exists function call and add new models
* Update LlavaForConditionalGeneration model and processor
* `qgallouedec` -> `trl-internal-testing`
* Add conditional check for LLMBlender availability in test_judges.py
* Fix import issues and update test requirements
* Remove unused imports
* Add require_peft decorator to test cases
* Fix import_utils module to use correct package name for llm_blender
* conversational dataset support for dpo
* support standard dataset for extract prompt (see the sketch after this list)
* test standard dataset for extract prompt
* fix maybe
* fix maybe apply prompt
* style
* overwrite default learning rate of DPO
* style
* rlaif script
* `writer_batch_size` in `train_test_split`
* initial dpo doc refactoring
* vision data section in doc
* small formatting modification
* refine Vision datasets
* refine doc
* test new loss type format
* restructure loss function
* table of loss types
* simplify `unsloth`
* improve doc
* logged metrics up
* refine loss section
* Fix label_smoothing parameter in DPOConfig
* dataset for test
* update readme
* Update docs/source/dpo_trainer.mdx
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* try colorized code block
* refine doc style
* further refine doc
* Update docs/source/dpo_trainer.mdx
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
* re-add PaliGemma test
* Add missing period
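
For the extract prompt work listed above, a minimal sketch assuming the helper is exposed as `maybe_extract_prompt` and moves the shared prefix of `chosen`/`rejected` into a `prompt` field:

```python
from trl import maybe_extract_prompt

# A conversational preference example where chosen and rejected share the same
# user turn; the shared prefix is expected to be split out as the prompt.
example = {
    "chosen": [
        {"role": "user", "content": "What color is the sky?"},
        {"role": "assistant", "content": "It is blue."},
    ],
    "rejected": [
        {"role": "user", "content": "What color is the sky?"},
        {"role": "assistant", "content": "It is green."},
    ],
}

print(maybe_extract_prompt(example))
# Expected (roughly):
# {"prompt": [{"role": "user", "content": "What color is the sky?"}],
#  "chosen": [{"role": "assistant", "content": "It is blue."}],
#  "rejected": [{"role": "assistant", "content": "It is green."}]}
```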
---------
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>