* init grpo [ci skip]
* initial version
* refine args defs
* model card
* initial doc
* fix badges
* fix spaces
* try link to super in doc
* temperature, fix indexing, and std=0.0
* grpo script for cli
* peft support
* move data preparation in `compute_loss`
* weird doc trial
* fix device and some logging
* unwrap_model_for_generation for distributed setting
* Compat with distrib training
* revert grpo config doc trial (didn't work)
* test
* allow model to be str and processing_class to be none; fix loss computation
* advantage is always 0.0: don't log
* fix peft not installed
* proper reward model for testing
* fix script for cli
* add trl grpo to cli doc
* test peft
* flush left
* fix reward calculation
* new reward model
* support any reward model
* fix reward processing class def
* log reward std
* fix reward logging
* fix grad computation
* skip embed layer in test
* remove optimizer_cls_and_kwargs
* improve GRPO default args
* reduce mem usage for grpo test
* reduce mem usage in test grpo
* reduce memory usage for test
* Fix the test
* remove redundant
* fix min version
* Update test_grpo_trainer.py
* Update test_grpo_trainer.py
* Fix test, finally found the solution!
* some doc
* Update doc-builder workflow to use specific commit sha
* more doc
* advantages
* drop cancel for no grad
* logged metrics [ci skip]
* completion col is ignored [ci skip]
* fix latex
* double space? ~?
* try a latex fix
* with branch
* Empty commit
* Empty commit
* double space seems to be the solution
* vllm online dpo
* new arg and add back generation config [skip ci]
* import utils
* optional import and comment
* is_vllm_available
* support conv and not conv [ci skip]
* add old code back
* use func [skip ci]
* fix _generate call
* fix and dedicated func
* top k 50
* style
* add import error
* new testing model
* Update OnlineDPOTrainer class with new features
* test vllm
* fix generate tiny script
* max len arg
* fix comment [ci skip]
* revert num_return_sequences
* vllm dep
* Add require_torch_accelerator import and skip test if vllm is not available
* proper require_torch_accelerator
* add vllm section
* Add hfoption sections to speeding_up_training.md
* no, an id
* Update vllm dependency to exclude Windows platform
* Note on future release
* style
* first commit
* uncomment
* other tests adaptations
* Remove unused variable in test_setup_chat_format
* Remove unused import statement
* style
* Add Bart model
* Update BCOTrainerTester class in test_bco_trainer.py
* Update model IDs and tokenizers in test files
* Add new models and processors
* Update model IDs in test files
* Fix formatting issue in test_dataset_formatting.py
* Refactor dataset formatting in test_dataset_formatting.py
* Fix dataset sequence length in SFTTrainerTester
* Remove tokenizer
* Remove print statement
* Add reward_model_path and sft_model_path to PPO trainer
* Fix tokenizer padding issue
* Add chat template for testing purposes in PaliGemma model
* Update PaliGemma model and chat template
* Increase learning rate to speed up test
* Update model names in run_dpo.sh and run_sft.sh scripts
* Update model and dataset names
* Fix formatting issue in test_dataset_formatting.py
* Fix formatting issue in test_dataset_formatting.py
* Remove unused chat template
* Update model generation script
* additional models
* Update model references in test files
* Remove unused imports in test_online_dpo_trainer.py
* Add is_llm_blender_available import and update reward_tokenizer
* Refactor test_online_dpo_trainer.py: Move skipped test case decorator
* remove models without chat templates
* Update model names in scripts and tests
* Update model_id in test_modeling_value_head.py
* Update model versions in test files
* Fix formatting issue in test_dataset_formatting.py
* Update embedding model ID in BCOTrainerTester
* Update test_online_dpo_trainer.py with reward model changes
* Update expected formatted text in test_dataset_formatting.py
* Add reward_tokenizer to TestOnlineDPOTrainer
* fix tests
* Add SIMPLE_CHAT_TEMPLATE to T5 tokenizer
* Fix dummy_text format in test_rloo_trainer.py
* Skip outdated test for chatML data collator
* Add new vision language models
* Commented out unused model IDs in test_vdpo_trainer
* Update model and vision configurations in generate_tiny_models.py and test_dpo_trainer.py
* Update model and tokenizer references
* Don't push if it already exists
* Add comment explaining test skip
* Fix model_exists function call and add new models
* Update LlavaForConditionalGeneration model and processor
* `qgallouedec` -> `trl-internal-testing`