* Distribute
* fix some logic errors
* fix and document RepeatRandomSampler (a sketch of the sampler follows this block)
* comment
* doc clarification
* fix type hint
* more readable
* fix eval
* fix tests
* roll back to distribute generation
* improve comment [ci skip]
* fix slice
* catch for eval batch size as well; fix completion_ids in vllm
* log completions
* Revert "log completions"
This reverts commit 1e4af8ffb8dda15d7596e707ac784208db88135a.
* Before the first training step, the model has no optimizer: fix ds3
* init grpo [ci skip]
* initial version
* refine args defs
* model card
* initial doc
* fix badges
* fix spaces
* try link to super in doc
* temperature, fix indexing, and std=0.0
* grpo script for cli
* peft support
* move data preparation into `compute_loss`
* weird doc trial
* fix device and some logging
* unwrap_model_for_generation for distributed setting
* Compat with distrib training
* revert grpo config doc trial (didn't work)
* test
* allow model to be a str and processing_class to be None; fix loss computation
* advantage is always 0.0: don't log
* fix peft not installed
* proper reward model for testing
* fix script for cli
* add trl grpo to cli doc
* test peft
* flush left
* fix reward calculation
* new reward model
* support any reward model
* fix reward processing class def
* log reward std
* fix reward logging
* fix grad computation
* skip embed layer in test
* remove optimizer_cls_and_kwargs
* improve GRPO default args
* reduce mem usage for grpo test
* reduce mem usage in test grpo
* reduce memory usage for test
* Fix the test
* remove redundant
* fix min version
* Update test_grpo_trainer.py
* Update test_grpo_trainer.py
* Fix test, finally found the solution!
* some doc
* Update doc-builder workflow to use specific commit sha
* more doc
* advantages (the normalization is sketched after this block)
* drop cancel for no grad
* logged metrics [ci skip]
* completion col is ignored [ci skip]
* fix latex
* double space? ~?
* try a latex fix
* with branch
* Empty commit
* Empty commit
* double space seems to be the solution
* first commit
* uncomment
* other tests adaptations
* Remove unused variable in test_setup_chat_format
* Remove unused import statement
* style
* Add Bart model
* Update BCOTrainerTester class in test_bco_trainer.py
* Update model IDs and tokenizers in test files
* Add new models and processors
* Update model IDs in test files
* Fix formatting issue in test_dataset_formatting.py
* Refactor dataset formatting in test_dataset_formatting.py
* Fix dataset sequence length in SFTTrainerTester
* Remove tokenizer
* Remove print statement
* Add reward_model_path and sft_model_path to PPO trainer
* Fix tokenizer padding issue
* Add chat template for testing purposes in PaliGemma model
* Update PaliGemma model and chat template
* Increase learning rate to speed up test
* Update model names in run_dpo.sh and run_sft.sh scripts
* Update model and dataset names
* Fix formatting issue in test_dataset_formatting.py
* Fix formatting issue in test_dataset_formatting.py
* Remove unused chat template
* Update model generation script
* additional models
* Update model references in test files
* Remove unused imports in test_online_dpo_trainer.py
* Add is_llm_blender_available import and update reward_tokenizer
* Refactor test_online_dpo_trainer.py: Move skipped test case decorator
* remove models without chat templates
* Update model names in scripts and tests
* Update model_id in test_modeling_value_head.py
* Update model versions in test files
* Fix formatting issue in test_dataset_formatting.py
* Update embedding model ID in BCOTrainerTester
* Update test_online_dpo_trainer.py with reward model changes
* Update expected formatted text in test_dataset_formatting.py
* Add reward_tokenizer to TestOnlineDPOTrainer
* fix tests
* Add SIMPLE_CHAT_TEMPLATE to T5 tokenizer
* Fix dummy_text format in test_rloo_trainer.py
* Skip outdated test for chatML data collator
* Add new vision language models
* Commented out unused model IDs in test_vdpo_trainer
* Update model and vision configurations in generate_tiny_models.py and test_dpo_trainer.py
* Update model and tokenizer references
* Don't push if it already exists
* Add comment explaining test skip
* Fix model_exists function call and add new models
* Update LlavaForConditionalGeneration model and processor
* `qgallouedec` -> `trl-internal-testing`
* in progress
* refactor concatenated_inputs and concatenated_forward
* progress
* further modif
* padding side
* eos prompt enc dec
* prompt_padding_side
* drop prompt padding side collator
* working on decoder only
* dpo trainer
* Fix loss_mask type conversion bug
* bad attention mask
* try to get the same tokens as main
* fix loss mask
* fix unused col
* added comment
* raise error when padding token not set
* remove private method tests
* initial vlm support
* make it work for paligemma
* minor test updates
* style
* improve readability
* improve doc
* style
* flush left and truncate
* flush left in the code
* fix empty_cols and make max_length optional
* always add eos token
* minor changes and doc
* style
* fix docstring
* preference collator in doc
* fix doc
* optional max_completion_length
* Investigating CI failing
* style
* just dpo trainer test
* just idefics
* paligemma
* llava
* test cli
* dataset in test
* all tests
* Update trl/trainer/dpo_trainer.py
* Update trl/trainer/dpo_trainer.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update trl/trainer/dpo_trainer.py
* Update trl/trainer/dpo_trainer.py
* reference to ref
* rich descriptions
* fix logits reporting
* fix truncation
* remove chat template from dpo_vlm
* `get_batch_sample` -> `generate_from_model[_and_ref]`
* add `num_items_in_batch=None`
* `num_items_in_batch` in `training_step`
* Fix return type hint
* test tokenize row
* fix test
---------
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
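The `RepeatRandomSampler` mentioned above is easiest to explain with a sketch: each shuffled dataset index is yielded several times in a row, so every prompt fills a complete group of completions within the same batch. This is a minimal illustration, not the trainer's exact implementation; the real sampler also has to handle seeding and distributed sharding.

```python
import torch
from torch.utils.data import Sampler


class RepeatRandomSampler(Sampler):
    """Yield each dataset index `repeat_count` times, in shuffled order.

    With 4 generations per prompt, a shuffled order [2, 0, 1] becomes
    [2, 2, 2, 2, 0, 0, 0, 0, 1, 1, 1, 1], so every prompt gets a full
    GRPO group inside the batch.
    """

    def __init__(self, data_source, repeat_count: int):
        self.data_source = data_source
        self.repeat_count = repeat_count

    def __iter__(self):
        for index in torch.randperm(len(self.data_source)).tolist():
            for _ in range(self.repeat_count):
                yield index

    def __len__(self):
        return len(self.data_source) * self.repeat_count
```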
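Several of the GRPO commits above ("temperature, fix indexing, and std=0.0", "advantage is always 0.0: don't log", "log reward std") revolve around the group-relative advantage. A minimal sketch of that computation, assuming rewards arrive as a flat tensor with each prompt's generations stored contiguously and using an illustrative epsilon:

```python
import torch


def group_relative_advantages(rewards: torch.Tensor, num_generations: int, eps: float = 1e-4) -> torch.Tensor:
    """Normalize each prompt's rewards by the mean and std of its group."""
    grouped = rewards.view(-1, num_generations)   # (num_prompts, num_generations)
    mean = grouped.mean(dim=1, keepdim=True)
    std = grouped.std(dim=1, keepdim=True)
    # eps keeps the division finite when every completion of a prompt gets
    # the same reward (the std == 0.0 case mentioned above)
    return ((grouped - mean) / (std + eps)).view(-1)


rewards = torch.tensor([1.0, 0.0, 1.0, 1.0])      # 2 prompts x 2 generations
print(group_relative_advantages(rewards, num_generations=2))
# each group is centered, so the mean advantage is always 0.0 -- which is
# why it is not logged and the reward mean/std are logged instead
```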
* clarify ConstantLengthDataset usage
* don't provide dataset text field when formatting func is provided
* kto maybe_apply_chat_template (a usage sketch follows this block)
* default text field
* doc
* remove maybe_apply_chat_template from kto example
* dataset text field always a str
* remove `dataset_text_field="text"`
* update doc
* drop canonical
* Delete ultrafeedback_prompt_only.py dataset script
* reduce diff in best_of_n
* try to revert best_of_n to make github happy
* anyway...
* initial DPOConfig
* fix doc string
* use DPOConfig (a usage sketch follows this block)
* fix missing import
* fix DpoScriptArguments
* override args config when given in init
* use DPOConfig
* fix output dir name
* override with deprecated arguments if given
* use DPOConfig in tests
* fix comment
* add custom_message
* use dataset_train_name and dataset_test_name
* beta is also in the training_args
* fix loss_type docs
* Update trl/commands/cli_utils.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update trl/commands/cli_utils.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update trl/commands/cli_utils.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* use DPOScriptArguments
---------
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
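`maybe_apply_chat_template` from `trl`, referenced in the KTO commits above, renders an example with the tokenizer's chat template only when the example is in conversational form and leaves plain-text examples untouched. A usage sketch; the model and dataset names are placeholders:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

from trl import maybe_apply_chat_template

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
dataset = load_dataset("trl-lib/kto-mix-14k", split="train")

# conversational examples are rendered with the chat template,
# plain-text examples pass through unchanged
dataset = dataset.map(maybe_apply_chat_template, fn_kwargs={"tokenizer": tokenizer})
```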
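The `DPOConfig` commits above move DPO-specific hyperparameters such as `beta` and `loss_type` out of the trainer's `__init__` and into a dedicated config. A minimal sketch of the resulting flow, with placeholder model and dataset names; note that the keyword for passing the tokenizer has changed across TRL releases (`tokenizer` earlier, `processing_class` more recently):

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

# beta and loss_type now live in the config rather than in DPOTrainer's __init__
training_args = DPOConfig(output_dir="dpo-model", beta=0.1, loss_type="sigmoid")

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # `tokenizer=` in older releases
)
trainer.train()
```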
* CLI V1
* v1 CLI
* add rich enhancements
* revert unintended change
* some comments
* cleaner CLI
* fix
* fix
* remove print callback
* move to cli instead of trl_cli
* revert unneeded changes
* fix test
* Update trl/commands/sft.py
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* remove redundant strings
* fix import issue
* fix other issues
* add packing
* add config parser
* some refactor
* cleaner
* add example config yaml file (the config-parsing flow is sketched after this block)
* small refactor
* change the logic a bit
* fix issues here and there
* add CLI in docs
* move to examples/sft
* remove redundant licenses
* make it work on dpo
* set to None
* switch to accelerate and fix many things
* add docs
* more docs
* added tests
* doc clarification
* more docs
* fix CI for windows and python 3.8
* fix
* attempt to fix CI
* fix?
* test
* fix
* tweak?
* fix
* test
* another test
* fix
* test
* fix
* fix
* fix
* skip tests for windows
* test @lvwerra approach
* make dev
* revert unneeded changes
* fix sft dpo
* optimize a bit
* address final comments
* update docs
* final comment
---------
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
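The CLI commits above add a config parser so that script arguments can come from a YAML file as well as from command-line flags. A minimal sketch of that flow using the current `trl` API (`TrlParser`, `SFTConfig`, and `ModelConfig` postdate some of these commits); the field names, file name, and dataclass below are illustrative:

```python
# example_config.yaml (illustrative contents):
#   model_name_or_path: Qwen/Qwen2.5-0.5B
#   dataset_name: stanfordnlp/imdb
#   output_dir: sft-model
#
# launched as: python sft.py --config example_config.yaml
from dataclasses import dataclass, field

from trl import ModelConfig, SFTConfig, TrlParser


@dataclass
class ScriptArguments:
    dataset_name: str = field(default="stanfordnlp/imdb")


# precedence: dataclass defaults < values from the YAML passed via --config
# < explicit command-line flags
parser = TrlParser((ScriptArguments, SFTConfig, ModelConfig))
script_args, training_args, model_args = parser.parse_args_and_config()
print(training_args.output_dir)
```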