frozenleaves/trl - trl - Gitea: Git for Me

mirror of https://github.com/huggingface/trl.git synced 2025-10-21 02:53:59 +08:00

Author	SHA1	Message	Date
Quentin Gallouédec	9df19e8a75	📜 Fix license and copyrights (#3264)	2025-04-08 15:22:58 -07:00
Quentin Gallouédec	0f5ffad26e	👨‍👨‍👧‍👧 GRPO (#2565) * init grpo [ci skip] * initial version * refine args defs * model card * initial doc * fix badges * fix spaces * try link to super in doc * temperature, fix indexing, and std=0.0 * grpo script for cli * peft support * move data preparation in `compute_loss` * weird doc trial * fix device and some logging * unwrap_model_for_generation for distributed setting * Compat with distrib training * revert grpo config doc trial (didn't work) * test * allow model to be str and processing_class to be none; fix loss computation * advantage is always 0.0: don't log * fix peft not installed * proper reward model for testing * fix script for cli * add trl grpo to cli doc * test peft * flush left * fix reward calculation * new reward model * support any reward model * fix reward processing class def * log reward std * fix reward logging * fix grad computation * skip embed layer in test * remove optimizer_cls_and_kwargs * improve GRPO default args * reduce mem usage for grpo test * reduce mem usage in test grpo * reduce memory usage for test * Fix the test * remove redondant * fix min version * Update test_grpo_trainer.py * Update test_grpo_trainer.py * Fix test, finally found the solution! * some doc * Update doc-builder workflow to use specific commit sha * more doc * advantages * drop cancel fo no grad * logged metrics [ci skip] * completion col is ignored [ci skip] * fix latex * double space? ~? * try a latex fix * with branch * Empty commit * Empty commit * double space seems to be the solution	2025-01-20 19:02:15 +01:00
Quentin Gallouédec	2ecd53ad77	🏎️ vLLM for Online DPO (#2558) * vllm online dpo * new arg and add back generation config [skip ci] * import utils * optional import and comment * is_vllm_available * support conv and not conv [ci skip] * add old code back * use func [skip ci] * fix _generate call * fix and dedicated func * top k 50 * style * add import error * new testing model * Update OnlineDPOTrainer class with new features * test vllm * fix generate tiny script * max len arg * fix comment [ci skip] * revert num_return_sequences * vllm dep * Add require_torch_accelerator import and skip test if vllm is not available * proper require_torch_accelerator * add vllm section * Add hfoption sections to speeding_up_training.md * no, an id * Update vllm dependency to exclude Windows platform * Note on future release * style	2025-01-17 11:39:13 +01:00
Quentin Gallouédec	1d23ecc36f	©️ Update copyrights year (#2547) * happy new year * fix wandb import sort	2025-01-07 14:53:09 +01:00
Quentin Gallouédec	9410874787	©️ Copyrights update (#2454) * First changes * Other files * Finally * rm comment * fix nashmd * Fix example * Fix example [ci skip]	2024-12-10 10:40:00 +01:00
Quentin Gallouédec	453db5cd79	🤏 New models for tests (#2287) * first commit * uncomment * other tests adaptations * Remove unused variable in test_setup_chat_format * Remove unused import statement * style * Add Bart model * Update BCOTrainerTester class in test_bco_trainer.py * Update model IDs and tokenizers in test files * Add new models and processors * Update model IDs in test files * Fix formatting issue in test_dataset_formatting.py * Refactor dataset formatting in test_dataset_formatting.py * Fix dataset sequence length in SFTTrainerTester * Remove tokenizer * Remove print statement * Add reward_model_path and sft_model_path to PPO trainer * Fix tokenizer padding issue * Add chat template for testing purposes in PaliGemma model * Update PaliGemma model and chat template * Increase learning rate to speed up test * Update model names in run_dpo.sh and run_sft.sh scripts * Update model and dataset names * Fix formatting issue in test_dataset_formatting.py * Fix formatting issue in test_dataset_formatting.py * Remove unused chat template * Update model generation script * additional models * Update model references in test files * Remove unused imports in test_online_dpo_trainer.py * Add is_llm_blender_available import and update reward_tokenizer * Refactor test_online_dpo_trainer.py: Move skipped test case decorator * remove models without chat templates * Update model names in scripts and tests * Update model_id in test_modeling_value_head.py * Update model versions in test files * Fix formatting issue in test_dataset_formatting.py * Update embedding model ID in BCOTrainerTester * Update test_online_dpo_trainer.py with reward model changes * Update expected formatted text in test_dataset_formatting.py * Add reward_tokenizer to TestOnlineDPOTrainer * fix tests * Add SIMPLE_CHAT_TEMPLATE to T5 tokenizer * Fix dummy_text format in test_rloo_trainer.py * Skip outdated test for chatML data collator * Add new vision language models * Commented out unused model IDs in test_vdpo_trainer * Update model and vision configurations in generate_tiny_models.py and test_dpo_trainer.py * Update model and tokenizer references * Don't push if it already exists * Add comment explaining test skip * Fix model_exists function call and add new models * Update LlavaForConditionalGeneration model and processor * `qgallouedec` -> `trl-internal-testing`	2024-11-25 16:31:56 +01:00

6 Commits