* function calling training support for SFTTrainer
* adding tool support to data_utils
* adding test for function calling tokenizer
* reverting changes to sfttrainer and config, added maybe_apply_chat_template (see the sketch after this list)
* arg for maybe_apply_chat_template docstring
* Doc sectioning
* minor test modification
* minor doc modification
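
A minimal usage sketch for the tool support described above, assuming the helper keeps a `maybe_apply_chat_template(example, tokenizer, tools=...)` signature; the model ID and the `get_current_temperature` tool are arbitrary choices made here only for illustration:

```python
from transformers import AutoTokenizer
from trl import maybe_apply_chat_template


def get_current_temperature(location: str) -> float:
    """Hypothetical tool: get the current temperature at a location.

    Args:
        location: The city to query.
    """
    return 21.0


tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")

# A conversational example: the prompt is a list of chat messages.
example = {"prompt": [{"role": "user", "content": "What is the temperature in Paris?"}]}

# Conversational examples are rendered with the tokenizer's chat template
# (including the tool schema); non-conversational ones pass through unchanged.
formatted = maybe_apply_chat_template(example, tokenizer, tools=[get_current_temperature])
print(formatted["prompt"])
```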
---------
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co>
* first commit
* uncomment
* other test adaptations
* Remove unused variable in test_setup_chat_format
* Remove unused import statement
* style
* Add Bart model
* Update BCOTrainerTester class in test_bco_trainer.py
* Update model IDs and tokenizers in test files
* Add new models and processors
* Update model IDs in test files
* Fix formatting issue in test_dataset_formatting.py
* Refactor dataset formatting in test_dataset_formatting.py
* Fix dataset sequence length in SFTTrainerTester
* Remove tokenizer
* Remove print statement
* Add reward_model_path and sft_model_path to PPO trainer
* Fix tokenizer padding issue
* Add chat template for testing purposes in PaliGemma model
* Update PaliGemma model and chat template
* Increase learning rate to speed up test
* Update model names in run_dpo.sh and run_sft.sh scripts
* Update model and dataset names
* Fix formatting issue in test_dataset_formatting.py
* Fix formatting issue in test_dataset_formatting.py
* Remove unused chat template
* Update model generation script
* additional models
* Update model references in test files
* Remove unused imports in test_online_dpo_trainer.py
* Add is_llm_blender_available import and update reward_tokenizer
* Refactor test_online_dpo_trainer.py: Move skipped test case decorator
* remove models without chat templates
* Update model names in scripts and tests
* Update model_id in test_modeling_value_head.py
* Update model versions in test files
* Fix formatting issue in test_dataset_formatting.py
* Update embedding model ID in BCOTrainerTester
* Update test_online_dpo_trainer.py with reward model changes
* Update expected formatted text in test_dataset_formatting.py
* Add reward_tokenizer to TestOnlineDPOTrainer
* fix tests
* Add SIMPLE_CHAT_TEMPLATE to T5 tokenizer
* Fix dummy_text format in test_rloo_trainer.py
* Skip outdated test for ChatML data collator
* Add new vision language models
* Commented out unused model IDs in test_vdpo_trainer
* Update model and vision configurations in generate_tiny_models.py and test_dpo_trainer.py
* Update model and tokenizer references
* Don't push if it already exists
* Add comment explaining test skip
* Fix model_exists function call and add new models
* Update LlavaForConditionalGeneration model and processor
* `qgallouedec` -> `trl-internal-testing`
* Add conditional check for LLMBlender availability in test_judges.py
* Fix import issues and update test requirements
* Remove unused imports
* Add require_peft decorator to test cases
* Fix import_utils module to use correct package name for llm_blender
* conversational dataset support for dpo
* support standard dataset for extract prompt (see the sketch after this list)
* test standard dataset for extract prompt
* fix maybe
* fix maybe apply prompt
* style
* overwrite default learning rate of DPO
* style
* rlaif script
* `writer_batch_size` in `train_test_split`
* initial dpo doc refactoring
* vision data section in doc
* small formatting modification
* refine Vision datasets
* refine doc
* test new loss type format
* restructure loss function
* table of loss types
* simplify `unsloth`
* improve doc
* logged metrics up
* refine loss section
* Fix label_smoothing parameter in DPOConfig
* dataset for test
* update readme
* Update docs/source/dpo_trainer.mdx
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* try colorized code block
* refine doc style
* further refine doc
* Update docs/source/dpo_trainer.mdx
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
* re-add PaliGemma test
* Add missing period
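
For the extract prompt work listed above, a minimal sketch assuming the helper is exposed as `maybe_extract_prompt` and moves the shared prefix of `chosen`/`rejected` into a `prompt` field:

```python
from trl import maybe_extract_prompt

# A conversational preference example where chosen and rejected share the same
# user turn; the shared prefix is expected to be split out as the prompt.
example = {
    "chosen": [
        {"role": "user", "content": "What color is the sky?"},
        {"role": "assistant", "content": "It is blue."},
    ],
    "rejected": [
        {"role": "user", "content": "What color is the sky?"},
        {"role": "assistant", "content": "It is green."},
    ],
}

print(maybe_extract_prompt(example))
# Expected (roughly):
# {"prompt": [{"role": "user", "content": "What color is the sky?"}],
#  "chosen": [{"role": "assistant", "content": "It is blue."}],
#  "rejected": [{"role": "assistant", "content": "It is green."}]}
```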
---------
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>