frozenleaves/trl

mirror of https://github.com/huggingface/trl.git synced 2025-10-21 19:38:55 +08:00

Author	SHA1	Message	Date
Quentin Gallouédec	78f1a928ce	🗑️ Remove deprecated `AlignPropTrainer`, `DDPOTrainer` and `IterativeSFTTrainer` (#4068)	2025-09-15 09:56:41 -06:00
johann	d1bf56020d	⚖️ Add vLLM server mode and VLM support to OnlineDPOTrainer (#3783) Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com> Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>	2025-09-05 16:58:49 -06:00
Shirin Yamani	e7b37d4e8d	🔥 [Refactor] RLOOTrainer (#3801) Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com>	2025-08-29 09:27:28 -06:00
Sergio Paniego Blanco	3ae60cd1b4	Add GSPO script examples (VLM/LLM) (#3810)	2025-07-30 20:07:23 -06:00
Sergio Paniego Blanco	72bbc6dd0d	Examples list updated in docs (#3806)	2025-07-30 04:09:29 -06:00
Sergio Paniego Blanco	25ce0f31ae	🐙 Add MPO VLM example script (#3799)	2025-07-29 20:52:32 -06:00
Sergio Paniego Blanco	26d86757a7	💎 Gemma 3 VLM SFT example script for single-image and multi-image (#3131) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>	2025-03-26 08:16:02 -07:00
Quentin Gallouédec	ca850be0a2	🕹️ CLI refactor (#2380 ) * Refactor main function in dpo.py * Update setup.py and add cli.py * Add examples to package data * style * Refactor setup.py file * Add new file t.py * Move dpo to package * Update MANIFEST.in and setup.py, refactor trl/cli.py * Add __init__.py to trl/scripts directory * Add license header to __init__.py * File moved instruction * Add Apache License and update file path * Move dpo.py to new location * Refactor CLI and DPO script * Refactor import structure in scripts package * env * rm config from chat arg * rm old cli * chat init * test cli [skip ci] * Add `datast_config_name` to `ScriptArguments` (#2440) * add missing arg * Add test cases for 'trl sft' and 'trl dpo' commands * Add sft.py script and update cli.py to include sft command * Move sft script * chat * style [ci skip] * kto * rm example config * first step on doc * see #2442 * see #2443 * fix chat windows * ©️ Copyrights update (#2454) * First changes * Other files * Finally * rm comment * fix nashmd * Fix example * Fix example [ci skip] * 💬 Fix chat for windows (#2443) * fix chat for windows * add some tests back * Revert "add some tests back" This reverts commit 350aef52f53f8cf34fccd7ad0f78a3dd63867e06. * 🆔 Add `datast_config` to `ScriptArguments` (#2440) * datast_config_name * Update trl/utils.py [ci skip] * sort import * typo [ci skip] * Trigger CI * Rename `dataset_config_name` to `dataset_config` * 🏎 Fix deepspeed preparation of `ref_model` in `OnlineDPOTrainer` (#2417) * Remove unused deepspeed code * add model prep back * add deepspeed even if it doesn't work * rm old code * Fix config name * Remove `make dev` in favor of `pip install -e .[dev]` * Update script paths and remove old symlink related things * Fix chat script path [ci skip] * style	2024-12-13 17:52:23 +01:00
Quentin Gallouédec	70036bf87f	🕊️ Migration `PPOv2` -> `PPO` (#2174 ) * delete old ppo * rename ppov2 files * PPOv2 -> PPO * rm old doc * rename ppo doc file * rm old test * rename test * re-add v2 with deprecation * style * start update customization * Lion * Finish update customization * remove ppo_multi_adaptater * remove ppo example * update some doc * rm test no peft * rm hello world * processing class * Update docs/source/detoxifying_a_lm.mdx Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com> * Update trl/trainer/ppov2_config.py Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com> * Update docs/source/customization.mdx Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update docs/source/detoxifying_a_lm.mdx Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * po to example overview * drop lion * remove "Use 8-bit optimizer" * Update docs/source/customization.mdx * Update docs/source/customization.mdx Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * it applies to all trainers --------- Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com> Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2024-10-11 17:28:39 +02:00
Quentin Gallouédec	1201aa61b4	rename example (#2139)	2024-09-27 21:45:21 +02:00
Kashif Rasul	b5e4bc5984	Update example_overview.md (#2125)	2024-09-25 20:45:57 +02:00
Edward Beeching	7a24565d9d	Generalizes VSFT script to support REDACTED (#2120) * generalizes vst script * precommit * change launch command to use accelerate * updates docs * rename to sft_vlm * fix script location * fix formatting * comma * add model link * fix name --------- Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>	2024-09-25 19:54:44 +02:00
Quentin Gallouédec	890232fa28	update example overview (#1883) Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co>	2024-07-30 14:29:47 +02:00
Edward Beeching	346c99d222	Adds VLM Training support to SFTTrainer + VSFT script (#1518 ) * adds option to skip dataset preparation in SFTTrainer * before changing the template * adds support for new schema * a few fixes to data collator to support new schema * updates args * precommit * adds sys prompt to chat template and other fixes * updates template, fixes collator for multiple images * precommit * rename vsft to vstf_llava * adding integration tests * adds integration test for vsft * precommit * adds back chat template * docs * typo * adds eval, precommit * adds peft launch args * formatting * fixes no deps tests by checking if PIL lib exists * Update __init__.py --------- Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>	2024-04-11 15:35:59 +02:00
Omar Sanseviero	a90e13321b	Fix broken link/markdown (#903) * Fix broken link/markdown * attempt to fix mps issue * attempt fix mps issue * test --------- Co-authored-by: Costa Huang <costa.huang@outlook.com>	2023-10-24 14:27:03 +02:00
lewtun	ddd318865b	Standardise example scripts (#842) * Standardise example scripts * fix plotting script * Rename run_xxx to xxx * Fix doc --------- Co-authored-by: Costa Huang <costa.huang@outlook.com>	2023-10-11 17:28:15 +02:00
Costa Huang	9f6326e65a	Unify sentiment documentation (#803 ) * Update documentation * update docs * test * format * Update docs/source/example_overview.md Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com> * update * add quantization dependency and update docs * Update docs/source/example_overview.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update docs/source/example_overview.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update docs/source/example_overview.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update docs/source/example_overview.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update docs/source/sentiment_tuning.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update docs/source/sentiment_tuning.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update docs/source/sentiment_tuning.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update docs/source/sentiment_tuning.mdx Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update docs/source/sentiment_tuning.mdx Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update docs/source/sentiment_tuning.mdx Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * update * quick update 2 --------- Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com> Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2023-10-02 10:35:49 -04:00

17 Commits