78f1a928ce
🗑️ Remove deprecated AlignPropTrainer, DDPOTrainer and IterativeSFTTrainer ( #4068 )
2025-09-15 09:56:41 -06:00
d1bf56020d
⚖️ Add vLLM server mode and VLM support to OnlineDPOTrainer ( #3783 )
...
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com >
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com >
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com >
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com >
2025-09-05 16:58:49 -06:00
e7b37d4e8d
🔥 [Refactor] RLOOTrainer ( #3801 )
...
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com >
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com >
Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com >
2025-08-29 09:27:28 -06:00
3ae60cd1b4
Add GSPO script examples (VLM/LLM) ( #3810 )
2025-07-30 20:07:23 -06:00
72bbc6dd0d
Examples list updated in docs ( #3806 )
2025-07-30 04:09:29 -06:00
25ce0f31ae
🐙 Add MPO VLM example script ( #3799 )
2025-07-29 20:52:32 -06:00
26d86757a7
💎 Gemma 3 VLM SFT example script for single-image and multi-image ( #3131 )
...
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com >
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com >
2025-03-26 08:16:02 -07:00
ca850be0a2
🕹️ CLI refactor ( #2380 )
...
* Refactor main function in dpo.py
* Update setup.py and add cli.py
* Add examples to package data
* style
* Refactor setup.py file
* Add new file t.py
* Move dpo to package
* Update MANIFEST.in and setup.py, refactor trl/cli.py
* Add __init__.py to trl/scripts directory
* Add license header to __init__.py
* File moved instruction
* Add Apache License and update file path
* Move dpo.py to new location
* Refactor CLI and DPO script
* Refactor import structure in scripts package
* env
* rm config from chat arg
* rm old cli
* chat init
* test cli [skip ci]
* Add `datast_config_name` to `ScriptArguments` (#2440 )
* add missing arg
* Add test cases for 'trl sft' and 'trl dpo' commands
* Add sft.py script and update cli.py to include sft command
* Move sft script
* chat
* style [ci skip]
* kto
* rm example config
* first step on doc
* see #2442
* see #2443
* fix chat windows
* ©️ Copyrights update (#2454 )
* First changes
* Other files
* Finally
* rm comment
* fix nashmd
* Fix example
* Fix example [ci skip]
* 💬 Fix chat for windows (#2443 )
* fix chat for windows
* add some tests back
* Revert "add some tests back"
This reverts commit 350aef52f53f8cf34fccd7ad0f78a3dd63867e06.
* 🆔 Add `datast_config` to `ScriptArguments` (#2440 )
* datast_config_name
* Update trl/utils.py [ci skip]
* sort import
* typo [ci skip]
* Trigger CI
* Rename `dataset_config_name` to `dataset_config`
* 🏎 Fix deepspeed preparation of `ref_model` in `OnlineDPOTrainer` (#2417 )
* Remove unused deepspeed code
* add model prep back
* add deepspeed even if it doesn't work
* rm old code
* Fix config name
* Remove `make dev` in favor of `pip install -e .[dev]`
* Update script paths and remove old symlink related things
* Fix chat script path [ci skip]
* style
2024-12-13 17:52:23 +01:00
70036bf87f
🕊️ Migration PPOv2 -> PPO ( #2174 )
...
* delete old ppo
* rename ppov2 files
* PPOv2 -> PPO
* rm old doc
* rename ppo doc file
* rm old test
* rename test
* re-add v2 with deprecation
* style
* start update customization
* Lion
* Finish update customization
* remove ppo_multi_adaptater
* remove ppo example
* update some doc
* rm test no peft
* rm hello world
* processing class
* Update docs/source/detoxifying_a_lm.mdx
Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com >
* Update trl/trainer/ppov2_config.py
Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com >
* Update docs/source/customization.mdx
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com >
* Update docs/source/detoxifying_a_lm.mdx
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com >
* po to example overview
* drop lion
* remove "Use 8-bit optimizer"
* Update docs/source/customization.mdx
* Update docs/source/customization.mdx
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com >
* it applies to all trainers
---------
Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com >
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com >
2024-10-11 17:28:39 +02:00
1201aa61b4
rename example ( #2139 )
2024-09-27 21:45:21 +02:00
b5e4bc5984
Update example_overview.md ( #2125 )
2024-09-25 20:45:57 +02:00
7a24565d9d
Generalizes VSFT script to support REDACTED ( #2120 )
...
* generalizes vst script
* precommit
* change launch command to use accelerate
* updates docs
* rename to sft_vlm
* fix script location
* fix formatting
* comma
* add model link
* fix name
---------
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com >
2024-09-25 19:54:44 +02:00
890232fa28
update example overview ( #1883 )
...
Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co >
2024-07-30 14:29:47 +02:00
346c99d222
Adds VLM Training support to SFTTrainer + VSFT script ( #1518 )
...
* adds option to skip dataset preparation in SFTTrainer
* before changing the template
* adds support for new schema
* a few fixes to data collator to support new schema
* updates args
* precommit
* adds sys prompt to chat template and other fixes
* updates template, fixes collator for multiple images
* precommit
* rename vsft to vstf_llava
* adding integration tests
* adds integration test for vsft
* precommit
* adds back chat template
* docs
* typo
* adds eval, precommit
* adds peft launch args
* formatting
* fixes no deps tests by checking if PIL lib exists
* Update __init__.py
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com >
2024-04-11 15:35:59 +02:00
a90e13321b
Fix broken link/markdown ( #903 )
...
* Fix broken link/markdown
* attempt to fix mps issue
* attempt fix mps issue
* test
---------
Co-authored-by: Costa Huang <costa.huang@outlook.com >
2023-10-24 14:27:03 +02:00
ddd318865b
Standardise example scripts ( #842 )
...
* Standardise example scripts
* fix plotting script
* Rename run_xxx to xxx
* Fix doc
---------
Co-authored-by: Costa Huang <costa.huang@outlook.com >
2023-10-11 17:28:15 +02:00
9f6326e65a
Unify sentiment documentation ( #803 )
...
* Update documentation
* update docs
* test
* format
* Update docs/source/example_overview.md
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com >
* update
* add quantization dependency and update docs
* Update docs/source/example_overview.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com >
* Update docs/source/example_overview.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com >
* Update docs/source/example_overview.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com >
* Update docs/source/example_overview.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com >
* Update docs/source/sentiment_tuning.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com >
* Update docs/source/sentiment_tuning.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com >
* Update docs/source/sentiment_tuning.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com >
* Update docs/source/sentiment_tuning.mdx
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com >
* Update docs/source/sentiment_tuning.mdx
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com >
* Update docs/source/sentiment_tuning.mdx
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com >
* update
* quick update 2
---------
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com >
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com >
2023-10-02 10:35:49 -04:00