Commit Graph

166 Commits

Author SHA1 Message Date
999acd53ec 🕺 Migrate setup configuration from setup.py to setup.cfg and make rich an optional dep (#3403)
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-05-02 11:03:57 -07:00
8606b1ad09 🪪 Remove license classifier (#3402) 2025-05-02 10:03:39 -07:00
09b669fbf7 [🐯+GRPO] Support FSDP + Fix bug when using LigerGRPO with DDP (#3260)
Co-authored-by: Ubuntu <azureuser@liger-ci-h100-vm.kvghai4yzzmufguwws3040dwlf.dx.internal.cloudapp.net>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-04-30 22:49:45 +02:00
a9b27f82d6 ⬆️ Bump dev version (#3357) 2025-04-24 16:22:12 -07:00
cd6b3de356 Release: v0.17 (#3356) 2025-04-24 16:15:45 -07:00
d625c5533a ⏱️ Fix vLLM server to support V1 Engine (#3276) 2025-04-10 18:29:50 -07:00
ae1581474e 🚧 Temporarily restrict diffusers to <0.33.0 due to ftfy optional dep issue breaking doc builds (#3273) 2025-04-09 10:20:43 -07:00
9df19e8a75 📜 Fix license and copyrights (#3264) 2025-04-08 15:22:58 -07:00
793735a698 🐯 Integrate Liger GRPO Loss to GRPO Trainer (#3184)
Co-authored-by: Ubuntu <azureuser@liger-ci-h100-vm.kvghai4yzzmufguwws3040dwlf.dx.internal.cloudapp.net>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-04-03 19:17:00 +02:00
2fe2337067 🏃 Migrate CI to self-hosted runners (#3174) 2025-03-29 11:56:44 -07:00
f6b4d6e569 [Liger] Liger KTO support (#2812)
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
2025-03-28 20:56:59 +01:00
6067e2a669 BCOTrainer version upgrade fixes (#2867)
Co-authored-by: Clara Luise Pohland <clara-luise.pohland@telekom.de>
2025-03-24 10:55:00 +01:00
a0a53171cc ⬆️ Bump dev version 2025-03-22 21:14:59 +00:00
23a635ed61 Release: v0.16 (#3137) 2025-03-22 14:03:54 -07:00
f713f614e9 🚀 Scaling GRPO to 70B+ Models and Multi-Node Training with vLLM Server & NCCL Communication (#3094)
* 🚀allow GRPO to connect to VLLM in remote/local node with NCCL communication

* Update trl/extras/remote_vllm_helper.py

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>

* use argparse for options

* add  imports for remote vllm helper

* formatting

* fix arguments

* use cli options

* vllm serve

* clean server

* better naming

* client

* style

* new params in generate

* this method is the new default

* update config

* do not use asserts

* update config

* separate host and post

* proper deprectation

* deprecated arg in the vllm server

* simplify moving

* document host and port

* style

* update trainer

* new generate args

* update doc

* Fix for zero3

* Better naming

* Remove remote_vllm_helper

* remove grpo_with_remote_vllm

* remove cloudpickle from deps

* Some consistency

* Update docs/source/grpo_trainer.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update setup.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* add revision argument to vllm server

* Update docs/source/grpo_trainer.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update docs/source/grpo_trainer.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Reset the prefix cache after updating weights

* Update vllm_client.py

* Update vllm_client.py

* Update vllm_serve.py

* Add health check endpoint to vLLM server

* connection timeout

* style

* fix doc langauge hint

* move reset_prefix_cache to its own endpoint

* async

* merge peft adaptor to send to vllm

* Looks simple. Wasn't.

* Peft compatibility

* Update docs/source/speeding_up_training.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update docs/source/speeding_up_training.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update trl/extras/vllm_client.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* GatheredParameters can be disabled

* gather and ungather peft weights within the same deepseed context

* use is_vllm_available

* minor consistency fixes

* fix error when deepspeed is not installed

* fix deepspeed import when not peft

* simpler

* multinode doc

* minor code and comments changes

* style

* optional deps

* vllm_server_timeout as arg

* small refinement in doc

* update deps

* Fix VLLMClient argument in grpo_trainer; Add zero3+peft vllm transfer solution

* Revert "Fix VLLMClient argument in grpo_trainer; Add zero3+peft vllm transfer solution"

This reverts commit d759c9c4d12ff4531482c465c6257a59987ba748.

* log num_tokens

* disable vllm test (in the future we'll add a mock for vllm server for them)

* style

* fix ds3_gather_for_generation

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-03-21 12:12:08 -07:00
4871c82b0c 🏊 [SFT] Compatibility with padding free and iterable dataset (#3053)
* Compatibilitywith padding free and iterable dataset

* Fix collator test

* add a test for streaming

* some cleaning

* improve and fix tests

* tiny revert

* bump datasets to 3.0.0
2025-03-12 11:44:25 -07:00
3d94e4e25c 📜 Update README and doc index (#2986)
* Update readme and doc index

* bold

* consistent uppercase
2025-02-28 13:51:58 +01:00
45ccdefac4 📌 Pin liger-kernel and vLLM (#2952)
* pin liger-kernel

* style
2025-02-25 00:34:16 +01:00
ae3bd0d07a 🆙 Bump vLLM min version to 0.7.2 (#2860)
Bumps vllm as there were a number of throughput improvements in vllm==0.7.2

Also may resolve issue such as https://github.com/huggingface/trl/issues/2851
2025-02-17 10:54:07 +01:00
6d9fc11fd6 [SFT] fix check for AutoLigerKernelForCausalLM (#2874)
* fix check for AutoLigerKernelForCausalLM

* fix case where AutoLigerKernelForCausalLM is not defined

* update min liger version

* formatting

* fix win CI
2025-02-17 07:50:55 +01:00
ffcb9f4aee ⬆️ Bump dev version 2025-02-13 14:33:44 +00:00
00e5889380 Release: v0.15 2025-02-13 14:28:36 +00:00
1a2276402f 📌 vLLM >= 0.7.1 for device fix (#2766)
see https://github.com/huggingface/trl/issues/2745
2025-02-04 20:12:22 +01:00
df8f619ec5 📦 trl.templates in excluded packages (#2690) 2025-01-30 09:31:08 +01:00
56880ba73d ⬆️ Bump dev version (#2689) 2025-01-30 09:23:31 +01:00
a5c88d6c75 Add uv installation instructions (#2601)
* add uv

* Update docs/source/installation.mdx

* Update docs/source/installation.mdx

* pypi -> PyPI

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co>
2025-01-21 22:09:18 +01:00
2ecd53ad77 🏎️ vLLM for Online DPO (#2558)
* vllm online dpo

* new arg and add back generation config [skip ci]

* import utils

* optional import and comment

* is_vllm_available

* support conv and not conv [ci skip]

* add old code back

* use func [skip ci]

* fix _generate call

* fix and dedicated func

* top k 50

* style

* add import error

* new testing model

* Update OnlineDPOTrainer class with new features

* test vllm

* fix generate tiny script

* max len arg

* fix comment [ci skip]

* revert num_return_sequences

* vllm dep

* Add require_torch_accelerator import and skip test if vllm is not available

* proper require_torch_accelerator

* add vllm section

* Add hfoption sections to speeding_up_training.md

* no, an id

* Update vllm dependency to exclude Windows platform

* Note on future release

* style
2025-01-17 11:39:13 +01:00
1d23ecc36f ©️ Update copyrights year (#2547)
* happy new year

* fix wandb import sort
2025-01-07 14:53:09 +01:00
f68d11f9f9 Bump version 2024-12-15 19:56:54 +01:00
ca850be0a2 🕹️ CLI refactor (#2380)
* Refactor main function in dpo.py

* Update setup.py and add cli.py

* Add examples to package data

* style

* Refactor setup.py file

* Add new file t.py

* Move dpo to package

* Update MANIFEST.in and setup.py, refactor trl/cli.py

* Add __init__.py to trl/scripts directory

* Add license header to __init__.py

* File moved instruction

* Add Apache License and update file path

* Move dpo.py to new location

* Refactor CLI and DPO script

* Refactor import structure in scripts package

* env

* rm config from chat arg

* rm old cli

* chat init

* test cli [skip ci]

* Add `datast_config_name` to `ScriptArguments` (#2440)

* add missing arg

* Add test cases for 'trl sft' and 'trl dpo' commands

* Add sft.py script and update cli.py to include sft command

* Move sft script

* chat

* style [ci skip]

* kto

* rm example config

* first step on doc

* see #2442

* see #2443

* fix chat windows

* ©️ Copyrights update (#2454)

* First changes

* Other files

* Finally

* rm comment

* fix nashmd

* Fix example

* Fix example [ci skip]

* 💬 Fix chat for windows (#2443)

* fix chat for windows

* add some tests back

* Revert "add some tests back"

This reverts commit 350aef52f53f8cf34fccd7ad0f78a3dd63867e06.

* 🆔 Add `datast_config` to `ScriptArguments` (#2440)

* datast_config_name

* Update trl/utils.py [ci skip]

* sort import

* typo [ci skip]

* Trigger CI

* Rename `dataset_config_name` to `dataset_config`

* 🏎 Fix deepspeed preparation of `ref_model` in `OnlineDPOTrainer` (#2417)

* Remove unused deepspeed code

* add model prep back

* add deepspeed even if it doesn't work

* rm old code

* Fix config name

* Remove `make dev` in favor of `pip install -e .[dev]`

* Update script paths and remove old symlink related things

* Fix chat script path [ci skip]

* style
2024-12-13 17:52:23 +01:00
9410874787 ©️ Copyrights update (#2454)
* First changes

* Other files

* Finally

* rm comment

* fix nashmd

* Fix example

* Fix example [ci skip]
2024-12-10 10:40:00 +01:00
6578fdc101 🔀 Add MergeModelCallBack (#2282)
* Create mergekit_utils.py

* adding mergekit as an optional dependancy

* adding MergeModel to callbacks

* adding mergekit_utils dependencies to callbacks

* setting lower bound for mergekit

* setting mergekit lower band to 0.0.5.1

* adding support for MergeModelCallBack __init__.py

* adding support for mergemodelcallback

* mergemodelcallback tests

* Update callbacks.py

* Update __init__.py

* Update __init__.py

* Update test_callbacks.py

* Update trl/trainer/callbacks.py

removing ## from docs

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update trl/trainer/callbacks.py

removing ## from docs

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update trl/trainer/callbacks.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* using different dataset for tests

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update trl/mergekit_utils.py

adding types

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update trl/mergekit_utils.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Apply suggestions from code review

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* replacing get_last_checkpoint

* renaming Merge to merge_models

* setting mergers default value to linear

* removing unnecessary docs and comments

* adding docstring to Mergeconfig

* adding mergekits link to docstring

* precommit

* removing duplicated import

* typos in mergekit_utils docstring

* fixing tests

* making mergemodelcallback tests optional

* Make import optional

* minor

* use tmp dir in test

* sort

* Add import error checks for mergekit extra

* use a common _merge_and_maybe_push method and compat with windows path

* debug windows

* Update dependencies for mergekit and add test dependencies

* Add assertion to check if merged folder exists in the last checkpoint

* Fix temporary directory cleanup in test_callbacks.py

* Add sys import and skip test for Python versions below 3.10 due to cleanup errors with temp dir

* revert change for debug

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2024-11-21 14:06:45 +01:00
b80c1a6fb8 🎲 Move random judges in testing utilities (#2365)
* Update judges and testing utilities

* Update judges in test files

* Update judges in test files
2024-11-18 18:43:18 +01:00
c86b51cd12 Bump liger-kernel to fix grad acc and more features (#2333)
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2024-11-08 12:16:33 +01:00
2f34a161cd Bump dev version to 0.13.0.dev0 (#2305)
* Bump dev version to `0.13.0.dev0`

* Update version number to 0.12 in CITATION.cff

* 🧽 Fix judge documentation (#2318)

* Update judge examples and documentation

* without ':'

* Clean doc

* Fix typo in example code

* Add space after Attributes

* Update attribute name in judges.py

* Add installation instructions for llm-blender library

* Update PairRMJudge attributes documentation

* Fix return type in PairRMJudge

* Revert "🧽 Fix judge documentation (#2318)"

This reverts commit 337005d95169371935fb87f1c559c7412f8472a4.
2024-11-04 15:59:52 +01:00
6138439df4 🧓 Specify and test min versions (#2303)
* Add conditional check for LLMBlender availability in test_judges.py

* Fix import issues and update test requirements

* Remove unused imports

* Add require_peft decorator to test cases

* Fix import_utils module to use correct package name for llm_blender

* Found min version and test

* Update Slack notification titles

* Update dependencies versions

* Update GitHub Actions workflow to include setup.py and reorder file paths

* Revert "Update Slack notification titles"

This reverts commit be02a7f2de87905e86a847540770968d0416934a.

* Update Slack notification titles

* Remove pull_request branch restriction in tests.yml

* add check code quality back

* Fix PairRMJudge model loading issue
2024-11-01 00:26:53 +01:00
06be6f409a 🖇️ Better dependency and partitioning of CI tests (#2298)
* clean deps

* new tests

* tests

* Add tests without optional dependencies workflow

* Update dependencies in tests.yml

* cpu version of torch

* Update dependencies and installation commands

* Disable fail-fast in test workflow

* Update test matrix in workflows file

* try fix windows

* Remove "rich" from required packages in setup.py

* Update dependency installation in tests.yml

* Add torch and deepspeed installation for windows-latest

* Fix conditional statement in workflow file

* Add torch and deepspeed installation for Windows

* Fix if statement

* Update torch and deepspeed dependencies

* Update liger package requirement for non-Windows platforms

* remove scipy dep

* Add torch GPU requirement for testing_utils

* Update trl/trainer/judges.py
2024-10-31 11:08:51 +01:00
99225bb6d6 Bump the minimum transformers version to v4.46 (#2245)
* Bump the minimum transformers version

* Bump version in `requirements.txt`

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2024-10-24 10:42:30 +02:00
c00722ce0a 🃏 Model card for TRL (#2123)
* template and util

* test for online dpo

* template in package_data

* template in manifest

* standardize push_to_hub

* wandb badge and quick start

* bco

* xpo

* simplify `create_model_card`

* cpo

* kto

* dpo

* gkd

* orpo

* style

* nash-md

* alignprop

* bco citation

* citation template

* cpo citation

* ddpo

* fix alignprop

* dpo

* gkd citation

* kto

* online dpo citation

* orpo citation

* citation in utils

* optional citation

* reward

* optional trainer citation

* sft

* remove add_model_tags bco

* Remove unnecessary code for adding model tags

* Fix model tag issue and update URL format

* Remove unused code for adding model tags

* Add citation for XPOTrainer

* Remove unused code in SFTTrainer

* Add model card generation in RLOOTrainer

* Remove unused import and method call in reward_trainer.py

* Add model card generation

* Remove unused code and update error message in ORPOTrainer class

* Add import statements and create model card in IterativeSFTTrainer

* Add dataset name to push_to_hub() call

* Update trainer.push_to_hub() dataset names

* script args

* test

* better doc

* fix tag test

* fix test tag

* Add tags parameter to create_model_card method

* doc

* script args

* Update trl/templates/model_card.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* unittest's `assertIn` instead of `assert`

* Update trl/templates/model_card.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

---------

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2024-09-27 15:23:05 +02:00
2cad48d511 [CLI] trl env for printing system info (#2104) 2024-09-24 09:57:24 +02:00
92eea1f239 Clean up README and remove openrlbenchmark dependency (#2085)
* Clean up README

* Add Kashif and Quentin

* Refactor

* Apply suggestions from code review

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Add citation

* Omit benchmarks from dev install

* Remove openrlbenchmark

* Apply suggestions from code review

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2024-09-23 09:21:41 +02:00
3cec013a20 Bump dev version 2024-09-19 10:47:21 +02:00
07f0e687cb Use transformers utilities when possible (#2064)
* use transformers' availability functions

* require from transformers

* rm file

* fix no peft

* fix import

* don't alter  _peft_available

* fix require_diffusers

* style

* transformers>=4.40 and add back `is_liger_kernel_available`
2024-09-16 15:56:49 +02:00
4c92ba5769 ©️ Copyrights (#2063)
* copyrights

* fail if missing
2024-09-13 14:18:47 +02:00
e2966c8d99 Integrate OrpoTrainer with PyTorchXLA for faster step time on TPUs (#2001)
* make Orpotrainer run faster on tpu

* less data transfer

* train-trl.py

* fix

* set device_map=auto

* add is_torch_xla_available guards

* delete file

* address comments

* make presubmit

* Update transformer version in setup.py

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2024-09-11 15:11:28 +02:00
d60a1f50fe [ci] pin numpy to < 2 on win (#2009) 2024-09-03 13:03:38 +02:00
850ddcf598 [pre-commit] update pre-commit yaml (#2002)
* update pre-commit yaml

* fix test

* use element_type
2024-09-02 19:15:25 +02:00
d57e4b7265 [Online-DPO] fixes to the training scripts and setup.py (#1997)
* fixes

* fixed typo

* add tests for liger

* fix imports

* class name
2024-08-30 22:05:14 +02:00
437e8ccaba Bump dev version 2024-08-29 14:39:18 +00:00
4dd0dc2988 Adds experimental Liger support to SFT script (#1992)
* adds cli and import utils

* updates SFT script

* adds liger model to trainer

* adds liger nightly dep

* precommit

* fix import

* Update trl/commands/cli_utils.py

* Fix quality

* moved use_liger arg to sft config

* remove arg

* remove use liger from sft trainer

---------

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2024-08-29 14:48:35 +02:00