26 Commits

SHA1 Message Date
82b34e5723 Update transformers minimum version to 4.56.1 (#4047) 2025-09-09 16:05:04 +02:00
208e9f7df7 📏 torch_dtype to dtype everywhere (#4000)
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
2025-09-03 15:45:37 -06:00
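
(Context for the rename above: recent transformers releases deprecate the `torch_dtype` keyword of `from_pretrained` in favor of `dtype`. A minimal sketch of the change this commit applies across TRL; the model id is only a placeholder.)

```python
import torch
from transformers import AutoModelForCausalLM

# Before: AutoModelForCausalLM.from_pretrained("gpt2", torch_dtype=torch.bfloat16)
# After the rename tracked by this commit (assumes a transformers release
# that already accepts the new keyword):
model = AutoModelForCausalLM.from_pretrained("gpt2", dtype=torch.bfloat16)
```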
17393b8c82 🌺 OpenAI GPT OSS & Harmony support (#3848)
Co-authored-by: Shirin Yamani <75791599+shirinyamani@users.noreply.github.com>
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
2025-08-05 09:44:59 -07:00
5a2b04a699 ↔️ Fix CB in GRPO (#3722)
Co-authored-by: Shirin Yamani <75791599+shirinyamani@users.noreply.github.com>
2025-07-11 18:21:24 -07:00
02cce41d06 Add support for CB with native transformers (#3471)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2025-07-01 12:26:09 +02:00
b9572737b4 🆙 Bump transformers to 4.51 and use _VALID_DICT_FIELDS (#3553) 2025-06-09 21:50:57 +02:00
c49c7b7d4e 🛋️ Fix CI and bump accelerate (#3551) 2025-06-09 14:56:20 +02:00
06be6f409a 🖇️ Better dependency and partitioning of CI tests (#2298)
* clean deps

* new tests

* tests

* Add tests without optional dependencies workflow

* Update dependencies in tests.yml

* cpu version of torch

* Update dependencies and installation commands

* Disable fail-fast in test workflow

* Update test matrix in workflows file

* try to fix Windows

* Remove "rich" from required packages in setup.py

* Update dependency installation in tests.yml

* Add torch and deepspeed installation for windows-latest

* Fix conditional statement in workflow file

* Add torch and deepspeed installation for Windows

* Fix if statement

* Update torch and deepspeed dependencies

* Update liger package requirement for non-Windows platforms

* remove scipy dep

* Add torch GPU requirement for testing_utils

* Update trl/trainer/judges.py
2024-10-31 11:08:51 +01:00
99225bb6d6 Bump the minimum transformers version to v4.46 (#2245)
* Bump the minimum transformers version

* Bump version in `requirements.txt`

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2024-10-24 10:42:30 +02:00
07f0e687cb Use transformers utilities when possible (#2064)
* use transformers' availability functions

* require from transformers

* rm file

* fix no peft

* fix import

* don't alter `_peft_available`

* fix require_diffusers

* style

* transformers>=4.40 and add back `is_liger_kernel_available`
2024-09-16 15:56:49 +02:00
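
(The commit above swaps TRL's local availability checks for the ones transformers already ships. A small sketch of the pattern, assuming `is_peft_available` is exposed under `transformers.utils` as it is in current releases; `require_peft` is just an illustrative helper, not TRL code.)

```python
from transformers.utils import is_peft_available

def require_peft():
    # Fail fast with a clear message instead of a late ImportError
    # deep inside a trainer.
    if not is_peft_available():
        raise ImportError("This code path needs PEFT; install it with `pip install peft`.")

require_peft()
from peft import LoraConfig  # safe to import once the check has passed
```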
e2966c8d99 Integrate OrpoTrainer with PyTorchXLA for faster step time on TPUs (#2001)
* make OrpoTrainer run faster on TPU

* less data transfer

* train-trl.py

* fix

* set device_map=auto

* add is_torch_xla_available guards

* delete file

* address comments

* make presubmit

* Update transformers version in setup.py

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2024-09-11 15:11:28 +02:00
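
(The guard mentioned in the bullets above keeps XLA-only calls off non-TPU backends. A hedged sketch, not the OrpoTrainer code itself; it assumes `is_torch_xla_available` is importable from transformers and that torch_xla is installed on the TPU host.)

```python
from transformers import is_torch_xla_available

if is_torch_xla_available():
    import torch_xla.core.xla_model as xm

def maybe_mark_step():
    # On TPU, mark_step() cuts the lazily traced XLA graph so each optimizer
    # step actually compiles and executes; on other backends it is a no-op,
    # so the same training loop runs everywhere.
    if is_torch_xla_available():
        xm.mark_step()
```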
e4f9a483d9 Refactor and benchmark (#662)
* refactor and benchmark

* update code

* Add accelerate logging

* logs

* quick fix

* update config

* precommit

* modify training example

* fix multi-gpu all_reduce error `Tensors must be CUDA and dense`

* support more models and benchmark

* update

* add changes

* upload benchmark

* precommit

* add tyro as a dependency

* add tyro

* pre-commit

* precommit

* weird...

* lol typo

* precommit

* sigh

* push changes

* Update benchmark/README.md

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* Add experiments

* upload image to tag specific folder

* add openrlbenchmark documentation

* rename

* remove unused field

* precommit

* push changes

---------

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
2023-09-13 10:24:18 -04:00
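
(The benchmark refactor above adds tyro as a dependency for CLI parsing. A short sketch of the idea; the dataclass fields are placeholders, only `tyro.cli` is the real entry point.)

```python
from dataclasses import dataclass
import tyro

@dataclass
class Args:
    model_name: str = "gpt2"       # model to benchmark
    learning_rate: float = 1e-5    # optimizer learning rate
    seed: int = 0                  # random seed for the run

# tyro derives the argument parser from the dataclass, so
# `python benchmark.py --learning-rate 3e-6 --seed 1` just works.
args = tyro.cli(Args)
print(args)
```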
01c4a35928 Denoising Diffusion Policy Optimization (#508)
* Broken first pre-draft

* Change structure to leverage user-definition of pipeline
 - reward function, pipeline and scheduler will be left to the user to define
 - pipeline and scheduler contract interfaces are what the framework will define
 - none of this actually works

* Incremental progress: trying to get the set-up running e2e

* Incremental progress: successfully running code

* Incremental progress: running setup
Next steps: fix accelerate gradient accumulation assertion error when we set value > 1

* Formatting and code standards

* Incremental progress: break down code a bit
- new config flag to notify code of async reward fetching
- break off image handling code and leave it to the user to define how to handle it
- more code restructuring

* Incremental progress:
1. More code sectioning off into own methods (more for readability than anything else)

* Incremental progress:
1. clear up contracts
2. type the reward function and prompt function

* Code shuffling and expansion of tracker, accelerator config args to beyond wandb

* More small additions
Add tensorboard logging function
Remove wandb logging function for now
Consolidate the data that gets passed to the logging function
Add README

* Formatting

* Formatting

* Remove print statement
Make tensorboard tracking the sole tracking for the training example

* 1. start of testing
2. more refactoring
3. start of docstrings
4. parameter rename

* Basic Tests
Formatting

* Docs according to the norm

* Docs, credits and rename file

* docs and corrections

* Put example config to respectable state

* Add recent run params

* Correct the name of the library

* Move requirements to EXTRAS

* - Add license banners
- Guard import of DDPO functions with is_diffusers_available
- doc strings for output types

* Add snippet to pull weights from huggingface + banner

* Test if passes on CI/CD

* Minor refactor

* Test dummy unet

* Possible fix for randomly disappearing attribute

* Shuffling arrangement in hopes of meeting memory requirements

* Proper Names

* Appease Windows memory allocator issues for the CPU device

* Remove print statements

* Update docs/source/ddpo_trainer.mdx

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update docs/source/ddpo_trainer.mdx

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Add docstrings and correct url

* Spelling and grammar

* Add more documentation and commandline parsing for example script

* Markdown syntax correction

* Revert accidentally committed file and put the correct one

* More docs

* Remove subclassing and add docs for leftover subclassing

* Put back subclassing

* Reward metadata and more docs

* Remove save_load_save flag

* Grammar

* Update trl/trainer/ddpo_trainer.py

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* Update tests/test_ddpo_trainer.py

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* Update setup.py

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* Update examples/scripts/stable_diffusion_tuning.py

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* Edits to the readme for DDPO

* Renamed modelling_sd_base to modeling_sd_base

* Insert try and catch for bitsandbytes import

* Change to smaller model

* Correct tolerance for floating point comparison

* Remove dummy UNet and switch to an isfinite check

* 1. Expand interface to ensure other Stable Diffusion pipelines could be covered
2. remove extra identification

* 1. Remove most of the asserts except for one and add value error
2. Remove default run name

* Remove progress bar

* Docs

* Put back progress bar

* 1. Revert progress bar deletion completely
2. grammar
3. relocate line

* Experiment

* Remove experiment parts and format properly

* Change formatting and edit info in docs

* Grammar

* Refactor out most of nitty gritty of loading/saving from trainer to example model
Readme addition

* Docs additions

* 1. Proper formatting for the test file
2. incorporation of pull from hub, falling back to local if it fails
3. doc strings for interface
4. highlight in the trainer that this is only ready for SD pipelines

* Resources for before and after

* Attempt at embedding images

* Post testing example script

* Consistent naming and document edits in light of new args

* Remove resources and add CDN links in html in doc file

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
2023-08-21 19:24:52 +02:00
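
(DDPO, introduced above, fine-tunes a Stable Diffusion pipeline against a user-supplied reward. A hedged sketch of how the pieces fit together, using the class names from this PR; the prompt, reward, and model id are toy placeholders, and the interface has been reworked in later TRL versions.)

```python
from trl import DDPOConfig, DDPOTrainer, DefaultDDPOStableDiffusionPipeline

def prompt_fn():
    # the trainer expects (prompt, prompt_metadata)
    return "a photo of an astronaut riding a horse", {}

def reward_fn(images, prompts, prompt_metadata):
    # toy reward favoring brighter images; return (rewards, reward_metadata)
    return images.float().mean(dim=(1, 2, 3)), {}

pipeline = DefaultDDPOStableDiffusionPipeline("runwayml/stable-diffusion-v1-5")
trainer = DDPOTrainer(DDPOConfig(num_epochs=1), reward_fn, prompt_fn, pipeline)
trainer.train()
```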
e547c392f9 Remove obsolete layer_norm_names parameter and add peft>=0.3.0 to requirements (#366)
* remove obsolete layer_norm_names parameter

* remove obsolete parameter layer_norm_names and add peft>=0.3.0 to requirements

* make style - oops

* typo
2023-05-15 16:08:11 +02:00
e954fa00e5 [core] remove wandb dependency (#92)
* remove `wandb` dependency

* make the logging framework agnostic

* `tensorboard` support

* modify func name

* fix default value
2023-01-19 15:17:01 +01:00
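
(Making the logging framework agnostic, as above, means the tracker is chosen in the config and accelerate initializes it, so neither wandb nor tensorboard is a hard dependency of the library itself. A sketch assuming the `log_with` field that pre-0.12 TRL's PPOConfig exposed.)

```python
from trl import PPOConfig

# Choose the experiment tracker at configuration time; accelerate sets it up.
config = PPOConfig(model_name="gpt2", log_with="tensorboard")  # or "wandb", or None
```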
c322d8e7a7 Relax requirements (#66)
* relax requirements

* relax requirements
2022-12-30 11:51:20 +01:00
b1279004e7 accelerate integration (#58)
* working v1

* add `accelerate` on requirements

* add `accelerate` on `setup.py`

* add `datasets` on `setup.py`

* small updates

- add docstring on most functions
- correct logging

* rm unneeded file

* replace with `generate`

* Update trl/trainer/accelerate_ppo.py

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* correct return

* add dataloader support

* add `wandb` to `setup.py`

* refactor

- remove `_build_dataset` method
- change name to `PPOTrainer`

* test

* fix test

* rename file

* refactor

* remove unneeded device assignment

* fix correct device assignment

* standardize docstrings

* add `wandb` on `dev`

* fix slow convergence

- random init seems to converge much faster

* oops

* revert fix

* revert patch

* remove unneeded reshape

* add input safety checker

* refactor

- added comments on example
- fixes CI test
- rewards should be a list of tensors
- clearer error messages
- remove build model method
- refactor log stats method

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* refactor

- added `PPOConfig` class
- docstring on `LengthSampler`
- fix test
- gather rewards when logging
- unwrap model when calling generate

* some refactor

* remove unneeded hack

* adapt dataset

* fix test

* remove rollout

* remove timing

* remove `shuffle=True`

* remove `LengthSampler` from trainer

* refactor

* remove text length sampler args from config

* change collate_fn

* fix silent bug

* rename

* move file

* refactor base trainer

* fix collate

* final bug

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2022-12-30 09:27:25 +01:00
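
(The PR above lands `PPOConfig`/`PPOTrainer`, with rewards passed as a list of tensors. A hedged sketch in the shape of the pre-0.12 PPOTrainer API that grew out of this PR, with toy queries, responses, and rewards; later TRL versions replaced this interface.)

```python
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer, create_reference_model

model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")
ref_model = create_reference_model(model)
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

config = PPOConfig(batch_size=2, mini_batch_size=1)
ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer)

def enc(text):
    # 1-D token-id tensor; lengths may differ across samples
    return tokenizer(text, return_tensors="pt").input_ids.squeeze(0)

# Queries, responses, and rewards are lists with one entry per sample.
queries = [enc("Hello there"), enc("How are you doing today?")]
responses = [enc("General Kenobi!"), enc("Quite well, thanks.")]
rewards = [torch.tensor(1.0), torch.tensor(0.5)]

stats = ppo_trainer.step(queries, responses, rewards)
```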
d588fced02 remove nbdev from requirements 2022-12-16 17:00:39 +00:00
dfb864dcde refactor
- move notebooks to `examples/notebooks`
- removed `_nbdev` file
- refactored `gpt2.py` to make it work with more recent `transformers`
- update `requirements` to add recent `transformers`
2022-12-16 16:58:29 +00:00
52910d3bf1 Dynamic input sizes (#35)
* change ppo input from tensor to list of tensors for varying shapes

* update readme example with new input type

* update docs

* add listification of tensors needed for new API

* replace nans in tensors for wandb compatibility

* add `listify_batch` helper function for backwards compatibility

* update sentiment example with new api

* update docs

* update library

* ignore wandb artifacts

* update requirements

* run experiment

* replace respond to batch with generate

* add experiment

* update docs

* fix action

* fix action
2022-05-15 18:16:25 +02:00
002a72e5fb chore(deps): bump jupyterlab from 2.0.1 to 2.2.10
Bumps [jupyterlab](https://github.com/jupyterlab/jupyterlab) from 2.0.1 to 2.2.10.
- [Release notes](https://github.com/jupyterlab/jupyterlab/releases)
- [Changelog](https://github.com/jupyterlab/jupyterlab/blob/master/CHANGELOG.md)
- [Commits](https://github.com/jupyterlab/jupyterlab/compare/@jupyterlab/vdom@2.0.1...@jupyterlab/application-top@2.2.10)

---
updated-dependencies:
- dependency-name: jupyterlab
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-01-01 16:31:56 +00:00
919a1eb5ca replace simpletransformers training and use datasets for data loading 2022-01-01 15:45:40 +01:00
78ba8f6af0 Upgrade to HF transformers 4.3.2 2021-02-24 16:15:07 -05:00
033a3159fb chore: bump up wandb version in requirements 2020-05-12 20:53:07 +02:00
611d5b47af feat: add gpt2 control experiment notebook 2020-04-22 18:36:16 +02:00
fdd95b9b9f feat: add requirements (module and repo) 2020-03-29 13:54:56 +02:00