frozenleaves/trl - Gitea

Mirror of https://github.com/huggingface/trl.git, synced 2025-10-20 10:03:51 +08:00

Author	SHA1	Message	Date
Albert Villanova del Moral	82b34e5723	Update transformers minimum version to 4.56.1 (#4047 )	2025-09-09 16:05:04 +02:00
Sergio Paniego Blanco	208e9f7df7	📏 `torch_dype` to `dtype` everywhere (#4000 ) Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>	2025-09-03 15:45:37 -06:00
Quentin Gallouédec	17393b8c82	🌺 OpenAI GPT OSS & Harmony support (#3848 ) Co-authored-by: Shirin Yamani <75791599+shirinyamani@users.noreply.github.com> Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>	2025-08-05 09:44:59 -07:00
Quentin Gallouédec	5a2b04a699	↔️ Fix CB in GRPO (#3722 ) Co-authored-by: Shirin Yamani <75791599+shirinyamani@users.noreply.github.com>	2025-07-11 18:21:24 -07:00
Arthur	02cce41d06	Add support for CB with native transformers (#3471 ) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>	2025-07-01 12:26:09 +02:00
Quentin Gallouédec	b9572737b4	🆙 Bump transformers to 4.51 and use `_VALID_DICT_FIELDS` (#3553 )	2025-06-09 21:50:57 +02:00
Quentin Gallouédec	c49c7b7d4e	🛋️ Fix CI and bump accelerate (#3551 )	2025-06-09 14:56:20 +02:00
Quentin Gallouédec	06be6f409a	🖇️ Better dependency and partitioning of CI tests (#2298 ) * clean deps * new tests * tests * Add tests without optional dependencies workflow * Update dependencies in tests.yml * cpu version of torch * Update dependencies and installation commands * Disable fail-fast in test workflow * Update test matrix in workflows file * try fix windows * Remove "rich" from required packages in setup.py * Update dependency installation in tests.yml * Add torch and deepspeed installation for windows-latest * Fix conditional statement in workflow file * Add torch and deepspeed installation for Windows * Fix if statement * Update torch and deepspeed dependencies * Update liger package requirement for non-Windows platforms * remove scipy dep * Add torch GPU requirement for testing_utils * Update trl/trainer/judges.py	2024-10-31 11:08:51 +01:00
Quentin Gallouédec	99225bb6d6	Bump the minimum transformers version to v4.46 (#2245 ) * Bump the minimum transformers version * Bump version in `requirements.txt` --------- Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>	2024-10-24 10:42:30 +02:00
Quentin Gallouédec	07f0e687cb	Use `transformers` utilities when possible (#2064 ) * use transformers' availability functions * require from transformers * rm file * fix no peft * fix import * don't alter _peft_available * fix require_diffusers * style * transformers>=4.40 and add back `is_liger_kernel_available`	2024-09-16 15:56:49 +02:00
wenxindongwork	e2966c8d99	Integrate OrpoTrainer with PyTorchXLA for faster step time on TPUs (#2001 ) * make Orpotrainer run faster on tpu * less data transfer * train-trl.py * fix * set device_map=auto * add is_torch_xla_available guards * delete file * address comments * make presubmit * Update transformer version in setup.py --------- Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2024-09-11 15:11:28 +02:00
Costa Huang	e4f9a483d9	Refactor and benchmark (#662 ) * refactor and benchmark * update code * Add accelerate logging * logs * quick fix * update config * precommit * modify training example * fix multi-gpu all_reduce error `Tensors must be CUDA and dense` * support more models and benchmark * update * add changes * upload benchmark * precommit * add tyro as a dependency * add tyro * pre-commit * precommit * weird... * lol typo * precommit * sigh * push changes * Update benchmark/README.md Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com> * Add experiments * upload image to tag specific folder * add openrlbenchmark documentation * rename * remove unused field * precommit * push changes --------- Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>	2023-09-13 10:24:18 -04:00
Luke Meyers	01c4a35928	Denoising Diffusion Policy Optimization (#508 ) * Broken first pre-draft * Change structure to leverage user-definition of pipeline - reward function, pipeline and scheduler will be left to the user to define - pipeline and scheduler contract interfaces is what the framework will define - none of this actually works * Incremental progress: trying to get the set-up running e2e * Incemental progress: successfully running code * Incremental progress: running setup Next steps: fix accelerate gardient acc assertion error when we set value > 1 * Formatting and code standards * Incremental prog: break down code a bit - new config flag to notify code of async reward fetching - break off image handling code and throw it on to user to define how to handle it - more code restructuring * Incremental progress: 1. More code sectioning off into own methods (more for readibility than anything else) * Incremental progress: 1. clear up contracts 2. type the reward function and prompt function * Code shuffling and expansion of tracker, accelerator config args to beyond wandb * More small additions Add tensorboard logging function Remove wandb logging function for now Consolidate the data that get's thrown to the logging function Add README * Formatting * Formatting * Remove print statement Make tensorboard tracking the sole tracking for the training example * 1. start of testing 2. more refactoring 3. start of docstrings 4. parameter rename * Basic Tests Formatting * Docs according to the norm * Doocs, credits and rename file * docs and corrections * Put example config to respectable state * Add recent run params * Correct the name of the library * Move requirements to EXTRAS * - Add license banners - Guard import of DDPO functions with if_diffusers_available - doc strings for output types * Add snippet to pull weights from huggingface + banner * Test if passes on CI/CD * Minor refactor * Test dummy unet * Possible fix for randomly disappearing attribute * Shuffling arrangement in hopes of meeting memory requirements * Proper Names * Appease windows memory allocator issues for the cpu device * Remove print statements * Update docs/source/ddpo_trainer.mdx Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * Update docs/source/ddpo_trainer.mdx Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * Add docstrings and correct url * Spelling and grammar * Add more documentation and commandline parsing for example script * Markdown synatx correction * Revert accidentally committed file and put the correct one * More docs * Remove subclassing and add docs for leftoover subclassing * Put back subclassing * Reward metadata and more docs * Remove save_load_save flag * Grammar * Update trl/trainer/ddpo_trainer.py Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com> * Update tests/test_ddpo_trainer.py Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com> * Update setup.py Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com> * Update examples/scripts/stable_diffusion_tuning.py Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com> * Edits to the readme for DDPO * Renamed modelling_sd_base to modeling_sd_base * Insert try and catch for bitsandbytes import * Change to smaller model * Correct tolerance for floating point comparison * Remove dummy unet and move to check is isfinite * 1. Expand interface to ensure other Stable Diffusion pipelines could be covered 2. remove extra identification * 1. Remove most of the asserts except for one and add value error 2. Remove default run name * Remove progress bar * Docs * Put back progress bar * 1. Revert progress bar deletion completely 2. grammar 3. relocate line * Experiment * Remove experiment parts and format properly * Change formatting and edit info in docs * Grammar * Refactor out most of nitty gritty of loading/saving from trainer to example model Readme addition * Docs additions * 1. Proper formatting fr the test file 2. incorporatioon of pull frm hub if fails try local 3. doc strings for interface 4. highlight in the trainer, that this is only ready fr sd pipelines * Resources for before and after * Attempt at embedding images * Post testing example script * Consistent naming and document edits in light of new args * Remove resources and add CDN links in html in doc file --------- Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>	2023-08-21 19:24:52 +02:00
Robert Dargavel Smith	e547c392f9	Remove obsolete layer_norm_names parameter and add peft>=0.3.0 to requirements (#366 ) * remove obsolete layer_norm_names parameter * remove obsolete parameter layer_norm_names and add peft>=0.3.0 to requirements * make style - oops * typo	2023-05-15 16:08:11 +02:00
Younes Belkada	e954fa00e5	[core] remove `wandb` dependency (#92 ) * remove `wandb` dependency * make the logging framework agnostic * `tensorboard` support * modify func name * fix default value	2023-01-19 15:17:01 +01:00
Leandro von Werra	c322d8e7a7	Relax requirements (#66 ) * relax requirements * relax requirements	2022-12-30 11:51:20 +01:00
Younes Belkada	b1279004e7	`accelerate` integration (#58 ) * working v1 * add `accelerate` on requirements * add `accelerate` on `setup.py` * add `datasets` on `setup.py` * small updates - add docstring on most functions - correct logging * rm unneeded file * replace with `generate` * Update trl/trainer/accelerate_ppo.py Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com> * correct return * add dataloader support * add `wandb` to `setup.py` * refactor - remove `_build_dataset` method - change name to `PPOTrainer` * test * fix test * rename file * refactor * remove unneeded device assignment * fix correct device assignment * standardize docstrings * add `wandb` on `dev` * fix slow convergence - random init seems to converge much faster * oops * revert fix * revert patch * remove unneeded reshape * add input safety checker * refactor - added comments on example - fixes CI test - rewards should be a list of tensors - clearer error messages - remove build model method - refactor log stats method Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * refactor - added `PPOConfig` class - docstring on `LengthSampler` - fix test - gather rewards when logging - unwrap model when calling generate * some refactor * remove unneeded hack * adapt dataset * fix test * remove rollout * remove timing * remove `shuffle=True` * remove `LengthSampler` from trainer * refactor * remove text length sampler args from config * change collate_fn * fix silent bug * rename * move file * refactor base trainer * fix collate * final bug Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com> Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2022-12-30 09:27:25 +01:00
younesbelkada	d588fced02	remove `nbdev` from requirements	2022-12-16 17:00:39 +00:00
younesbelkada	dfb864dcde	refactor - move notebooks to `examples/notebooks`` - removed `_nbdev`file - refactored `gpt2.py` to make it work with more recent `transformers` - update `requirements` to add recent `transformers`	2022-12-16 16:58:29 +00:00
Leandro von Werra	52910d3bf1	Dynamic input sizes (#35 ) * change ppo input from tensor to list of tensors for varying shapes * update readme example with new input type * update docs * add listification of tensors need for new API * replace nans in tensors for wandb compatibility * add `listify_batch` helper function for backwards compatibility * update sentiment example with new api * update docs * update library * ignore wandb artifacts * update requirements * run experiment * replace respond to batch with generate * add experiment * update docs * fix action * fix action	2022-05-15 18:16:25 +02:00
dependabot[bot]	002a72e5fb	chore(deps): bump jupyterlab from 2.0.1 to 2.2.10 Bumps [jupyterlab](https://github.com/jupyterlab/jupyterlab) from 2.0.1 to 2.2.10. - [Release notes](https://github.com/jupyterlab/jupyterlab/releases) - [Changelog](https://github.com/jupyterlab/jupyterlab/blob/master/CHANGELOG.md) - [Commits](https://github.com/jupyterlab/jupyterlab/compare/@jupyterlab/vdom@2.0.1...@jupyterlab/application-top@2.2.10) --- updated-dependencies: - dependency-name: jupyterlab dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2022-01-01 16:31:56 +00:00
leandro	919a1eb5ca	replace simpletransformers training and use datasets for data loading	2022-01-01 15:45:40 +01:00
Vladimir Blagojevic	78ba8f6af0	Upgrade to HF transformers 4.3.2	2021-02-24 16:15:07 -05:00
leandro	033a3159fb	chore: bump up wandb version in requirements	2020-05-12 20:53:07 +02:00
leandro	611d5b47af	feat: add gpt2 control experiment notebook	2020-04-22 18:36:16 +02:00
leandro	fdd95b9b9f	feat: add requirements (module and repo)	2020-03-29 13:54:56 +02:00

26 Commits