* initial skeleton
* tokenize fn
* adding bos and eos to tokenization fn
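A minimal sketch of what the BOS/EOS handling amounts to, assuming a standard Hugging Face tokenizer (the checkpoint and `text` here are illustrative, not the exact commit):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B")  # illustrative model
text = "Step 1: 2 + 2 = 4."

input_ids = tokenizer(text, add_special_tokens=False)["input_ids"]
if tokenizer.bos_token_id is not None:  # prepend BOS when the tokenizer defines one
    input_ids = [tokenizer.bos_token_id] + input_ids
if tokenizer.eos_token_id is not None:  # append EOS likewise
    input_ids = input_ids + [tokenizer.eos_token_id]
```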
* add PRMTrainer
* fixing small typo in tokenize
* typo in input_ids and labels construction
* fix numpy dimension
* introduce the stepwise reward trainer
* update markdown files
* let user decide post step separator in config
* doc post_step_separator
* do not add post-step tokens to the last step of the reasoning process
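The separator behavior these commits describe falls out of `str.join`, which never emits a trailing separator; a minimal sketch with illustrative values:

```python
steps = ["Step 1: 2 + 2 = 4.", "Step 2: Therefore the answer is 4."]
step_separator = "\n"  # user-configurable via the trainer config

# join() places the separator only *between* steps, so no post-step token
# is appended after the final step of the reasoning process.
completion = step_separator.join(steps)
```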
* renaming prm to stepwisereward
* formatting
* fix tokenize kwargs
* adapt test to the new post_token args
* adding example script
* fix small typo
* add create_model_card and renaming
* fixing booleans
* Adding the new stepwise_preference instead of placeholders for datasets
* formatting
* Update docs/source/_toctree.yml
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Update examples/scripts/stepwise_reward_modeling.py
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Update trl/trainer/stepwise_reward_trainer.py
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Update trl/trainer/stepwise_reward_trainer.py
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* update push to hub
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* step_separator can't be None
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
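A plausible shape for that guard (the error wording is illustrative, not the exact code):

```python
def validate(step_separator):
    # Illustrative guard; the real check lives in the trainer/config.
    if step_separator is None:
        raise ValueError("step_separator cannot be None; pass a string such as '\\n'.")
```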
* fix suggested typos
* add citation
* reformat doc
* reordering init
* push to hub prm800k
* changing dataset in example
* change dataset format to align with the sky is blue example
* fix tokenization column names
* fix num labels in openai example
* add support for conversational dataset
* remove trailing whitespace
* replace tokenizer with processing class
* Update docs/source/dataset_formats.mdx
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* remove openai_prm800k
* Update trl/trainer/stepwise_reward_trainer.py
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Update trl/trainer/stepwise_reward_trainer.py
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Update docs/source/stepwise_reward_trainer.mdx
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update docs/source/stepwise_reward_trainer.mdx
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* renaming
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* renaming
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* minor renamings in docs
* using prm800k instead of openai_prm800k
* update num labels to 2 following the new format
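With the two-label format, the reward model is an ordinary token classifier; a minimal sketch, with an illustrative checkpoint:

```python
from transformers import AutoModelForTokenClassification

# num_labels=2: one class for "step is incorrect", one for "step is correct".
model = AutoModelForTokenClassification.from_pretrained("Qwen/Qwen2-0.5B", num_labels=2)
```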
* changing doc examples to math examples
* change reference to dataset_formats.mdx
* changing dataset config in test
* remove conversational dataset support
* remove conv dataset support
* fix bos token
* fix scriptarguments in example
* completion to completions
* remove ValueError for step_separator inside steps
* run precommit
* remove conv dataset support
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* renaming zen dataset
* remove unused printing
* unknown label column
* introduce the train on last step arg
* add train_on_last_step support to _tokenize
* incorporate train_on_last_step to tests
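A minimal sketch of the `train_on_last_step` masking, under the assumption that -100 marks positions the loss ignores (the label values are illustrative):

```python
labels = [0, 1, 1]  # per-step labels for a three-step completion
train_on_last_step = True

if train_on_last_step:
    # Keep only the final step's label; mask the rest so the loss skips them.
    labels = [-100] * (len(labels) - 1) + [labels[-1]]

assert labels == [-100, -100, 1]
```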
* formatting
* remove comments in trainer
* Refactor `tokenize_row`
* Update max_completion_length parameter in StepwiseRewardConfig
* Collator
* Update comment
* Update type hint
* fix table
* Remove collator
* don't need pad token id
* add error back
* max length args
* use tokenizer arg
* Update doc
* label -> labels
* fixing tokenization issues in tokenize_row
* correct labels for token classification
* adding max_length to tokenize_row
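A sketch of the `tokenize_row` logic these commits converge on: each step's label sits on the separator token that closes the step, every other position is -100, and the result is truncated to `max_length`. The checkpoint, steps, and labels here are illustrative:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B")  # illustrative
step_separator = "\n"
completions = ["2 + 2 = 4.", "So the answer is 4."]
step_labels = [True, True]
max_length = 512

separator_ids = tokenizer.encode(step_separator, add_special_tokens=False)
input_ids, labels = [], []
for step, step_label in zip(completions, step_labels):
    step_ids = tokenizer.encode(step, add_special_tokens=False) + separator_ids
    input_ids += step_ids
    # Only the token closing each step carries that step's label.
    labels += [-100] * (len(step_ids) - 1) + [int(step_label)]

if max_length is not None:  # truncate to the configured maximum
    input_ids, labels = input_ids[:max_length], labels[:max_length]
```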
* reformat tests
* adding tests for tokenize row
* fixing typos in comments
* update doc
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
* Add math_shepherd.py script for dataset processing
* split the dataset
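A minimal sketch of the processing flow, assuming the Math-Shepherd dataset on the Hub and an illustrative split ratio:

```python
from datasets import load_dataset

dataset = load_dataset("peiyi9979/Math-Shepherd", split="train")
# Carve a held-out split out of the single train split.
dataset = dataset.train_test_split(test_size=0.05, seed=42)
```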
* formatting
* same evaluation method for the two training methods
* adding filtering to example script
* formatting
* Add features to avoid casting labels to bool in dataset tokenization
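A minimal sketch of the idea, with a stub dataset and a stand-in `tokenize_row`: pinning `features` on `map` keeps `labels` as int64 rather than letting `datasets` infer or keep a bool dtype:

```python
from datasets import Dataset, Features, Sequence, Value

dataset = Dataset.from_dict({"labels": [[True, False]], "text": ["2 + 2 = 4.\nSo 5."]})

def tokenize_row(example):
    # Stub standing in for the real tokenize_row from the commits above.
    return {"input_ids": [1, 2, 3], "labels": [int(l) for l in example["labels"]]}

# Pinning the output schema stops the bool dtype from propagating to labels.
features = Features({"input_ids": Sequence(Value("int64")),
                     "labels": Sequence(Value("int64"))})
dataset = dataset.map(tokenize_row, remove_columns=dataset.column_names, features=features)
```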
* Update docs/source/stepwise_reward_trainer.mdx [ci skip]
* Add learning_rate parameter to StepwiseRewardConfig class
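A minimal sketch of what adding the parameter looks like, assuming the usual TRL pattern of a dataclass config subclassing `TrainingArguments` (the default value here is illustrative):

```python
from dataclasses import dataclass
from transformers import TrainingArguments

@dataclass
class StepwiseRewardConfig(TrainingArguments):
    # The trainer ships its own learning-rate default instead of
    # inheriting the generic 5e-5 from TrainingArguments.
    learning_rate: float = 1e-5
```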
* update doc
* Remove unused setup_chat_format function
* Fix warning message in stepwise_reward_modeling.py
* Update logging steps in stepwise_reward_trainer.mdx
* little doc change [ci skip]
* Fix copyrights
* fix space after copyrights
* Update dataset loading in stepwise_reward_modeling.py
* refine compute_accuracy and proper test
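A minimal sketch of a `compute_accuracy` refined for token classification: argmax over the class dimension, with -100 positions excluded (shapes and names are assumptions, not the exact code):

```python
import numpy as np

def compute_accuracy(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=-1)
    mask = labels != -100  # ignore masked (non-step) positions
    return {"accuracy": float((predictions[mask] == labels[mask]).mean())}
```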
* fix tests
* style
* renamings
* renaming in init
* doc renaming
* fix sorting and tag
* experimental [ci skip]
* trigger CI
* other doc fix
---------
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co>