354 Commits

Author SHA1 Message Date
28bba8c6b1 Added SFT LoRA notebook (#4244)
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
2025-10-20 11:24:54 +02:00
8e2d5516ca Add accuracy reward (#4270)
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
2025-10-15 18:01:07 -06:00
56cb6ccf76 Fix typo in Colab link (#4276) 2025-10-14 18:51:17 +02:00
49c8f14b06 Add Qwen3-VL notebooks (SFT, GRPO) (#4275)
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-14 18:45:01 +02:00
bcd059a384 Remove obsolete research_projects directory (#4243)
Co-authored-by: behroozazarkhalili <ermiaazarkhalili>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
2025-10-10 10:15:47 -05:00
c38cb69ec7 🧘 Enhance markdown style (#4235) 2025-10-09 13:49:44 -05:00
65eb45c32b Apply style and revert change in sft_video_llm example (#4214) 2025-10-06 13:07:18 -06:00
ae6837f8d4 Removed tokenizer/processor creation from example scripts (#4211) 2025-10-06 18:40:18 +02:00
251fdb228a 📌 Pin vLLM version (#4122) 2025-09-23 08:02:30 -06:00
3c8d7209f1 👁️ Add VLM support to RLOO trainer (#4067)
Co-authored-by: behroozazarkhalili <ermiaazarkhalili>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
2025-09-18 21:54:06 -06:00
45e59f77ea ⌨️ Pin num2words (#4094)
Co-authored-by: sergiopaniego <sergiopaniegoblanco@gmail.com>
2025-09-16 08:48:09 -06:00
4bd4acf172 🏞️ Context Parallelism benchmark guide (#4075)
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-09-16 08:46:12 -06:00
78f1a928ce 🗑️ Remove deprecated AlignPropTrainer, DDPOTrainer and IterativeSFTTrainer (#4068) 2025-09-15 09:56:41 -06:00
9955ee7eaa 🐳 Docker update + Simplify Jobs doc (#3931)
Co-authored-by: sergiopaniego <sergiopaniegoblanco@gmail.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2025-09-13 18:35:55 -06:00
e8b8499f1f Remove redundant 'None' from docstrings (#4058) 2025-09-11 08:16:34 +02:00
a647e5a78a 🗜 Hotfix: avoid passing quantization_config=None (#4019) 2025-09-09 14:50:15 -06:00
af82b38482 ⚖️ Remove average_tokens_across_devices default replacement (#4039) 2025-09-09 07:39:12 -06:00
1b799a23c1 🥓 [docs] add CP docs (#3994)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
2025-09-08 21:46:22 -06:00
c9484b161f Align docstring parameters with function definitions (#4017) 2025-09-07 10:40:09 +02:00
d1bf56020d ⚖️ Add vLLM server mode and VLM support to OnlineDPOTrainer (#3783)
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
2025-09-05 16:58:49 -06:00
0c69fd2867 👷 Added Kernels on the Hub x TRL guide (#3969)
Co-authored-by: vb <vaibhavs10@gmail.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-09-04 15:37:02 +02:00
208e9f7df7 📏 torch_dype to dtype everywhere (#4000)
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
2025-09-03 15:45:37 -06:00
8aa0eed816 ℹ️ Validate examples on xpu (#3897)
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-08-29 10:56:57 -07:00
e7b37d4e8d 🔥 [Refactor] RLOOTrainer (#3801)
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com>
2025-08-29 09:27:28 -06:00
0c91515b58 🧭 HF jobs x TRL guide (#3890)
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
2025-08-26 21:44:29 -07:00
251c0488c8 📦 Wrapping the main execution code to avoid multi-processing issues from vLLM (#3932)
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
2025-08-21 12:45:13 -07:00
48d7ecc67b 🗑️ Deprecate setup_chat_format (#3929) 2025-08-20 14:06:23 -07:00
8793a46760 🧾 Use logger.warning instead of warnings.warn (#3923)
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2025-08-20 09:20:09 -07:00
0227d68e50 🌓 SFTTrainer for VLM: Support for prompt-completion data (#3907) 2025-08-18 16:46:17 -07:00
44e6c153a5 🔮 Native VLM support for SFTTrainer (#3862)
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2025-08-12 20:43:00 -07:00
cb95323429 👋 Remove --bf16 value in scripts (#3869) 2025-08-07 12:25:36 -07:00
2fb7090231 👁️ From AutoModelForVision2Seq to AutoModelForImageTextToText (#3836) 2025-08-07 08:00:16 -07:00
17393b8c82 🌺 OpenAI GPT OSS & Harmony support (#3848)
Co-authored-by: Shirin Yamani <75791599+shirinyamani@users.noreply.github.com>
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
2025-08-05 09:44:59 -07:00
3ae60cd1b4 Add GSPO script examples (VLM/LLM) (#3810) 2025-07-30 20:07:23 -06:00
25ce0f31ae 🐙 Add MPO VLM example script (#3799) 2025-07-29 20:52:32 -06:00
fcd3e0fd15 🌋 [GRPO] add support for pixel_attention_mask (SmolVLM2) and image_sizes (LLaVa-Next) (#3760)
Co-authored-by: sergiopaniego <sergiopaniego@users.noreply.huggingface.co>
Co-authored-by: sergiopaniego <sergiopaniegoblanco@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
2025-07-28 16:28:29 -06:00
2f4cb38f28 📐 Fix CI and GeometricMixtureWrapper (#3779) 2025-07-26 16:15:08 -06:00
a043fd74a3 Add uv scripts headers (#3767) 2025-07-25 07:48:40 -07:00
56f4201db6 👁️ [GRPO] Add VLM training capabilities to the trainer (#3072) 2025-07-22 20:31:08 -07:00
dffd1acb94 👋 Remove --bf16 flag from training scripts (#3724)
Co-authored-by: Shirin Yamani <75791599+shirinyamani@users.noreply.github.com>
2025-07-11 18:20:15 -07:00
43e6b24e70 Remove deprecated processor.tokenizer (#3720) 2025-07-11 15:46:34 -06:00
ab331bfd56 Update dpo_vlm.py (#3629)
Co-authored-by: Shirin Yamani <75791599+shirinyamani@users.noreply.github.com>
2025-06-24 13:56:34 +02:00
b40c959c00 fixing num_processes (#3637) 2025-06-24 13:42:58 +02:00
ed9b78a5f7 🗳️ Remove logging_steps parameter from for simpler setup (#3612) 2025-06-18 13:52:21 +02:00
FT
8a235a9b71 Fix Typo in Documentation and Notebook; Improve Library Installation Comment (#3593) 2025-06-15 16:46:41 +02:00
3d077fd3de Add support for IterableDataset in DPO Trainer (#3559)
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2025-06-12 13:06:34 +02:00
1314aac502 ℹ️ Unify autocast behavior to torch.autocast and make it cover XPU (#3541)
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
2025-06-10 09:13:00 +02:00
c7e3f096a5 [GKD] fix the gkd script (#3497) 2025-05-26 20:22:15 +02:00
31bf3f9244 Fix typo (#3489) 2025-05-24 13:24:15 +02:00
45f4c58832 ✌️ Add support for FSDP2 (#3317)
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-05-06 08:29:11 +02:00