# Community Tutorials
Community tutorials are made by active members of the Hugging Face community who want to share their knowledge and expertise with others. They are a great way to learn about the library and its features, and to get started with core classes and modalities.
## Language Models

### Tutorials
| Task | Class | Description | Author | Tutorial | Colab |
| --- | --- | --- | --- | --- | --- |
| Reinforcement Learning | [`GRPOTrainer`] | Efficient Online Training with GRPO and vLLM in TRL | Sergio Paniego | Link | |
| Reinforcement Learning | [`GRPOTrainer`] | Post training an LLM for reasoning with GRPO in TRL | Sergio Paniego | Link | |
| Reinforcement Learning | [`GRPOTrainer`] | Mini-R1: Reproduce Deepseek R1 „aha moment“ a RL tutorial | Philipp Schmid | Link | |
| Reinforcement Learning | [`GRPOTrainer`] | RL on LLaMA 3.1-8B with GRPO and Unsloth optimizations | Andrea Manzoni | Link | |
| Instruction tuning | [`SFTTrainer`] | Fine-tuning Google Gemma LLMs using ChatML format with QLoRA | Philipp Schmid | Link | |
| Structured Generation | [`SFTTrainer`] | Fine-tuning Llama-2-7B to generate Persian product catalogs in JSON using QLoRA and PEFT | Mohammadreza Esmaeilian | Link | |
| Preference Optimization | [`DPOTrainer`] | Align Mistral-7b using Direct Preference Optimization for human preference alignment | Maxime Labonne | Link | |
| Preference Optimization | [`ORPOTrainer`] | Fine-tuning Llama 3 with ORPO combining instruction tuning and preference alignment | Maxime Labonne | Link | |
| Instruction tuning | [`SFTTrainer`] | How to fine-tune open LLMs in 2025 with Hugging Face | Philipp Schmid | Link | |
### Videos
| Task | Title | Author | Video |
| --- | --- | --- | --- |
| Instruction tuning | Fine-tuning open AI models using Hugging Face TRL | Wietse Venema | |
| Instruction tuning | How to fine-tune a smol-LM with Hugging Face, TRL, and the smoltalk Dataset | Mayurji | |
⚠️ **Deprecated features notice for "How to fine-tune a smol-LM with Hugging Face, TRL, and the smoltalk Dataset"**

The tutorial uses two deprecated features:

- `SFTTrainer(..., tokenizer=tokenizer)`: use `SFTTrainer(..., processing_class=tokenizer)` instead, or simply omit it (it will be inferred from the model).
- `setup_chat_format(model, tokenizer)`: use `SFTConfig(..., chat_template_path="Qwen/Qwen3-0.6B")`, where `chat_template_path` specifies the model whose chat template you want to copy.
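If you follow that tutorial with a current TRL release, the migrated setup might look like the minimal sketch below. It only illustrates the two replacements named above; the model and dataset names are illustrative, not taken from the tutorial, and `chat_template_path` is only needed when the base model lacks a chat template.

```python
# Minimal sketch of the non-deprecated API (assumes a recent TRL release;
# model and dataset names are illustrative, not from the tutorial).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM2-135M")
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M")

training_args = SFTConfig(
    output_dir="smollm2-sft",
    # Replaces setup_chat_format(model, tokenizer): copy the chat
    # template from the named model instead of patching the tokenizer.
    chat_template_path="Qwen/Qwen3-0.6B",
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=load_dataset(
        "HuggingFaceTB/smoltalk", "everyday-conversations", split="train"
    ),
    processing_class=tokenizer,  # replaces the deprecated tokenizer=... argument
)
trainer.train()
```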
## Vision Language Models

### Tutorials
| Task | Class | Description | Author | Tutorial | Colab |
| --- | --- | --- | --- | --- | --- |
| Visual QA | [`SFTTrainer`] | Fine-tuning Qwen2-VL-7B for visual question answering on ChartQA dataset | Sergio Paniego | Link | |
| Visual QA | [`SFTTrainer`] | Fine-tuning SmolVLM with TRL on a consumer GPU | Sergio Paniego | Link | |
| SEO Description | [`SFTTrainer`] | Fine-tuning Qwen2-VL-7B for generating SEO-friendly descriptions from images | Philipp Schmid | Link | |
| Visual QA | [`DPOTrainer`] | PaliGemma 🤝 Direct Preference Optimization | Merve Noyan | Link | |
| Visual QA | [`DPOTrainer`] | Fine-tuning SmolVLM using direct preference optimization (DPO) with TRL on a consumer GPU | Sergio Paniego | Link | |
| Object Detection Grounding | [`SFTTrainer`] | Fine tuning a VLM for Object Detection Grounding using TRL | Sergio Paniego | Link | |
| Visual QA | [`DPOTrainer`] | Fine-Tuning a Vision Language Model with TRL using MPO | Sergio Paniego | Link | |
| Reinforcement Learning | [`GRPOTrainer`] | Post training a VLM for reasoning with GRPO using TRL | Sergio Paniego | Link | |
## Contributing
If you have a tutorial that you would like to add to this list, please open a PR to add it. We will review it and merge it if it is relevant to the community.