
# Community Tutorials

Community tutorials are made by active members of the Hugging Face community who want to share their knowledge and expertise with others. They are a great way to learn about the library and its features, and to get started with core classes and modalities.

## Language Models

### Tutorials

| Task | Class | Description | Author | Tutorial | Colab |
| --- | --- | --- | --- | --- | --- |
| Reinforcement Learning | [`GRPOTrainer`] | Efficient Online Training with GRPO and vLLM in TRL | Sergio Paniego | Link | Open In Colab |
| Reinforcement Learning | [`GRPOTrainer`] | Post training an LLM for reasoning with GRPO in TRL | Sergio Paniego | Link | Open In Colab |
| Reinforcement Learning | [`GRPOTrainer`] | Mini-R1: Reproduce Deepseek R1 "aha moment", an RL tutorial | Philipp Schmid | Link | Open In Colab |
| Reinforcement Learning | [`GRPOTrainer`] | RL on LLaMA 3.1-8B with GRPO and Unsloth optimizations | Andrea Manzoni | Link | Open In Colab |
| Instruction tuning | [`SFTTrainer`] | Fine-tuning Google Gemma LLMs using ChatML format with QLoRA | Philipp Schmid | Link | Open In Colab |
| Structured Generation | [`SFTTrainer`] | Fine-tuning Llama-2-7B to generate Persian product catalogs in JSON using QLoRA and PEFT | Mohammadreza Esmaeilian | Link | Open In Colab |
| Preference Optimization | [`DPOTrainer`] | Align Mistral-7b using Direct Preference Optimization for human preference alignment | Maxime Labonne | Link | Open In Colab |
| Preference Optimization | [`ORPOTrainer`] | Fine-tuning Llama 3 with ORPO combining instruction tuning and preference alignment | Maxime Labonne | Link | Open In Colab |
| Instruction tuning | [`SFTTrainer`] | How to fine-tune open LLMs in 2025 with Hugging Face | Philipp Schmid | Link | Open In Colab |

### Videos

| Task | Title | Author | Video |
| --- | --- | --- | --- |
| Instruction tuning | Fine-tuning open AI models using Hugging Face TRL | Wietse Venema | |
| Instruction tuning | How to fine-tune a smol-LM with Hugging Face, TRL, and the smoltalk Dataset | Mayurji | |
<details>
<summary>⚠️ Deprecated features notice for "How to fine-tune a smol-LM with Hugging Face, TRL, and the smoltalk Dataset" (click to expand)</summary>

> [!WARNING]
> The tutorial uses two deprecated features:
>
> - `SFTTrainer(..., tokenizer=tokenizer)`: use `SFTTrainer(..., processing_class=tokenizer)` instead, or simply omit it (it will be inferred from the model).
> - `setup_chat_format(model, tokenizer)`: use `SFTConfig(..., chat_template_path="Qwen/Qwen3-0.6B")`, where `chat_template_path` specifies the model whose chat template you want to copy.
>
> A sketch of the migrated setup follows this notice.

</details>
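For reference, the migrated setup looks roughly like the sketch below. This is a minimal example assuming a recent TRL release; the `SmolLM2-135M` model and `smoltalk` dataset identifiers are illustrative stand-ins rather than the tutorial's exact configuration.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Illustrative dataset and model choices, not the tutorial's exact ones.
dataset = load_dataset("HuggingFaceTB/smoltalk", "all", split="train")

training_args = SFTConfig(
    output_dir="SmolLM2-135M-SFT",
    # Replaces setup_chat_format(model, tokenizer): copies the chat template
    # from the named model onto the model being fine-tuned.
    chat_template_path="Qwen/Qwen3-0.6B",
)

trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM2-135M",
    args=training_args,
    train_dataset=dataset,
    # No tokenizer=... argument: pass processing_class=tokenizer only if you
    # need a custom one; otherwise it is inferred from the model.
)
trainer.train()
```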

## Vision Language Models

### Tutorials

| Task | Class | Description | Author | Tutorial | Colab |
| --- | --- | --- | --- | --- | --- |
| Visual QA | [`SFTTrainer`] | Fine-tuning Qwen2-VL-7B for visual question answering on ChartQA dataset | Sergio Paniego | Link | Open In Colab |
| Visual QA | [`SFTTrainer`] | Fine-tuning SmolVLM with TRL on a consumer GPU | Sergio Paniego | Link | Open In Colab |
| SEO Description | [`SFTTrainer`] | Fine-tuning Qwen2-VL-7B for generating SEO-friendly descriptions from images | Philipp Schmid | Link | Open In Colab |
| Visual QA | [`DPOTrainer`] | PaliGemma 🤝 Direct Preference Optimization | Merve Noyan | Link | Open In Colab |
| Visual QA | [`DPOTrainer`] | Fine-tuning SmolVLM using direct preference optimization (DPO) with TRL on a consumer GPU | Sergio Paniego | Link | Open In Colab |
| Object Detection Grounding | [`SFTTrainer`] | Fine tuning a VLM for Object Detection Grounding using TRL | Sergio Paniego | Link | Open In Colab |
| Visual QA | [`DPOTrainer`] | Fine-Tuning a Vision Language Model with TRL using MPO | Sergio Paniego | Link | Open In Colab |
| Reinforcement Learning | [`GRPOTrainer`] | Post training a VLM for reasoning with GRPO using TRL | Sergio Paniego | Link | Open In Colab |

## Contributing

If you have a tutorial that you would like added to this list, please open a PR. We will review it and merge it if it is relevant to the community.