# Supervised Fine-tuning (SFT) with PEFT
In this example, we'll see how to use PEFT to perform SFT on various distributed setups.
## Single GPU SFT with QLoRA
QLoRA uses 4-bit quantization to drastically reduce the GPU memory consumed by the base model, while LoRA handles the parameter-efficient fine-tuning. The command to use QLoRA is in `run_peft.sh`; a minimal code sketch follows the note below.
Note:
- At present, `use_reentrant` needs to be `True` when using gradient checkpointing with QLoRA; otherwise, QLoRA leads to high GPU memory consumption.
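For orientation, here is a minimal sketch of what the script sets up; the model name, dataset, and hyperparameters are illustrative placeholders, not the exact values used in `run_peft.sh`.

```python
# Minimal single-GPU QLoRA sketch (illustrative values, not those of run_peft.sh).
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder base model

# 4-bit NF4 quantization of the base model drastically cuts its memory footprint.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)

# LoRA adapters are the only trainable parameters.
peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

training_args = SFTConfig(
    output_dir="qlora-sft",
    gradient_checkpointing=True,
    # Per the note above: use_reentrant must be True for single-GPU QLoRA.
    gradient_checkpointing_kwargs={"use_reentrant": True},
    dataset_text_field="text",
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=load_dataset("timdettmers/openassistant-guanaco", split="train"),
    peft_config=peft_config,
)
trainer.train()
```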
## Single GPU SFT with QLoRA using Unsloth
Unsloth enables finetuning Mistral/Llama 2-5x faster with 70% less memory. It achieves this by reducing data upcasting, using Flash Attention 2, custom Triton kernels for RoPE embeddings, RMS LayerNorm, and cross-entropy loss, and clever manual autograd computation that reduces FLOPs during QLoRA finetuning. The optimizations, taken from the Unsloth blogpost mistral-benchmark, are summarized in the figure below. The command to use QLoRA with Unsloth is in `run_unsloth_peft.sh`; a minimal sketch follows the figure.

*Figure: Optimizations in Unsloth to speed up QLoRA finetuning while reducing GPU memory usage.*
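As a rough sketch (assuming a recent `trl` and the `unsloth` package installed; the model name and hyperparameters are placeholders), the Unsloth path swaps the usual model loading for `FastLanguageModel`:

```python
# Sketch of QLoRA finetuning via Unsloth (illustrative values only).
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer
from unsloth import FastLanguageModel

# Loads the base model in 4-bit and patches it with Unsloth's Triton kernels.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-bnb-4bit",  # placeholder pre-quantized checkpoint
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters through Unsloth's helper instead of peft's get_peft_model.
model = FastLanguageModel.get_peft_model(
    model,
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,  # `tokenizer=` on older trl versions
    args=SFTConfig(output_dir="unsloth-qlora-sft", dataset_text_field="text"),
    train_dataset=load_dataset("timdettmers/openassistant-guanaco", split="train"),
)
trainer.train()
```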
## Multi-GPU SFT with QLoRA

To speed up QLoRA finetuning when you have access to multiple GPUs, look at the launch command in `run_peft_multigpu.sh`. This example performs DDP on 8 GPUs; the sketch after the note below shows the configuration that differs from the single-GPU case.
Note:
- At present, `use_reentrant` needs to be `False` when using gradient checkpointing with multi-GPU QLoRA; otherwise it will lead to errors. However, this leads to huge GPU memory consumption.
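A minimal sketch of the difference (assuming an `accelerate`-launched training script; names are placeholders):

```python
# Sketch: the gradient checkpointing setting that differs for multi-GPU QLoRA.
# Launch with DDP across 8 GPUs, e.g.:
#   accelerate launch --num_processes 8 train.py   # script name is a placeholder
from trl import SFTConfig

training_args = SFTConfig(
    output_dir="qlora-sft-multigpu",
    gradient_checkpointing=True,
    # Per the note above: use_reentrant must be False with multi-GPU QLoRA,
    # at the cost of higher GPU memory consumption.
    gradient_checkpointing_kwargs={"use_reentrant": False},
)
```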
## Multi-GPU SFT with LoRA and DeepSpeed
When you have access to multiple GPUs, it is better to use normal LoRA with DeepSpeed/FSDP. To use LoRA with DeepSpeed, refer to the docs at PEFT with DeepSpeed.
## Multi-GPU SFT with LoRA and FSDP
When you have access to multiple GPUs, it is better to use normal LoRA with DeepSpeed/FSDP. To use LoRA with FSDP, refer to the docs at PEFT with FSDP.
## Multi-GPU SFT with LoRA and FSDP for GPTQModel
As in Multi-GPU SFT with LoRA and FSDP, we also support other quantization methods, like GPTQModel. You may need to install GPTQModel > v3.0.0 or from source. Here is the launch command for reference: `run_peft_fsdp_gptq.sh`. For the `--model_name_or_path` argument, it is important to pass a model that is already quantized with GPTQModel, like `hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4`. A loading sketch follows the note below.
Note: there is a bug in transformers v4.53.0 affecting this case, so please skip that version of transformers.
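For illustration, here is a hedged sketch of loading a GPTQModel-quantized checkpoint and attaching LoRA adapters; it shows single-process loading only, while the FSDP wiring itself lives in the launch script and accelerate config.

```python
# Sketch: attach LoRA to a checkpoint already quantized with GPTQModel.
# Requires the gptqmodel package; LoRA settings here are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4"
# transformers detects the GPTQ quantization config stored with the checkpoint.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
```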
Tip:
- Generally, try to upgrade to the latest package versions for best results, especially when it comes to `bitsandbytes`, `accelerate`, `transformers`, `trl`, and `peft`.