1484 Commits

2813b9c4bf FEAT Add DeLoRA (#2780)
Implements DeLoRA: "Decoupling Angles and Strength in Low-rank
Adaptation" (https://huggingface.co/papers/2503.18225).

Similar to DoRA, DeLoRA decouples the angular learning from the
adaptation strength, but it also allows limiting the norm of the change.
This way, DeLoRA promises to reduce the risk of catastrophic forgetting
and to be more robust to hyper-parameter settings such as the learning
rate.
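
A minimal usage sketch, assuming the config class follows PEFT's usual
naming convention (DeloraConfig); check the released API for the exact
name and for the hyper-parameter that bounds the norm of the change:

from transformers import AutoModelForCausalLM
from peft import DeloraConfig, get_peft_model  # DeloraConfig name assumed

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
config = DeloraConfig(r=8, target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, config)
model.print_trainable_parameters()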
2025-10-17 16:24:46 +02:00
8d8aa0b716 Method comparison: LoRA that targets MLP modules (#2845)
The "LoRA Without Regret" blog
post (https://thinkingmachines.ai/blog/lora/) mentions that targeting
the MLP part of the transformer is more effective than targeting the
attention modules. This experiment tests this by targeting:

["gate_proj", "up_proj", "down_proj"]

instead of the default layers (["q_proj", "v_proj"]).

I chose the rank so that the parameter count matches what we get when
targeting the attention modules with rank 32, which works out to rank 10.
Testing on my
machine, there is indeed a nice improvement in the test score:

| metric               | target attention | target MLP |
|----------------------|------------------|------------|
| test accuracy        | 48.2%            | 51.3%      |
| # trainable params   | 9175040          | 9461760    |
| peak memory reserved | 20.74 GB         | 23.02 GB   |

There is, however, also a marked increase in memory usage, despite
matching parameter count. Since the operations are different, this may
not be a surprise, but let's wait for the final verdict once this
experiment runs on our AWS instance.

Note: I also tested higher and lower ranks when targeting the MLP. The
effect on memory usage was negligible, but it did improve the score:

| metric             | rank 8  | rank 10 | rank 12  | rank 32  |
|--------------------|---------|---------|----------|----------|
| test accuracy      | 50.3%   | 51.3%   | 52.2%    | 54.8%    |
| # trainable params | 7569408 | 9461760 | 11354112 | 30277632 |

In the end, I chose to add only the rank 10 experiment, to match the
number of trainable parameters.
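
For reference, a sketch of the LoRA config used for the MLP experiment
(other training settings omitted):

from peft import LoraConfig

# rank 10 roughly matches the parameter count of rank 32 on ["q_proj", "v_proj"]
config = LoraConfig(r=10, target_modules=["gate_proj", "up_proj", "down_proj"])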
2025-10-16 17:37:02 +02:00
182f4c945a ENH Add RWKV default target modules (#2810) 2025-10-16 16:30:51 +02:00
1a1f97263d CHORE Replace deprecated torch_dtype with dtype (#2837)
Note: Diffusers is left as is for now, might need an update later.
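
A sketch of the replacement, assuming a transformers version that
already accepts the new argument name:

import torch
from transformers import AutoModelForCausalLM

# new spelling; the deprecated equivalent was torch_dtype=torch.bfloat16
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m", dtype=torch.bfloat16)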
2025-10-16 14:59:09 +02:00
87b90f045e FIX TST Wrong attribute in LoftQ test (#2841)
This is to fix an oversight from #2797, where the LoftQ test was
slightly refactored but one test was not updated accordingly.
2025-10-15 16:29:32 +02:00
086f187a4d FIX DoRA embed_scale support (#2839) 2025-10-15 12:07:51 +02:00
ec5a1b2ce6 FIX X-LoRA embed_scale support #2830 (#2831) 2025-10-14 15:54:15 +02:00
9b8cf2a0c3 FIX Handle embed scale for trainable tokens, LoRA (#2825)
Resolves #2809

Some models like Gemma3 apply a scalar to the embedding output. It needs
to be taken into account when using trainable tokens or LoRA applied to
the embedding layer.
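
For illustration, the two affected setups look roughly like this; the
module name "embed_tokens" and the token indices are placeholders:

from peft import LoraConfig

# LoRA applied to the embedding layer
lora_on_embedding = LoraConfig(target_modules=["embed_tokens"])

# trainable tokens on the embedding layer
lora_with_trainable_tokens = LoraConfig(
    target_modules=["q_proj", "v_proj"],
    trainable_token_indices={"embed_tokens": [0, 1, 2]},
)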
2025-10-14 12:35:31 +02:00
6392935921 Add prompt tuning experiment with sample vocab (#2824)
A new initialization method was added to prompt tuning in #2815. This PR
adds an experiment config for this method to the MetaMathQA benchmark.

Testing locally, this got a test accuracy of 36%, compared to 25% with
random initialization.
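
A rough sketch of such a config; the exact PromptTuningInit member name
for the new method is assumed here and should be checked against the
released enum:

from peft import PromptTuningConfig, PromptTuningInit

config = PromptTuningConfig(
    task_type="CAUSAL_LM",
    num_virtual_tokens=20,
    prompt_tuning_init=PromptTuningInit.SAMPLE_VOCAB,  # assumed member name
)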
2025-10-13 16:54:45 +02:00
25f97e663a ENH: Add set_requires_grad method (#2807)
This PR adds the set_requires_grad method to PEFT models (both PeftModel
and BaseTuner). As the name suggests, this is a method to set the
requires_grad attribute of the specified PEFT adapters.

For more general context, this is mostly relevant when dealing with
multiple adapters. As is, users can already set the active adapter(s)
with set_adapter, which automatically adjusts the requires_grad attribute
too, so that only the active adapters have grads enabled. However,
there can be situations where the activity status and requires_grad may
differ. Right now, users would need to manually set requires_grad to
deal with that, which is error prone (e.g. forgetting modules_to_save).
This PR closes this gap in the API.

As this functionality is quite general purpose, I added a
set_requires_grad function to functional.py for easier integration.

Note: The set_requires_grad method will raise an error when called with
prompt learning methods like prompt tuning. This is because these
methods don't share the common base classes (BaseTuner and BaseTunerLayer)
that would allow adding this API. Moreover, they only support a single
adapter at a time, hence there is not much need for this method in
the first place.

A side effect of not supporting prompt learning is that on the
PeftModel, we are free to allow set_requires_grad to accept more than
one adapter, which would normally be difficult, because prompt learning
only allows one adapter.
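
A rough sketch of the intended usage; the exact argument names are
assumed and may differ from the final API:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model
from peft.functional import set_requires_grad  # functional variant added here

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
model = get_peft_model(base, LoraConfig(target_modules=["q_proj", "v_proj"]), adapter_name="first")
model.add_adapter("second", LoraConfig(target_modules=["q_proj", "v_proj"]))

# enable grads for "second" without changing which adapter is active
model.set_requires_grad(True, adapter_names=["second"])   # assumed signature
set_requires_grad(model, True, adapter_names=["second"])  # assumed signature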
2025-10-13 16:54:16 +02:00
61a11f9180 CI Testing transformers deprecations (#2817)
Check if PEFT triggers transformers FutureWarning or DeprecationWarning
by converting these warnings into failures.
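
The general mechanism is roughly the following (not necessarily the
exact CI configuration used):

import warnings

# escalate the deprecation signals so the test suite fails loudly
warnings.filterwarnings("error", category=FutureWarning)
warnings.filterwarnings("error", category=DeprecationWarning)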
2025-10-13 16:53:35 +02:00
2f9f759587 Add num_trainable_params column to gradio app (#2819)
While memory usage correlates with the number of trainable params, having this number directly
makes it easier to see whether methods use similar numbers of trainable params, and outliers
can be inspected more easily.
2025-10-13 14:36:58 +02:00
2410f458c8 TST Change bad random seed (#2829)
A seed was accidentally chosen that results in a test failing with XPU.

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-10-13 11:26:10 +02:00
879587f3db FIX bnb weights can be dequantized on CPU (#2820) 2025-10-10 12:29:54 +02:00
f8aca0a0c2 ENH Merging LoRAs supports negative weights (#2811)
So far, add_weighted_adapter implicitly assumed that all weights are
positive. This PR allows negative weights to be passed.
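
A sketch of what is now possible, e.g. subtracting a general-domain
adapter from a task adapter:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
model = get_peft_model(base, LoraConfig(target_modules=["q_proj", "v_proj"]), adapter_name="task")
model.add_adapter("general", LoraConfig(target_modules=["q_proj", "v_proj"]))

model.add_weighted_adapter(
    adapters=["task", "general"],
    weights=[1.0, -0.5],  # negative weights are now accepted
    adapter_name="task_minus_general",
    combination_type="linear",
)
model.set_adapter("task_minus_general")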

---------

Co-authored-by: Valentin Teutschbein <valentin.teutschbein@student.hpi.uni-potsdam.de>
2025-10-09 13:53:08 +02:00
e9f5707e3f FIX X-LoRA scaling storage and per token normalization (#2793) 2025-10-09 13:36:54 +02:00
2c29cf7936 ENH Add sample vocab init to PromptEmbedding (#2815) 2025-10-09 12:21:40 +02:00
31989eab83 FIX DOC Add missing TOC entry for WaveFT (#2814) 2025-10-08 17:01:52 +02:00
b0954e0daa FEAT Add WaveFT method (#2560)
Implements the paper "Exploring Sparsity for Parameter Efficient Fine
Tuning Using Wavelets" (https://arxiv.org/abs/2505.12532).

WaveFT enables fine-grained control over the number of trainable
parameters by directly learning a sparse set of coefficients in the
wavelet domain of residual matrices. Experiments show that it works well
in the text-to-image generation space.
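
A minimal sketch, assuming the config class is exported as WaveFTConfig;
the wavelet-specific hyper-parameters (e.g. how many coefficients to
train) are left at their defaults here:

from transformers import AutoModelForCausalLM
from peft import WaveFTConfig, get_peft_model  # WaveFTConfig name assumed

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
model = get_peft_model(base, WaveFTConfig(target_modules=["q_proj", "v_proj"]))
model.print_trainable_parameters()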
2025-10-07 10:58:49 +02:00
f00d94a170 FIX Typo in PiSSA finetune README (#2812) 2025-10-06 11:54:53 +02:00
24aebeec21 CHORE: Ensure PEFT works with huggingface_hub 1.0.0 (#2808)
The reset_sessions function is removed, but it's also no longer necessary
to call it for the purpose we used it for.

Moreover, the deprecated use_auth_token argument is fully removed now,
so we no longer pass it anywhere, unless a user passes it explicitly.

Also, remove the deprecated local_dir_use_symlinks argument.
2025-10-02 13:21:02 +02:00
815956b9b8 CHORE Drop Python 3.9, add 3.13 (#2790) 2025-10-01 12:02:39 +02:00
ffa971a68c FIX LoftQ 8-bit bnb error, support XPU (#2797) 2025-10-01 12:02:14 +02:00
4469af57a0 DOC Some more TIP syntax migration (#2806)
Convert the `<Tip>`s in docstrings to the new syntax.

---------

Co-authored-by: nemo <git@ningu.net>
2025-09-30 12:31:12 +02:00
e596112b7b Fix module target edge cases (#2773)
Resolves #2772

Fixes several edge cases with unusual layer names or target modules.

1. As #2772 stated, if "weight" is part of a layer name, it would be
treated incorrectly when creating the PEFT state_dict.
2. Similarly, when the adapter name itself is part of a layer name.

Some of these errors would pass silently, which is especially bad (e.g.
a weight not being loaded but no error raised).

I also added some tests that were not failing before, to cover some
previously uncovered cases and to lay out some basic functionality.

While working on this, I also noticed that it was possible to target a
BaseTunerLayer with modules_to_save and trainable_token_indices (e.g.
the lora_A and lora_B nn.Linear would be replaced with
ModulesToSaveWrapper). I don't think this is ever desired, so we now
raise an error if this is detected.
2025-09-30 11:09:44 +02:00
046e32bf16 ENH: Store PEFT version in PEFT config file (#2782)
This PR adds the PEFT version to the adapter_config.json. This can be
useful in the future -- for instance when we change the state dict
format of a PEFT method, we can convert it in a backwards compatible way
based on the PEFT version being used. It can also be useful for
debugging by providing an easy way to see the PEFT version that was used
to train a PEFT adapter.

Notes:

In #2038, we made a change to PEFT configs to make it so that even if
new arguments are added to a config, it can still be loaded with older
PEFT versions (forward compatibility). Before that change, adding the
PEFT version would have been quite disruptive, as it would make all PEFT
configs incompatible with older PEFT versions. Said PR was included in
the 0.14.0 release from Dec 2024, so we can expect the vast majority of
PEFT users to use this version or a more recent one.

If the PEFT version is a dev version, the version tag is ambiguous.
Therefore, I added some code to try to determine the commit hash. This
works if users installed PEFT with git+...@<HASH>. Unit testing that the
function to determine the hash works with these types of installs is not
trivial. Therefore, I just patched the function to return a fixed hash.
I did, however, test it locally and it works:

python -m pip install
git+https://github.com/huggingface/diffusers.git@5e181eddfe7e44c1444a2511b0d8e21d177850a0
python -c "from peft.config import _get_commit_hash; print(_get_commit_hash('diffusers'))"

Also note that I tried to make the retrieval of the hash super robust by
adding a broad try ... except. If there is an error there, e.g. due to a
busted install path, we never want this to fail, but rather just accept
that the hash cannot be determined (we add @UNKNOWN in this case).

If users installed a dev version of PEFT in a different way, e.g. using git
clone && pip install ., the commit hash will not be detected. I think
this is fine, I really don't want to start shelling out with git just
for this purpose.
2025-09-30 11:09:18 +02:00
190f9873b1 CHORE DOC Migrate tips syntax (#2801)
Discussed internally
2025-09-29 10:33:57 +02:00
6030f9160e ENH Model and layer status for auxiliary modules (#2762)
Right now, get_model_status() and get_layer_status() only report on
BaseTunerLayers, but it would be helpful if they could also report
auxiliary modules. This PR now includes those.

To facilitate this, a few attributes and methods were added to
AuxiliaryTrainingWrapper and subclasses to make them more similar to
BaseTunerLayer (e.g. the adapter_layer_names attribute), since the code
that determines the model and layer status assumes these attributes and
methods to be present.
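
Usage stays the same, roughly:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, get_layer_status, get_model_status

model = get_peft_model(
    AutoModelForCausalLM.from_pretrained("facebook/opt-125m"),
    LoraConfig(target_modules=["q_proj", "v_proj"], modules_to_save=["lm_head"]),
)
print(get_model_status(model))  # now also accounts for auxiliary modules
for status in get_layer_status(model):
    print(status)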
2025-09-25 18:00:11 +02:00
ae671baec9 FIX PEFT layers expose in_features, out_features (#2784)
Resolves #2783.

Most PEFT layers (BaseTunerLayers) expose the in_features and
out_features attributes. Therefore, other packages like diffusers may
expect this attribute to exist. However, there were a few PEFT methods
where these attributes were missing:

- LoHa
- LoKr
- LN Tuning
- Trainable Tokens

The layers of these methods now also expose the attributes.

Implementation

To avoid code duplication, I factored out the whole code block in LoRA
layers that extracts these attributes, since LoRA has the most
exhaustive list of checks. The new utility function has the exact same
functionality and can now be used by other PEFT methods.

I updated the four PEFT methods mentioned above to use this new
function, but I did not update PEFT methods that already handled it, as
there wasn't really a need (they check one or two layer types at most,
so there is little duplication).
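
For example (sketch), a LoHa-wrapped linear layer now exposes the same
attributes as its LoRA counterpart; the module path below is specific to
the model used for illustration:

from transformers import AutoModelForCausalLM
from peft import LoHaConfig, get_peft_model

model = get_peft_model(
    AutoModelForCausalLM.from_pretrained("facebook/opt-125m"),
    LoHaConfig(target_modules=["q_proj", "v_proj"]),
)
layer = model.base_model.model.model.decoder.layers[0].self_attn.q_proj
print(layer.in_features, layer.out_features)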
2025-09-25 17:59:45 +02:00
7b2a5b1f02 DOC: Explain how to use multiple adapters at the same time (#2763)
Explain how to use multiple adapters (e.g. 2 LoRA adapters) at the same
time, as the API is not quite intuitive and there are some footguns
around trainable parameters.

This question has come up multiple times in the past (for recent
examples, check #2749 and #2756). Thus it's a good idea to properly
document this.
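
For context, the workflow being documented looks roughly like this
(paths and adapter names are placeholders; the doc itself has the
authoritative example):

from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
model = PeftModel.from_pretrained(base, "path/to/adapter_one", adapter_name="one")
model.load_adapter("path/to/adapter_two", adapter_name="two")

# activate both LoRA adapters at once; note that this also touches
# requires_grad, which is one of the footguns the doc explains
model.base_model.set_adapter(["one", "two"])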

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-09-25 17:58:57 +02:00
530d7bbf1e Method comparison: Add MiSS result (#2740)
- default
- mini
- bat

Results are pretty close to the corresponding experiments with Bone,
which is what we expected.
2025-09-25 17:58:22 +02:00
9da3f77960 FIX Small fixes to warning, like missing spaces (#2788)
- The warning message was missing spaces between sentences.
- Added ' around strings for clarity
- For one warning, which extended another warning, put it at the start
  instead of the end, because the other warning can be quite long,
  leading to users missing the addition

For more context on this warning, see #2254
2025-09-25 17:58:07 +02:00
c15daaa5aa ENH Support XPU in DoRA FT example (#2700) 2025-09-25 17:57:41 +02:00
4f868bd7c9 Use technical user for CI runs (#2800)
Makes it easier to track rate limiting issues.
2025-09-24 17:49:16 +02:00
50329a7138 ENH Support for XPU in LM eval notebook (#2705) 2025-09-23 15:15:48 +02:00
f6b0a2dd43 ENH Small speedups to adapter injection (#2785)
See
https://github.com/huggingface/diffusers/issues/11816#issuecomment-3281290153

This PR implements two small improvements to the speed of adapter
injection. On a benchmark based on the linked issue, the first change
leads to a speedup of 21% and the second change of another 3%. It's not
that much, but as the changes don't make the code more complicated,
there is really no reason not to take them.

The optimizations don't add any functional change but are simply based
on not recomputing the same values multiple times. Therefore, unless I'm
missing something, they should strictly improve runtime.
2025-09-23 13:27:49 +02:00
f1b83646a6 The great deduplication (#2771)
Deduplicate a lot of redundant code from PEFT method's model.py:

merge_and_unload
unload
delete_adapter
set_adapter
enable_adapter_layers
disable_adapter_layers
_replace_module
_unload_and_optionally_merge
_mark_only_adapters_as_trainable
_check_new_adapter_config
_check_target_module_exists
_prepare_adapter_config
__getattr__
get_peft_config_as_dict (fully deleted)

Related changes:

A new module, functional.py, is introduced, which contains functions
(just reimported from elsewhere) that can be useful for libraries that
want to integrate PEFT. I would suggest that we should treat them as
public API and thus guarantee backwards compatibility.
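
A sketch of how an integrating library might consume it; the exact set
of re-exported names should be checked against peft.functional itself:

from peft.functional import (  # names assumed to be among the re-exports
    get_peft_model_state_dict,
    set_peft_model_state_dict,
)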

I also deduplicated almost identical
TRANSFORMERS_MODULES_TO_XXX_TARGET_MODULES_MAPPING constants by copying
them from LoRA and only overriding a few values that differ. Moreover,
some PEFT methods didn't have their own
TRANSFORMERS_MODULES_TO_XXX_TARGET_MODULES_MAPPING but used the one from
LoRA instead. They now each have their own constant, which is a copy of
the one from LoRA.
2025-09-23 13:26:35 +02:00
b774fd901e TST Add missing configs to test_config.py (#2781)
The test_config.py tests were missing a few configs from recently added
PEFT methods. Those are now included. After adding those, it was
revealed that for C3A and trainable tokens, super().__post_init__() was
not being called. This is now done.
2025-09-19 17:52:58 +02:00
20a9829f76 FIX Account for rsLoRA scaling in set_scale (#2775) 2025-09-16 11:30:29 +02:00
1806c1651a CHORE Update and pin (commit hash) GitHub actions (#2779)
Some GH actions didn't have a pinned commit hash, while others did
(because of Zizmor). Now all actions have pinned commit hashes.
2025-09-11 11:12:23 +02:00
13fa0aea7e FIX: Wrong coupling between requires_grad and the active adapter (#2765)
Description

At the moment, we strongly couple the active adapter with
requires_grad=True. Concretely, when we call model.set_adapter(name), we
automatically assume that this adapter should not only be made active
but should also have its requires_grad set to True.

For the purpose of training PEFT models, this is fair. However, when
loading PEFT models for inference, this is not desired. Generally, for
inference, we don't need requires_grad=True, but as is, it is enabled.

Generally, this is not a severe bug, since in the inference code, we
don't perform any updates, thus we don't inadvertently update a weight
because it wrongly has requires_grad=True -- this is probably why it
went unnoticed so far. However, it could lead to worse runtime
performance and memory overhead when PyTorch records grads for those
parameters (which it shouldn't if called with torch.inference_mode, but
some users may forget to use this). Therefore, this bug is still worth
fixing.

Example

With `modules_to_save`

A very basic example where the current PEFT fails:

import os
from transformers import AutoModelForCausalLM
from peft import LoraConfig, PeftModel, get_peft_model

model_id = "facebook/opt-125m"
path = "/tmp/peft/2759"
if not os.path.exists(path + "/adapter_model.safetensors"):
    model = AutoModelForCausalLM.from_pretrained(model_id)
    config = LoraConfig(target_modules=["q_proj", "v_proj"], modules_to_save=["lm_head"], r=8)
    model = get_peft_model(model, config)
    model.save_pretrained(path)
    del model

model = AutoModelForCausalLM.from_pretrained(model_id)
model = PeftModel.from_pretrained(model, path)

`modules_to_save` should not have grads enabled, but currently it does.

With multiple adapters

There is also an issue when loading more than one adapter:

model = PeftModel.from_pretrained(...)
assert not any(p.requires_grad for p in model.parameters())  # works

So far, so good, the first adapter does not have `requires_grad`.

model.load_adapter(...)
assert not any(p.requires_grad for p in model.parameters())  # fails

The load_adapter call inadvertently sets requires_grad=True for the
weights of the _first_ adapter. The reason this happens is that when the
second adapter is loaded, we call set_adapter with the first adapter to
ensure that it remains the active adapter. However, due to the coupling
of active adapter and requires_grad, this would result in
setting requires_grad=True for the first adapter.

The PR relaxes this coupling by allowing set_adapter to be called with an
additional argument, inference_mode. If it is set to True, requires_grad
will not be enabled, even if the adapter is activated.
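
In code, this looks roughly as follows (sketch; where exactly the
argument is accepted may differ):

# keep "default" active for inference without flipping requires_grad back on
model.base_model.set_adapter("default", inference_mode=True)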

The example above would also fail for modules_to_save and trainable
tokens, not only for the LoRA/LoHa/... weights.

Still open bugs

The proposed solution is unfortunately not perfect. Right now, we do
pass inference_mode based on the PEFT config of the adapter being added,
which helps with the original issue described above. However, even this
is not absolutely correct, because inference_mode of the second adapter
does not necessarily have the same value as inference_mode of the first
adapter. To illustrate how this can go wrong, I added an xfailing test:

test_loading_model_requires_grad_set_correctly_switch_inference_mode

I believe that this use case is rarer than the ones described at the
beginning, so IMO it is okay to have this bug because we fix more common
bugs. However, LMK if you disagree.

Related to this, I noticed that many tests in
test_custom_models.TestRequiresGrad had code like this:

config0 = FooConfig(...)
peft_model = get_peft_model(MLP(), config0)
config1 = FooConfig(..., inference_mode=True)  # <==
peft_model.add_adapter("adapter1", config1)

This now fails because of the reason just given. I removed
inference_mode=True here and the tests pass again.

Note that the only reason why inference_mode=True was passed here is
because AdaLoRA cannot load 2 adapters in training mode and thus
requires this. Later PEFT methods without this restriction blindly
copied the AdaLoRA test. For those PEFT methods, I removed
inference_mode=True.

However, this also means that the AdaLoRA tests now fail. I thus marked
them as xfail.

To properly fix this bug, I think we would have to refactor the code to
isolate set_adapter (i.e. determining the active adapter) and setting
requires_grad into separate code paths, as they're orthogonal. Moreover,
these attributes are being set all over the place, which makes it hard
to reason about where these attributes are being changed. This should be
streamlined.

Making these changes while not breaking any existing code is not
trivial (or maybe even impossible). Therefore, I went the easier way for
the time being with this PR. Maybe a bigger refactor could be envisioned
for a version 1.0 release of PEFT.

Related changes

While working on this, I noticed that LNTuning was completely buggy when
calling set_adapter. This is now fixed.

Moreover, since I had to touch update_layer everywhere, I ensured that
they all take kwargs for consistency.
2025-09-08 19:49:29 +02:00
42db980676 Add Arrow + GenKnowSub to LoRA (#2644)
This PR adds support for Arrow, a modular routing mechanism for LoRA experts introduced here, as well as the refinement method GenKnowSub, proposed in our ACL 2025 Main Conference paper. GenKnowSub enhances Arrow by subtracting a general-domain LoRA from task-specific ones prior to routing, leading to improved generalisation and modularity.
2025-09-08 14:21:37 +02:00
ed5c6eaa1a Replace from_legacy_cache method with constructors (#2767)
Replace calls to Cache.from_legacy_cache with the cache constructors.
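
In transformers terms, the migration looks roughly like this:

from transformers import DynamicCache

cache = DynamicCache()  # new: construct the cache directly
# old, deprecated: cache = DynamicCache.from_legacy_cache(past_key_values)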
2025-09-08 13:49:25 +02:00
92e15573ac CHORE Upgrade trufflehog GitHub action to 3.90.5 (#2770)
Maybe solves the trufflehog false positive, maybe not.
2025-09-08 13:47:02 +02:00
5ef8e85d1f FIX X-LoRA forward hook issue during generate (#2761)
There was an issue where forward hooks would pile up during generation:
one hook was registered per forward step, and generate calls forward
multiple times. That alone is undesirable, but to make it worse, only the
last hook was removed, so the hooks accumulated across calls.
2025-09-08 13:46:31 +02:00
c81363bd4e Support dataclass model configs (#2778)
LeRobot uses dataclasses to manage policy configs. If we want to
support LeRobot policy fine-tuning it'd be easiest to support
these configs in `get_model_config`.

While it is possible to fix this on LeRobot's side (add a to_dict implementation to the config classes) I think it'd be cleaner to support it on our side since the cost is relatively low and dataclasses are getting more popular anyway.

Thanks @xliu0105 for raising this issue and proposing a fix.
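
A minimal sketch of the idea (not the exact PEFT code), with a
hypothetical helper name:

import dataclasses

def _config_to_dict(config):  # hypothetical helper for illustration
    if hasattr(config, "to_dict"):
        return config.to_dict()
    if dataclasses.is_dataclass(config):
        return dataclasses.asdict(config)
    return dict(config)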
2025-09-08 13:35:47 +02:00
5d97453235 FIX Deprecated key_cache attribute on Cache pt 2 (#2753)
In #2737, we fixed some code that relied on the deprecated attribute, but
some was missed, as it only runs on the nightly CI with multiple GPUs.
This PR fixes this.

Note that the original transformers code that this solution was based on
no longer exists, as transformers now initializes the cache lazily, so
pre-allocating the keys and values to the correct device is not
necessary. But since prefix tuning inserts "virtual" keys/values, we
still have to ensure the correct device in PEFT.

I have tested the failing tests locally and they pass.
2025-09-04 14:47:29 +02:00
2ea5377ee3 TST FIX Failing AutoAWQ test with torch 2.8 (#2752)
There has been a failing AWQ test since torch 2.6, which was marked as
xfail for torch==2.7. However, torch 2.8 is now out and the test is still
failing. Therefore, the xfail now checks for torch>=2.7.

As AWQ is no longer being maintained, we should expect this situation to
deteriorate over time and eventually we'll have to remove it. But for
the time being, it still appears to mostly work, so I suggest we leave
it as is.
2025-09-03 19:25:05 +02:00
de60e88b6b Fix missing code start in docs (#2768)
There was a minor typo in a suggestion made on PR #2609 which broke code formatting for one code sample.

This is a simple fix for that.
2025-09-03 18:37:52 +02:00
293aea5df6 Support for Activated LoRA (#2609)
This PR migrates Activated LoRA (aLoRA) support from a standalone GitHub repository (see above) to PEFT itself.

Note there is also an active PR for vLLM inference support for Activated LoRA: vllm-project/vllm#19710. There are also collections of aLoRA models on the Hugging Face Hub (in the ibm-granite org); note that these preexisting models run off the standalone GitHub repo and will be updated to work with this new PEFT feature if it is merged.

Description of changes: Activated LoRA is a modification of the LoRA architecture that "activates" the adapter weights only on tokens coming after a specified invocation_string. As a result, the KV values for the string coming before the activation match the KV values of the base model. This allows the KV cache for the input to be interchangeable between the base model and the adapter model, and allows for major speedups in inference pipelines (e.g. agentic pipelines) that want to use both base models and adapter models. See the paper for a detailed exploration of use cases and further elaboration.
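
A rough sketch based on the names used in this description (aLoraConfig,
invocation_string); the merged PEFT API may expose these differently, and
the invocation string below is just a placeholder:

from transformers import AutoModelForCausalLM
from peft import aLoraConfig, get_peft_model  # aLoraConfig name assumed

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
config = aLoraConfig(
    target_modules=["q_proj", "v_proj"],
    invocation_string="<|assistant|>",  # adapter weights activate only after this string
)
model = get_peft_model(base, config)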

Other notes:

The crux of the changes is really in layer.py. Everything else is simply managing the alora_offsets quantity, which defines where the weights start to be activated. This is determined by scanning input strings for the invocation_string defined in the aLoraConfig.
    
I believe that aLoRA really only makes sense for CausalLMs, hence I've only implemented this for that model type.

Merging doesn't make sense for aLoRA adapters since the weights are not universally applied to all tokens.
    
I used the LoRA code as a starting point, but did not implement various seemingly extra features in that code.

As of now, invocation_string should probably start and end with special tokens, to avoid tokenizer issues at the boundary. Open to suggestions on how to make this more general if needed.

---------

Co-authored-by: githubnemo <githubnemo@users.noreply.github.com>
2025-09-03 18:26:50 +02:00