222 Commits

Author SHA1 Message Date
2813b9c4bf FEAT Add DeLoRA (#2780)
Implements DeLoRA: "Decoupling Angles and Strength in Low-rank
Adaptation" (https://huggingface.co/papers/2503.18225).

Similar to DoRA, DeLoRA decouples the angular learning from the
adaptation strength, but it also allows limiting the norm of the change.
This way, DeLoRA promises to reduce the risk of catastrophic forgetting
and to be more robust to hyper-parameter settings such as the learning
rate.
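
A minimal sketch of trying the new method, assuming the config class follows
PEFT's usual naming scheme (here DeloraConfig) and exposes LoRA-like arguments:

```
from transformers import AutoModelForCausalLM
from peft import get_peft_model
from peft import DeloraConfig  # class name assumed from PEFT's naming convention

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
config = DeloraConfig(r=16, target_modules=["q_proj", "v_proj"])  # arguments assumed
model = get_peft_model(base, config)
model.print_trainable_parameters()
```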
2025-10-17 16:24:46 +02:00
1a1f97263d CHORE Replace deprecated torch_dtype with dtype (#2837)
Note: Diffusers is left as is for now, might need an update later.
2025-10-16 14:59:09 +02:00
25f97e663a ENH: Add set_requires_grad method (#2807)
This PR adds the set_requires_grad method to PEFT models (both PeftModel
and BaseTuner). As the name suggests, this is a method to set the
requires_grad attribute of the specified PEFT adapters.

For more general context, this is mostly relevant when dealing with
multiple adapters. As is, users can already set the active adapter(s)
with set_adapter, which automatically adjusts the requires_grad attribute
too, so that only the active adapters have grads enabled. However,
there can be situations where the activation status and requires_grad may
differ. Right now, users would need to set requires_grad manually to
deal with that, which is error prone (e.g. forgetting modules_to_save).
This PR closes this gap in the API.

As this functionality is quite general purpose, I added a
set_requires_grad function to functional.py for easier integration.
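
A hedged sketch of the situation described above, with two adapters whose
activation and gradient status differ; the exact call signature of
set_requires_grad (adapter name plus a requires_grad flag) is an assumption:

```
# peft_model is assumed to be a PeftModel that already has a "default" adapter.
peft_model.load_adapter("path/to/other_adapter", adapter_name="other")  # placeholder path
peft_model.set_adapter("default")  # only "default" is active, so only it has grads enabled

# Hypothetical usage: enable grads on the inactive adapter and freeze the active
# one, without touching requires_grad (or modules_to_save) by hand.
peft_model.set_requires_grad("other", requires_grad=True)     # assumed signature
peft_model.set_requires_grad("default", requires_grad=False)  # assumed signature
```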

Note: The set_requires_grad method will raise an error when called with
prompt learning methods like prompt tuning. This is because these
methods don't have a universal base class (like BaseTuner and BaseTunerLayer)
that would allow adding this API. Moreover, they only support a single
adapter at a time, so there is not much need for this method in
the first place.

A side effect of not supporting prompt learning is that on the
PeftModel, we are free to allow set_requires_grad to accept more than
one adapter, which would normally be difficult, because prompt learning
only allows one adapter.
2025-10-13 16:54:16 +02:00
31989eab83 FIX DOC Add missing TOC entry for WaveFT (#2814) 2025-10-08 17:01:52 +02:00
b0954e0daa FEAT Add WaveFT method (#2560)
Implements the paper "Exploring Sparsity for Parameter Efficient Fine
Tuning Using Wavelets" (https://arxiv.org/abs/2505.12532).

WaveFT enables fine-grained control over the number of trainable
parameters by directly learning a sparse set of coefficients in the
wavelet domain of residual matrices. Experiments show that it works well
in the text-to-image generation space.
2025-10-07 10:58:49 +02:00
190f9873b1 CHORE DOC Migrate tips syntax (#2801)
Discussed internally
2025-09-29 10:33:57 +02:00
7b2a5b1f02 DOC: Explain how to use multiple adapters at the same time (#2763)
Explain how to use multiple adapters (e.g. 2 LoRA adapters) at the same
time, as the API is not quite intuitive and there are some footguns
around trainable parameters.
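
A short sketch of the pattern the new guide documents, loading two LoRA
adapters and activating both at once (adapter paths are placeholders):

```
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
model = PeftModel.from_pretrained(base, "path/to/adapter_a", adapter_name="adapter_a")
model.load_adapter("path/to/adapter_b", adapter_name="adapter_b")

# Activate both adapters at once via the underlying tuner model; note the footgun
# that set_adapter also flips requires_grad so that only active adapters are trainable.
model.base_model.set_adapter(["adapter_a", "adapter_b"])
```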

This question has come up multiple times in the past (for recent
examples, check #2749 and #2756). Thus it's a good idea to properly
document this.

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-09-25 17:58:57 +02:00
f1b83646a6 The great deduplication (#2771)
Deduplicate a lot of redundant code from each PEFT method's model.py:

merge_and_unload
unload
delete_adapter
set_adapter
enable_adapter_layers
disable_adapter_layers
_replace_module
_unload_and_optionally_merge
_mark_only_adapters_as_trainable
_check_new_adapter_config
_check_target_module_exists
_prepare_adapter_config
__getattr__
get_peft_config_as_dict (fully deleted)

Related changes:

A new module, functional.py, is introduced, which contains functions
(just reimported from elsewhere) that can be useful for libraries that
want to integrate PEFT. I would suggest that we should treat them as
public API and thus guarantee backwards compatibility.
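
A minimal sketch of how an integrating library might use the new module; that
inject_adapter_in_model is among the functions re-exported from peft.functional
is an assumption based on this description:

```
from transformers import AutoModelForCausalLM
from peft import LoraConfig
# Assumption: inject_adapter_in_model is one of the re-exported integration helpers.
from peft.functional import inject_adapter_in_model

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
config = LoraConfig(target_modules=["q_proj", "v_proj"])
# Add LoRA layers in place without wrapping the model in a PeftModel.
model = inject_adapter_in_model(config, model)
```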

I also deduplicated almost identical
TRANSFORMERS_MODULES_TO_XXX_TARGET_MODULES_MAPPING constants by copying
them from LoRA and only overriding the few values that differ. Moreover,
some PEFT methods didn't have their own
TRANSFORMERS_MODULES_TO_XXX_TARGET_MODULES_MAPPING but used the one from
LoRA instead. They now each have their own constant, which is a copy of
the one from LoRA.
2025-09-23 13:26:35 +02:00
42db980676 Add Arrow + GenKnowSub to LoRA (#2644)
This PR adds support for Arrow, a modular routing mechanism for LoRA experts, as well as the refinement method GenKnowSub, proposed in our ACL 2025 Main Conference paper. GenKnowSub enhances Arrow by subtracting a general-domain LoRA from task-specific ones prior to routing, leading to improved generalisation and modularity.
2025-09-08 14:21:37 +02:00
de60e88b6b Fix missing code start in docs (#2768)
There was a minor typo in a suggestion of PR #2609 which broke code formatting for one code sample.

This is a simple fix for that.
2025-09-03 18:37:52 +02:00
293aea5df6 Support for Activated LoRA (#2609)
This PR migrates Activated LoRA (aLoRA) support from a standalone GitHub repository to PEFT itself.

Note there is also an active PR for vLLM inference support for Activated LoRA: vllm-project/vllm#19710. There are also collections of aLoRA models on the Hugging Face Hub (in the ibm-granite org); note that these preexisting models run off the standalone GitHub repo and will be updated to work with this new PEFT feature if merged.

Description of changes: Activated LoRA is a modification of the LoRA architecture that "activates" the adapter weights only on tokens coming after a specified invocation_string. As a result, the KV values for the tokens coming before the activation match the KV values of the base model. This allows the KV cache for the input to be shared between the base model and the adapter model, enabling major speedups in inference pipelines (e.g. agentic pipelines) that want to use both base models and adapter models. See the paper for a detailed exploration of use cases and further elaboration.

Other notes:

The crux of the changes is really in layer.py. Everything else is simply managing the alora_offsets quantity, which defines where the weights start to be activated. This is determined by scanning input strings for the invocation_string defined in the aLoraConfig.
    
I believe that aLoRA really only makes sense for CausalLMs, hence I've only implemented this for that model type.

Merging doesn't make sense for aLoRA adapters since the weights are not universally applied to all tokens.
    
I used the LoRA code as a starting point, but did not implement various seemingly extra features in that code.

As of now, invocation_string should probably start and end with special tokens, to avoid tokenizer issues at the boundary. Open to suggestions on how to make this more general if needed.

---------

Co-authored-by: githubnemo <githubnemo@users.noreply.github.com>
2025-09-03 18:26:50 +02:00
246fe4db7c DOC Update BOFT conceptual guide (#2744) 2025-08-26 11:23:27 +02:00
ce5c2044f1 FEAT RoAd: 2D Rotary Adaptation (#2678)
Implements RoAd from https://arxiv.org/pdf/2409.00119

Supports mixed adapter batches.
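
For illustration, a sketch of a mixed adapter batch, reusing the adapter_names
argument that PEFT already exposes for LoRA (adapter paths are placeholders):

```
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
model = PeftModel.from_pretrained(base, "path/to/road_a", adapter_name="road_a")  # placeholder
model.load_adapter("path/to/road_b", adapter_name="road_b")                       # placeholder

inputs = tokenizer(["first prompt", "second prompt"], return_tensors="pt", padding=True)
# One adapter name per sample; "__base__" would select the base model for that sample.
out = model.generate(**inputs, adapter_names=["road_a", "road_b"], max_new_tokens=20)
```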
2025-08-19 15:45:38 +02:00
47961bb547 FIX Dataset download in docs and examples (#2708)
Co-authored-by: Camilo Leonel Amadio <camilo.amadio@microchip.com>
2025-08-12 20:00:06 +02:00
a2c6612b12 FIX Multiple issues with target_parameters (#2710)
There are a few issues with target_parameters that are fixed in this PR.

Existing parametrizations

When using target_parameters with LoRA, after the forward call finishes,
the LoRA parametrization is removed. However, this also used to remove
all other parametrizations on the same parameter, which is bad. With
this PR, only the LoRA parametrization is removed.

Module repr

This PR also extends the __repr__ of lora.ParamWrapper to contain the
parameter name, which makes it more useful.

Extend testing

Added a tiny gpt-oss model to the target_parameters test suite.

Multiple LoRA adapters with target_parameters

There is an issue when adding a second LoRA adapter with
target_parameters, where this second adapter would not actually be
applied correctly. The corresponding unit test was too lax to notice the
bug. This is not easy to fix, so for now we forbid adding a second
adapter with target_parameters. This is very strict, but it's better than
having silent errors.

Although it was possible to fix that specific issue, the solution
resulted in ever more deeply nested adapters (i.e. with multiple
.base_layer). This in turn results in those infixes being part of the
state_dict. But then we cannot load the individual adapters correctly,
except if the model is restored in the exact same order as it was
previously created. This is not normally a requirement in PEFT (e.g. I
can create a model with two adapters and later decide to load only one
of them).

In the long run, we need to think about solutions that would allow this.
It may require some form of normalization of the layers to prevent ever
deeper nesting. Also, what is ugly right now is that, given that the
LoRA lives on a module but actually targets one of possibly multiple
parameters, the LoRA weights don't reference said parameter in
any name. That means, purely from the state_dict, it is unclear which
parameter a LoRA weight belongs to. Ideally, this should be encoded in
the LoRA weight key.
2025-08-12 13:59:29 +02:00
e98a59ec2d DOC Make docs more device agnostic (e.g. XPU) (#2728)
Also adjusted some more examples.

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-08-08 12:06:22 +02:00
337be05f03 ENH: Adapter injection based on state_dict (#2637)
Make it possible to inject the PEFT adapters based on a state_dict
instead of the PEFT config.

See https://github.com/huggingface/diffusers/issues/11874 for context.

Description

Right now, when creating a PEFT adapter like LoRA, the adapter layers
are injected based on the PEFT config, most notably the entries in
`target_modules`, but other arguments also play into this. Generally,
this is a good approach, but it breaks down in some situations. For
instance, in diffusers, we often have the situation that the checkpoint
was created without PEFT/diffusers, thus there is no PEFT config, only
the `state_dict`. To load these checkpoints in diffusers, the current
approach is to reverse-engineer a valid PEFT config based on the keys in
the `state_dict`.

Unfortunately, this is error prone. Moreover, not every combination of
`state_dict` keys can be easily expressed in a PEFT config through a
combination of `target_modules`, `exclude_modules`, etc. Yes, in theory
everything can be expressed by passing `target_modules=<regex_pattern>`,
but reverse-engineering such a regex correctly and efficiently is very
hard (and thus currently not done).

This PR implements a completely different approach to injecting adapters.
Instead of relying on the PEFT config to determine which layers to
target, it takes the `state_dict` directly as the source of truth. This
should make it possible to exactly match what is desired.
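
A hedged sketch of the new injection path; that the `state_dict` argument is
exposed through inject_adapter_in_model is an assumption based on this
description, and the checkpoint path is a placeholder:

```
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, inject_adapter_in_model

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
# A LoRA state_dict from a non-PEFT checkpoint (placeholder path).
lora_state_dict = torch.load("lora_weights.bin")
# The config can be approximate; with a state_dict, its keys decide which layers get adapters.
config = LoraConfig(target_modules=["q_proj", "v_proj"])
model = inject_adapter_in_model(config, model, state_dict=lora_state_dict)  # assumed argument
```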

Implementation details

I took care to implement this change in a way that if no `state_dict` is
passed, the exact same code path as previously is taken. The risk of
breaking anything should thus be minimized.

Technically, it is not necessary to pass the `state_dict`, we are only
interested in the keys. I still called the argument `state_dict`, since
that is typically what we have at this point, but this can be easily
changed.

I thought it might be a good idea, if the `state_dict` is used, to still
check what modules would have been targeted if we had used the PEFT
config. Then, the results are compared and a warning is given if they
differ. This allows the user to see if the PEFT config is not correctly
specified. While running some diffusers tests, I never encountered this
warning, which is good. However, if we plan, for instance, to get rid of
all the reverse engineering of the PEFT config in diffusers, it would
make more sense to not give this warning.

Caveats

When the original LoRA model was using `target_parameters`, injecting
from `state_dict` will not work correctly. The problem is that the
`state_dict` looks the same, whether the module or a parameter was
targeted. Therefore, we cannot correctly determine the user's intent.

For now, what I decided to do is:

1. Always assume that `target_modules` is meant, as it's the far more
   common occurrence.
2. When we detect `target_parameters` while using `state_dict` for
   injection, we raise an error.
3. If we don't detect this, injection might just slip through, resulting
   in modules being targeted (if they are valid modules) instead of
   parameters.
4. Document that these two features don't work together.

I think overall, this is not too concerning, as both features are rather
niche and thus unlikely to be used in conjunction.

Related changes

While working on this PR, I made a couple of related, though not
strictly necessary, changes:

- Refactor tests in `test_low_level_api.py` to use pytest instead of
  unittest
- Add default target modules for LoHa and LoKr (just copying LoRA)
- Most PEFT method's model classes like `LoraModel` had an `__init__`
  that effectively just called `super()` with the same arguments. I
  removed these `__init__` methods.
2025-08-01 18:39:53 +02:00
bb4fb50e2b FEAT Add MiSS as a replacement for Bone. (#2604)
Add MiSS, an evolution of Bone, from https://arxiv.org/abs/2409.15371.

MiSS will replace Bone, which is now deprecated. A script to convert Bone
checkpoints to MiSS checkpoints is included.
2025-08-01 18:37:20 +02:00
92d65cafa5 Update extending vocab docs (#2669)
- Recommends trainable tokens as first measure
- Clarifies a few things about saving embeddings
- Adds full-finetuning as an option of last resort

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2025-07-25 13:09:00 +02:00
04a5ed7b2f DOC Fix error in code example (#2666) 2025-07-24 12:13:41 +02:00
a795199ffa Update tokenizer parameter in sfttrainer across multiple examples (#2664)
* REFAC Update tokenizer parameter to processing_class in SFTTrainer instances across multiple examples

* REFAC Replace tokenizer parameter with processing_class in Trainer instances across documentation and examples

* Refactor tokenizer parameter to processing_class in various examples

- Updated the Trainer initialization in corda_finetuning.py to use processing_class instead of tokenizer.
- Changed the execution_count to null in image_classification_peft_lora.ipynb.
- Modified the tokenizer parameter to processing_class in image_classification_peft_lora.ipynb.
- Adjusted the tokenizer parameter to processing_class in peft_bnb_whisper_large_v2_training.ipynb.
- Updated the README.md in lorafa_finetune to reflect the change from tokenizer to processing_class in Trainer initialization.

* REFAC Update tokenizer parameter to processing_class in Seq2SeqTrainer instantiation

* REFAC Replace tokenizer parameter with processing_class in README and notebook examples
2025-07-23 15:30:28 +02:00
f3b97c3704 FEAT Allow LoRA to target nn.Parameter (#2638)
Normally, nn.Parameter cannot be targeted with LoRA adapters. This can
be problematic, e.g. when there are MoE layers that use nn.Parameter
directly, or when there is nn.Linear but the weight is passed directly
instead of calling forward (e.g. MHA).

It would be possible to craft a solution involving a special LoRA layer
for each of the modules that use nn.Parameter directly (e.g. lora.MHA),
but that doesn't scale. This PR implements a direct way to target
nn.Parameter, making use of torch.nn.utils.parametrize.

Using the feature requires passing target_parameters to the LoraConfig.
During the forward pass, when the parameter is accessed, the LoRA
weights are added to the weights while still ensuring that gradients
flow correctly to the LoRA weights.

Right now, only LoRA supports this feature. Moreover, it is not possible
to target multiple parameters of the same module with the same adapter.
A workaround is to use multiple adapters (i.e. with different names).
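
A short sketch of the new option; the parameter path below is a placeholder for
whichever nn.Parameter the model exposes (e.g. an MoE expert weight):

```
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
config = LoraConfig(
    r=8,
    # Placeholder parameter name; in an MoE model this could be something like
    # "mlp.experts.gate_up_proj".
    target_parameters=["model.decoder.layers.0.self_attn.q_proj.weight"],
)
model = get_peft_model(base, config)
```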

---------

Co-authored-by: githubnemo <githubnemo@users.noreply.github.com>
2025-07-15 16:18:46 +02:00
a4f9334f12 FEAT Add SHiRA Adapters (#2584)
Implements: Sparse High Rank Adapters

Paper: https://arxiv.org/abs/2406.13175
2025-07-14 11:16:10 +02:00
e6577076bf FEAT Add C3A (Circular Convolution Adaptation) (#2577)
Add new PEFT method C³A (Circular Convolution Adaptation).

From "Parameter-Efficient Fine-Tuning via Circular Convolution":
https://arxiv.org/abs/2407.19342
2025-06-30 14:17:11 +02:00
d936478f07 ENH Make OFT faster and more memory efficient (#2575)
Make OFT faster and more memory efficient. This new version of OFT is
not backwards compatible with older checkpoints and vice versa. To load
older checkpoints, downgrade PEFT to 0.15.2 or lower.
2025-06-26 14:27:03 +02:00
1f4143a7ca DOC Update README, contributing.md, GH templates (#2588)
- Use a more up to date example code in the README
- A section on transformers integration
- Update devs to tag
- Simplify issue template (did not seem useful in practice)
- Update contribution guideline

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-18 18:11:59 +02:00
b3130c9edb Use HF Papers (#2542)
Replaced all arxiv.org/pdf links with HF papers.
2025-05-27 13:48:53 +02:00
d5776f605d fix typos (#2544) 2025-05-26 17:35:55 +02:00
6c48949930 Randlora documentation and some example usage (#2524)
This is a follow up to #2464 and issue #2441.

Entails documentation for RandLora and slightly updated example usage in the model.py docstring.

Also adds RandLoRA to method comparison.

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2025-05-07 14:40:55 +02:00
003cf20bcd FEAT Add LoRA INC support (#2499)
Adds LoRA support for Intel Neural Compressor (INC).

---------

Signed-off-by: Daniel Socek <daniel.socek@intel.com>
2025-04-28 18:39:37 +02:00
0c2bdbb11a FEAT Add LoRA-FA to PEFT (#2468)
Adds LoRA with frozen A (LoRA-FA) to PEFT.

Paper: https://arxiv.org/abs/2308.03303
2025-04-10 10:53:19 +02:00
dfd82f73f0 Fix: Multiple PEFT methods have issues with models loaded in float16 or bfloat16 (#2433)
As a user, it should be possible to manually cast the base model to a
lower precision dtype, float16 or bfloat16, and still have the different
PEFT methods work correctly. Currently, this is not the case for many
PEFT methods, as can be replicated by the added tests.

To understand the problem, it helps to take a step back. By default,
PEFT will treat the adapter weights with high precision, i.e. with
float32. When the base model is lower precision, the user needs to pass
inputs in lower precision too, as otherwise self.base_layer(x) would
fail. However, this low precision input clashes with the high precision
adapter weights.

The solution implemented in this PR is to cast the input to a higher
dtype [1]. That way, the whole adapter operation is conducted in high
precision. Only once that has finished will the final result be cast to
the original dtype. This should lead to better results, but it may
require more memory. Note that this is how LoRA is implemented, so the
changes in this PR bring the other methods more in line with what LoRA
does.

If the user does not want the adapter to be in float32, they can always
pass autocast_adapter_dtype=False when calling get_peft_model or
PeftModel.from_pretrained. This is also tested.
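
For example, to keep the adapter weights in the base model's low precision
rather than upcasting them to float32:

```
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m", torch_dtype=torch.bfloat16)
config = LoraConfig(target_modules=["q_proj", "v_proj"])
# autocast_adapter_dtype=False keeps the LoRA weights in bfloat16; the default (True)
# upcasts them to float32 as described above.
model = get_peft_model(base, config, autocast_adapter_dtype=False)
```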

Besides adjusting the forward method to account for these changes, the
merge and unmerge methods also often had to be adjusted, as they did not
correctly account for the base model dtype. Now, those methods should
always conserve the original dtype of the base model.

Note that if, for whatever reason, the input casting in [1] is not
desired, users can use the disable_input_dtype_casting context manager
to disable it (more context information on this feature can be found in
PR #2353). I updated the corresponding code to be agnostic to the
specific PEFT method (beforehand, it was only for LoRA).

Note that model.merge_adapter(safe_merge=True) did not work so far:
even though the argument was documented, it was not actually there. This
is now fixed.
2025-04-04 12:06:17 +02:00
7dcdf7b311 DOC Update of Bone/Bat/DiSHA docs (#2312) 2025-04-02 12:18:52 +02:00
e5e7b73fcf Fix typos (#2447) 2025-03-24 11:36:32 +01:00
42bb6b55cc DOC Fix incorrect link in DeepSpeed docs (#2444) 2025-03-24 11:23:37 +01:00
e79fdd78f6 DOC: Tip on how to merge with DeepSpeed ZeRO-3 (#2446)
---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2025-03-21 13:58:23 +01:00
2f063e6342 ENH: Extend the regex for rank/alpha pattern (#2419)
Supersedes #2382

Right now, the regex used to match the keys passed for rank_pattern and
alpha_pattern requires that either:

1. The module name is identical to the key
2. The module name has a prefix and ends with the key

This is restrictive, since it doesn't allow disambiguating all
cases. E.g. if we have a model with these attributes:

- model.foo
- model.bar.foo

We cannot currently target just model.foo. (We can already target only
model.bar.foo by passing "bar.foo" as a key to the rank_pattern /
alpha_pattern dict).

This PR makes it possible to pass "^foo" as a key. This way,
model.bar.foo is not targeted, as the key does not start with "foo".

As a general rule for users, if they intend to have a full match, they
should pass the full name of the module preceded by a ^. This is the
least ambiguous way.
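
For example, with a model that has both model.foo and model.bar.foo (module
names are placeholders for illustration):

```
from peft import LoraConfig

config = LoraConfig(
    r=8,
    target_modules=["foo"],
    # Anchored key: matches model.foo but not model.bar.foo.
    rank_pattern={"^foo": 16},
    # Unanchored keys keep their old behavior (prefix + ends-with match).
    alpha_pattern={"bar.foo": 32},
)
```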

When running the test case with the old code, all the test cases with ^
will fail, which is fine, since ^ was not working anyway. At the same
time, all test cases not using ^ pass, which means they are backwards
compatible.
2025-03-13 12:53:27 +01:00
461f6426ef Trainable Tokens: Support for Weight Tying (#2399)
This is a follow-up PR of #2376 to add support for weight-tying.

Some models, such as gpt2, tie the weights between the LM head and the input embeddings for various reasons. If we use the trainable tokens adapter, we're changing the result of the forward() of the input embeddings but we do not change the weights (unless we merge()). This means that the changes are not reflected in the tied weights, such as the LM head, leading to wrong results when training.

The current approach is to search for tied layers and put TrainableTokensLayer adapters on them as well, but initialized to use the parameters from the embedding layer's TrainableTokensLayer. This is done via the tied_adapter argument of TrainableTokensLayer.__init__().

Notable other changes:

* Implement weight-tying for encoder-decoder models

Notably we are removing the duplication filter of `named_modules` when searching for
the (tied) target modules since tied weights are by definition duplicates.

* Implement embedding name inference

It's now possible to let the adapter decide which is the input embedding layer based on the output
of `model.get_input_embeddings()`. If that fails, the default is still `embed_tokens`.

* Refactor getattr in AuxiliaryTrainingWrapper

Before this change, only the selection of the module that was supposed to have the queried
attribute was given to the wrapper implementation (via `_{has,get}attr_wrapped`). Now the full
`getattr()` call is done by the implementation.

This change is motivated by the need for access to `embedding.weight` at certain times, which
for `ModulesToSaveWrapper` is not a problem - but it is for `TrainableTokensWrapper`, since
the original module's weights differ, at least potentially, from the current weights.

What we do now is to merge the weights and return those when `embedding.weight` is accessed.
No other attributes are currently forwarded.

* initialization from buffers was broken since `persistent` flag was set too late
  (update() is called before setting the flag)

* update from other BufferDict was broken since it was assumed that BufferDict was
  a mapping collection object. we cannot simply change it to a Mapping since it
  then will break pytorch code which assumes that modules are hashable.

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2025-03-06 14:09:01 +01:00
f51203f3e4 Standalone Custom Tokens Tuner and integrated into LoRA (#2376)
This change is based on the nifty addition of @marcusinthesky from #1541.

When adding tokens or fine-tuning the representation of specific tokens we currently have little choice but to retrain the whole embedding matrix which can be huge and adds to the memory footprint (in RAM but also on disk). This method creates a sparse matrix of shape (n, embed_dim) where n is the number of tokens to be customized and only trains these few values.

This change introduces two ways of using it:

```
peft_config = TrainableTokensConfig(target_modules=['embed_tokens'], token_indices=[0, 1, 2])
peft_model = get_peft_model(model, peft_config)
```

and with LoRA

```
peft_config = LoraConfig(
    target_modules='all-linear',
    trainable_token_indices={'embed_tokens': [0, 1, 2]},
)
peft_model = get_peft_model(model, peft_config)
```

Adding this feature to adapters other than LoRA should be relatively easy, mostly adding the `trainable_token_indices` config option and some debugging.

To make this change it was necessary to change the `modules_to_save` infrastructure, as combining this feature with LoRA is quite similar. This refactoring entailed moving most of the basic functionality of `ModulesToSave` to the `AuxiliaryTrainingWrapper` class. This also changes the logic of how `modules_to_save` is loaded/saved from the state dict, so there could still be bugs here.

This implementation does not entail support for weight-tied layers yet. This will follow in a future change.

---

Notable commits in this squash:

* Use unload_and_optionally_merge_module protocol

With `AuxiliaryTrainingWrapper` as abstraction it is probably a good idea to
have support for `unload_and_optionally_merge_module`.

Since the wrapper is more akin to a PEFT layer than a model the name semantics
are fine and it does basically the same job.

* trainable tokens is also trained in certain adapters

Before, the assumption was that modules_to_save was the only thing that
is trained alongside an adapter's parameters. Now there's also the
token_adapter delta tokens via `NewTokensWrapper`.

* Remove old modules_to_save handling

This is now all handled via the `AuxiliaryTrainingWrapper`.

* Fix modules_to_save module overwriting

The state dict implementation of ModulesToSaveWrapper was incorrect in that
it did not include its own parameters, just the parameters it needs to overwrite
in the end. I.e. if layer `lin1` is modules-to-save wrapped,
`lin1.{weight,bias}` is saved and overwritten but `lin1.modules_to_save.<adapter_name>.[...]`
is not saved.

* Introduce a load key map for aux. train wrapper

Before this change it was only possible to remove a key prefix from the wrapper's
state dict (e.g., `modules_to_save.default.weight` -> `weight`); now it is possible
to restore such reduced value by mapping the key back
(i.e., `weight` -> `modules_to_save.default.weight`).

* Replace sparse matrix with dense + index_copy

This change is mostly because sparse matrices are not that beneficial in this case
(at least not from what we can see right now) and they do not solve the problem
of having to change the new tokens in-place to avoid outdated deltas when new token
vectors are initialized randomly after loading the deltas.

* Make peft_config.layers_to_transform optional

Before this change the base tuner class was forcing this attribute
to be present on the config class even though the attribute is not
specified in the base config.

* Implement missing key logic in `_set_trainable`

Before this it was not checked if the targeted module by `modules_to_save` or `trainable_token_indices` existed
or not (when used in conjunction with a PEFT method). In this case an error message similar to the `inject_adapter`
error is raised when no module is found.

---------

Co-authored-by: Marcus Gawronsky <marcus.g@myrunway.co.za>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2025-02-26 16:51:45 +01:00
5e03d058b8 DOC: Explain uninitialized weights warning (#2369)
Users sometimes get confused by the warning from transformers that some
weights are uninitialized and need to be trained when they use models
for classification. A recent example is #2367.

Even though the warning does not come from PEFT, let's add a section to
the docs to explain this warning, as the situation is a bit different
here.
---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-02-10 12:00:58 +01:00
40fe166446 DOC Fix links to BOFT in docs (#2365)
Fixes #2364
2025-02-07 11:14:42 +01:00
eaab05e18d Hotswap allow different alpha scalings and ranks (#2177)
Hotswapping of LoRA adapters is already implemented, but when alpha
scalings or ranks differ, this triggers recompilation if the model is
compiled, which is inefficient. Users can now call
prepare_model_for_compiled_hotswap to prevent recompilation in many
cases (see the doc update for caveats).
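
A hedged sketch of the intended workflow; the exact keyword arguments of
prepare_model_for_compiled_hotswap (e.g. target_rank) are an assumption, and
the adapter paths are placeholders:

```
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel
from peft.utils.hotswap import hotswap_adapter, prepare_model_for_compiled_hotswap

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
model = PeftModel.from_pretrained(base, "path/to/adapter_a")         # placeholder path
# Pad ranks/scalings up front so later swaps don't change tensor shapes.
prepare_model_for_compiled_hotswap(model, target_rank=64)            # assumed signature
model = torch.compile(model)
# Swap in a second adapter without triggering recompilation.
hotswap_adapter(model, "path/to/adapter_b", adapter_name="default")  # placeholder path
```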
2025-02-05 18:04:06 +01:00
db9dd3f4db ENH Allow disabling input dtype casting for LoRA (#2353)
Provides the disable_input_dtype_casting context manager to prevent the
input dtype from being cast during the forward call of a PEFT layer.

Normally, the dtype of the weight and input need to match, which is why
the dtype is cast. However, in certain circumstances, this is handled
by forward hooks, e.g. when using layerwise casting in diffusers. In
that case, PEFT casting the dtype interferes with the layerwise casting,
which is why the option to disable it is given.
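
A hedged sketch; that the context manager is importable as
peft.helpers.disable_input_dtype_casting is an assumption:

```
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model
from peft.helpers import disable_input_dtype_casting  # import location assumed

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
model = get_peft_model(base, LoraConfig(target_modules=["q_proj", "v_proj"]))
inputs = tokenizer("hello", return_tensors="pt")

with disable_input_dtype_casting(model):
    # Inside this block the LoRA layers leave the input dtype alone, so external
    # hooks (e.g. layerwise casting in diffusers) can manage precision themselves.
    output = model(**inputs)
```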

Right now, this only supports LoRA. LoKr and LoHa don't cast the input
dtype anyway. Therefore, the PEFT methods most relevant for diffusers
are covered.
2025-02-04 17:32:29 +01:00
2825774d2d DOC Rename link to PEFT Quicktour (#2358)
The "Get started" link currently points to the "Quicktour" article,
while "Get started" is also the first title in the TOC, causing
confusion.

Rename the "Get started" link to "Quicktour" to match the article and
ensure consistency.
2025-02-03 17:36:28 +01:00
57126d5bdd DOC Fix links to PEFT guides (#2357) 2025-02-03 12:48:10 +01:00
9c25d9411a Documentation & error checking for AdaLoRA timing (#2341)
The documentation about how AdaLoRA works was a bit unclear, especially
that `tfinal` is not a point in time but a duration.

It was also possible to build schedules that never enter the budgeting
phase and therefore lead to an exception, because the code does not
expect this case (which is OK). We now prevent such a scenario by
treating this configuration as invalid. (Issue #2337)

We also check that `total_step` is not None, since leaving it unset is also a guaranteed error in the code.
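
For reference, a configuration that satisfies these constraints (rank budgeting
runs between step tinit and step total_step - tfinal):

```
from peft import AdaLoraConfig

config = AdaLoraConfig(
    init_r=12,
    target_r=4,
    tinit=200,        # warmup steps before any rank pruning
    tfinal=500,       # duration (not a point in time) spent at the final budget
    deltaT=10,        # prune every deltaT steps in between
    total_step=3000,  # must be set and must satisfy tinit + tfinal < total_step
    target_modules=["q_proj", "v_proj"],
)
```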
2025-01-24 18:54:17 +01:00
6538e56e13 TST: Update torch.compile tests and docs (#2332)
We have tests to check if torch.compile works for various PEFT methods
and "advanced" features (QLoRA, merging, ...). These tests are not run
on a regular basis, but are triggered manually. As such, it was time to
revisit them.

So far, a few of these tests were marked as xfailing. All these tests
are passing now. The reasons for this:

- Presumably: New PyTorch version (I haven't checked older)
- Loosening some tolerances
- Remove a spurious argument added by torch.compile
- Slightly adjust order of when torch.compile is called

The docs have been updated to reflect these new findings.
2025-01-24 15:21:28 +01:00
6e30991e97 FEAT Add gptqmodel support (#2247)
Add support for gptqmodel quantization. This is a replacement for
auto-gptq.

For now, both packages are supported, but since auto-gptq is no longer
being developed, it will be deprecated and removed at some point in the
future.

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: LRL-ModelCloud <165116337+LRL-ModelCloud@users.noreply.github.com>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
Co-authored-by: ZX-ModelCloud <165115237+ZX-ModelCloud@users.noreply.github.com>
Co-authored-by: LRL <lrl@lbx.dev>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-01-23 14:00:11 +01:00
1b9bcb200b DOC Add entry to solve unknown config argument (#2340)
There have been multiple issues and forum posts in the past asking about
errors like:

TypeError: LoraConfig.__init__() got an unexpected keyword argument ...

This error can occur when the adapter that is being loaded is trained
with a more recent PEFT version than the one currently being used. I
thus added a section to the Troubleshooting part of our docs to describe
the solutions.

Note that we already added changes to PEFT in #2038 to make configs
forward compatible. But since users who encounter this problem have, by
definition, older PEFT versions, they don't benefit from this.
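
Besides upgrading PEFT, one possible manual workaround (shown here only as a
hedged sketch, not necessarily the docs' recommendation) is to drop the unknown
key from the adapter's config file before loading; path and key name are
placeholders:

```
import json

path = "path/to/adapter/adapter_config.json"  # placeholder path
with open(path) as f:
    config = json.load(f)
config.pop("some_new_argument", None)         # hypothetical key the old PEFT version rejects
with open(path, "w") as f:
    json.dump(config, f, indent=2)
```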

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-01-23 12:41:51 +01:00
af637acc5b DOC In-place modification through get_peft_model (#2313) 2025-01-09 15:05:41 +01:00