222 Commits

Author SHA1 Message Date
2813b9c4bf FEAT Add DeLoRA (#2780)
Implements DeLoRA: "Decoupling Angles and Strength in Low-rank
Adaptation" (https://huggingface.co/papers/2503.18225).

Similar to DoRA, DeLoRA decouples the angular learning from the
adaptation strength, but it also allows limiting the norm of the change.
This way, DeLoRA promises to reduce the risk of catastrophic forgetting
and to be more robust to hyper-parameter settings such as the learning
rate.
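
A minimal sketch of trying the new method, assuming the config class follows
PEFT's usual naming scheme (here DeloraConfig) and exposes LoRA-like arguments:

```
from transformers import AutoModelForCausalLM
from peft import get_peft_model
from peft import DeloraConfig  # class name assumed from PEFT's naming convention

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
config = DeloraConfig(r=16, target_modules=["q_proj", "v_proj"])  # arguments assumed
model = get_peft_model(base, config)
model.print_trainable_parameters()
```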
2025-10-17 16:24:46 +02:00
1a1f97263d CHORE Replace deprecated torch_dtype with dtype (#2837)
Note: Diffusers is left as is for now, might need an update later.
2025-10-16 14:59:09 +02:00
25f97e663a ENH: Add set_requires_grad method (#2807)
This PR adds the set_requires_grad method to PEFT models (both PeftModel
and BaseTuner). As the name suggests, this is a method to set the
requires_grad attribute of the specified PEFT adapters.

For more general context, this is mostly relevant when dealing with
multiple adapters. As is, users can already set the active adapter(s)
with set_adapter, which automatically adjusts the requires_grad attribute
too, so that only the active adapters have grads enabled. However,
there can be situations where the activation status and requires_grad may
differ. Right now, users would need to set requires_grad manually to
deal with that, which is error prone (e.g. forgetting modules_to_save).
This PR closes this gap in the API.

As this functionality is quite general purpose, I added a
set_requires_grad function to functional.py for easier integration.
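
A hedged sketch of the situation described above, with two adapters whose
activation and gradient status differ; the exact call signature of
set_requires_grad (adapter name plus a requires_grad flag) is an assumption:

```
# peft_model is assumed to be a PeftModel that already has a "default" adapter.
peft_model.load_adapter("path/to/other_adapter", adapter_name="other")  # placeholder path
peft_model.set_adapter("default")  # only "default" is active, so only it has grads enabled

# Hypothetical usage: enable grads on the inactive adapter and freeze the active
# one, without touching requires_grad (or modules_to_save) by hand.
peft_model.set_requires_grad("other", requires_grad=True)     # assumed signature
peft_model.set_requires_grad("default", requires_grad=False)  # assumed signature
```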

Note: The set_requires_grad method will raise an error when called with
prompt learning methods like prompt tuning. This is because these
methods don't have a universal base class (like BaseTuner and BaseTunerLayer)
that would allow adding this API. Moreover, they only support a single
adapter at a time, so there is not much need for this method in
the first place.

A side effect of not supporting prompt learning is that on the
PeftModel, we are free to allow set_requires_grad to accept more than
one adapter, which would normally be difficult, because prompt learning
only allows one adapter.
2025-10-13 16:54:16 +02:00
31989eab83 FIX DOC Add missing TOC entry for WaveFT (#2814) 2025-10-08 17:01:52 +02:00
b0954e0daa FEAT Add WaveFT method (#2560)
Implements the paper "Exploring Sparsity for Parameter Efficient Fine
Tuning Using Wavelets" (https://arxiv.org/abs/2505.12532).

WaveFT enables fine-grained control over the number of trainable
parameters by directly learning a sparse set of coefficients in the
wavelet domain of residual matrices. Experiments show that it works well
in the text-to-image generation space.
2025-10-07 10:58:49 +02:00
190f9873b1 CHORE DOC Migrate tips syntax (#2801)
Discussed internally
2025-09-29 10:33:57 +02:00
7b2a5b1f02 DOC: Explain how to use multiple adapters at the same time (#2763)
Explain how to use multiple adapters (e.g. 2 LoRA adapters) at the same
time, as the API is not quite intuitive and there are some footguns
around trainable parameters.
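
A short sketch of the pattern the new guide documents, loading two LoRA
adapters and activating both at once (adapter paths are placeholders):

```
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
model = PeftModel.from_pretrained(base, "path/to/adapter_a", adapter_name="adapter_a")
model.load_adapter("path/to/adapter_b", adapter_name="adapter_b")

# Activate both adapters at once via the underlying tuner model; note the footgun
# that set_adapter also flips requires_grad so that only active adapters are trainable.
model.base_model.set_adapter(["adapter_a", "adapter_b"])
```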

This question has come up multiple times in the past (for recent
examples, check #2749 and #2756). Thus it's a good idea to properly
document this.

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-09-25 17:58:57 +02:00
f1b83646a6 The great deduplication (#2771)
Deduplicate a lot of redundant code from each PEFT method's model.py:

merge_and_unload
unload
delete_adapter
set_adapter
enable_adapter_layers
disable_adapter_layers
_replace_module
_unload_and_optionally_merge
_mark_only_adapters_as_trainable
_check_new_adapter_config
_check_target_module_exists
_prepare_adapter_config
__getattr__
get_peft_config_as_dict (fully deleted)

Related changes:

A new module, functional.py, is introduced, which contains functions
(just reimported from elsewhere) that can be useful for libraries that
want to integrate PEFT. I would suggest that we should treat them as
public API and thus guarantee backwards compatibility.
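
A minimal sketch of how an integrating library might use the new module; that
inject_adapter_in_model is among the functions re-exported from peft.functional
is an assumption based on this description:

```
from transformers import AutoModelForCausalLM
from peft import LoraConfig
# Assumption: inject_adapter_in_model is one of the re-exported integration helpers.
from peft.functional import inject_adapter_in_model

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
config = LoraConfig(target_modules=["q_proj", "v_proj"])
# Add LoRA layers in place without wrapping the model in a PeftModel.
model = inject_adapter_in_model(config, model)
```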

I also deduplicated almost identical
TRANSFORMERS_MODULES_TO_XXX_TARGET_MODULES_MAPPING constants by copying
them from LoRA and only overriding the few values that differ. Moreover,
some PEFT methods didn't have their own
TRANSFORMERS_MODULES_TO_XXX_TARGET_MODULES_MAPPING but used the one from
LoRA instead. They now each have their own constant, which is a copy of
the one from LoRA.
2025-09-23 13:26:35 +02:00
42db980676 Add Arrow + GenKnowSub to LoRA (#2644)
This PR adds support for Arrow, a modular routing mechanism for LoRA experts, as well as the refinement method GenKnowSub, proposed in our ACL 2025 Main Conference paper. GenKnowSub enhances Arrow by subtracting a general-domain LoRA from task-specific ones prior to routing, leading to improved generalisation and modularity.
2025-09-08 14:21:37 +02:00
de60e88b6b Fix missing code start in docs (#2768)
There was a minor typo in a suggestion of PR #2609 which broke code formatting for one code sample.

This is a simple fix for that.
2025-09-03 18:37:52 +02:00
293aea5df6 Support for Activated LoRA (#2609)
This PR migrates Activated LoRA (aLoRA) support from a standalone GitHub repository to PEFT itself.

Note there is also an active PR for vLLM inference support for Activated LoRA: vllm-project/vllm#19710. There are also collections of aLoRA models on the Hugging Face Hub (in the ibm-granite org); note that these preexisting models run off the standalone GitHub repo and will be updated to work with this new PEFT feature if merged.

Description of changes: Activated LoRA is a modification of the LoRA architecture that "activates" the adapter weights only on tokens coming after a specified invocation_string. As a result, the KV values for the tokens coming before the activation match the KV values of the base model. This allows the KV cache for the input to be shared between the base model and the adapter model, enabling major speedups in inference pipelines (e.g. agentic pipelines) that want to use both base models and adapter models. See the paper for a detailed exploration of use cases and further elaboration.

Other notes:

The crux of the changes is really in layer.py. Everything else is simply managing the alora_offsets quantity, which defines where the weights start to be activated. This is determined by scanning input strings for the invocation_string defined in the aLoraConfig.
    
I believe that aLoRA really only makes sense for CausalLMs, hence I've only implemented this for that model type.

Merging doesn't make sense for aLoRA adapters since the weights are not universally applied to all tokens.
    
I used the LoRA code as a starting point, but did not implement various seemingly extra features in that code.

As of now, invocation_string should probably start and end with special tokens, to avoid tokenizer issues at the boundary. Open to suggestions on how to make this more general if needed.

---------

Co-authored-by: githubnemo <githubnemo@users.noreply.github.com>
2025-09-03 18:26:50 +02:00
246fe4db7c DOC Update BOFT conceptual guide (#2744) 2025-08-26 11:23:27 +02:00
ce5c2044f1 FEAT RoAd: 2D Rotary Adaptation (#2678)
Implements RoAd from https://arxiv.org/pdf/2409.00119

Supports mixed adapter batches.
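
For illustration, a sketch of a mixed adapter batch, reusing the adapter_names
argument that PEFT already exposes for LoRA (adapter paths are placeholders):

```
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
model = PeftModel.from_pretrained(base, "path/to/road_a", adapter_name="road_a")  # placeholder
model.load_adapter("path/to/road_b", adapter_name="road_b")                       # placeholder

inputs = tokenizer(["first prompt", "second prompt"], return_tensors="pt", padding=True)
# One adapter name per sample; "__base__" would select the base model for that sample.
out = model.generate(**inputs, adapter_names=["road_a", "road_b"], max_new_tokens=20)
```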
2025-08-19 15:45:38 +02:00
47961bb547 FIX Dataset download in docs and examples (#2708)
Co-authored-by: Camilo Leonel Amadio <camilo.amadio@microchip.com>
2025-08-12 20:00:06 +02:00
a2c6612b12 FIX Multiple issues with target_parameters (#2710)
There are a few issues with target_parameters that are fixed in this PR.

Existing parametrizations

When using target_parameters with LoRA, after the forward call finishes,
the LoRA parametrization is removed. However, this also used to remove
all other parametrizations on the same parameter, which is bad. With
this PR, only the LoRA parametrization is removed.

Module repr

This PR also extends the __repr__ of lora.ParamWrapper to contain the
parameter name, which makes it more useful.

Extend testing

Added a tiny gpt-oss model to the target_parameters test suite.

Multiple LoRA adapters with target_parameters

There is an issue when adding a second LoRA adapter with
target_parameters, where this second adapter would not actually be
applied correctly. The corresponding unit test was too lax to notice the
bug. This is not easy to fix, so for now we forbid adding a second
adapter with target_parameters. This is very strict, but it's better than
having silent errors.

Although it was possible to fix that specific issue, the solution
resulted in ever more deeply nested adapters (i.e. with multiple
.base_layer). This in turn results in those infixes being part of the
state_dict. But then we cannot load the individual adapters correctly,
except if the model is restored in the exact same order as it was
previously created. This is not normally a requirement in PEFT (e.g. I
can create a model with two adapters and later decide to load only one
of them).

In the long run, we need to think about solutions that would allow this.
It may require some form of normalization of the layers to prevent ever
deeper nesting. Also, what is ugly right now is that, given that the
LoRA lives on a module but actually targets one of possibly multiple
parameters, the LoRA weights don't reference said parameter in
any name. That means, purely from the state_dict, it is unclear which
parameter a LoRA weight belongs to. Ideally, this should be encoded in
the LoRA weight key.
2025-08-12 13:59:29 +02:00
e98a59ec2d DOC Make docs more device agnostic (e.g. XPU) (#2728)
Also adjusted some more examples.

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-08-08 12:06:22 +02:00
337be05f03 ENH: Adapter injection based on state_dict (#2637)
Make it possible to inject the PEFT adapters based on a state_dict
instead of the PEFT config.

See https://github.com/huggingface/diffusers/issues/11874 for context.

Description

Right now, when creating a PEFT adapter like LoRA, the adapter layers
are injected based on the PEFT config, most notably the entries in
`target_modules`, but other arguments also play into this. Generally,
this is a good approach, but it breaks down in some situations. For
instance, in diffusers, we often have the situation that the checkpoint
was created without PEFT/diffusers, thus there is no PEFT config, only
the `state_dict`. To load these checkpoints in diffusers, the current
approach is to reverse-engineer a valid PEFT config based on the keys in
the `state_dict`.

Unfortunately, this is error prone. Moreover, not every combination of
`state_dict` keys can be easily expressed in a PEFT config through a
combination of `target_modules`, `exclude_modules`, etc. Yes, in theory
everything can be expressed by passing `target_modules=<regex_pattern>`,
but reverse-engineering such a regex correctly and efficiently is very
hard (and thus currently not done).

This PR implements a completely different approach to injecting adapters.
Instead of relying on the PEFT config to determine which layers to
target, it takes the `state_dict` directly as the source of truth. This
should make it possible to exactly match what is desired.
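
A hedged sketch of the new injection path; that the `state_dict` argument is
exposed through inject_adapter_in_model is an assumption based on this
description, and the checkpoint path is a placeholder:

```
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, inject_adapter_in_model

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
# A LoRA state_dict from a non-PEFT checkpoint (placeholder path).
lora_state_dict = torch.load("lora_weights.bin")
# The config can be approximate; with a state_dict, its keys decide which layers get adapters.
config = LoraConfig(target_modules=["q_proj", "v_proj"])
model = inject_adapter_in_model(config, model, state_dict=lora_state_dict)  # assumed argument
```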

Implementation details

I took care to implement this change in a way that if no `state_dict` is
passed, the exact same code path as previously is taken. The risk of
breaking anything should thus be minimized.

Technically, it is not necessary to pass the `state_dict`, we are only
interested in the keys. I still called the argument `state_dict`, since
that is typically what we have at this point, but this can be easily
changed.

I thought it might be a good idea, if the `state_dict` is used, to still
check what modules would have been targeted if we had used the PEFT
config. Then, the results are compared and a warning is given if they
differ. This allows the user to see if the PEFT config is not correctly
specified. While running some diffusers tests, I never encountered this
warning, which is good. However, if we plan, for instance, to get rid of
all the reverse engineering of the PEFT config in diffusers, it would
make more sense to not give this warning.

Caveats

When the original LoRA model was using `target_parameters`, injecting
from `state_dict` will not work correctly. The problem is that the
`state_dict` looks the same, whether the module or a parameter was
targeted. Therefore, we cannot correctly determine the user's intent.

For now, what I decided to do is:

1. Always assume that `target_modules` is meant, as it's the far more
   common occurrence.
2. When we detect `target_parameters` while using `state_dict` for
   injection, we raise an error.
3. If we don't detect this, injection might just slip through, resulting
   in modules being targeted (if they are valid modules) instead of
   parameters.
4. Document that these two features don't work together.

I think overall, this is not too concerning, as both features are rather
niche and thus unlikely to be used in conjunction.

Related changes

While working on this PR, I made a couple of related, though not
strictly necessary, changes:

- Refactor tests in `test_low_level_api.py` to use pytest instead of
  unittest
- Add default target modules for LoHa and LoKr (just copying LoRA)
- Most PEFT method's model classes like `LoraModel` had an `__init__`
  that effectively just called `super()` with the same arguments. I
  removed these `__init__` methods.
2025-08-01 18:39:53 +02:00
bb4fb50e2b FEAT Add MiSS as a replacement for Bone. (#2604)
Add MiSS, an evolution of Bone, from https://arxiv.org/abs/2409.15371.

MiSS will replace Bone, which is now deprecated. A script to convert Bone
checkpoints to MiSS checkpoints is included.
2025-08-01 18:37:20 +02:00
92d65cafa5 Update extending vocab docs (#2669)
- Recommends trainable tokens as first measure
- Clarifies a few things about saving embeddings
- Adds full-finetuning as an option of last resort

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2025-07-25 13:09:00 +02:00
04a5ed7b2f DOC Fix error in code example (#2666) 2025-07-24 12:13:41 +02:00
a795199ffa Update tokenizer parameter in sfttrainer across multiple examples (#2664)
* REFAC Update tokenizer parameter to processing_class in SFTTrainer instances across multiple examples

* REFAC Replace tokenizer parameter with processing_class in Trainer instances across documentation and examples

* Refactor tokenizer parameter to processing_class in various examples

- Updated the Trainer initialization in corda_finetuning.py to use processing_class instead of tokenizer.
- Changed the execution_count to null in image_classification_peft_lora.ipynb.
- Modified the tokenizer parameter to processing_class in image_classification_peft_lora.ipynb.
- Adjusted the tokenizer parameter to processing_class in peft_bnb_whisper_large_v2_training.ipynb.
- Updated the README.md in lorafa_finetune to reflect the change from tokenizer to processing_class in Trainer initialization.

* REFAC Update tokenizer parameter to processing_class in Seq2SeqTrainer instantiation

* REFAC Replace tokenizer parameter with processing_class in README and notebook examples
2025-07-23 15:30:28 +02:00
f3b97c3704 FEAT Allow LoRA to target nn.Parameter (#2638)
Normally, nn.Parameter cannot be targeted with LoRA adapters. This can
be problematic, e.g. when there are MoE layers that use nn.Parameter
directly, or when there is nn.Linear but the weight is passed directly
instead of calling forward (e.g. MHA).

It would be possible to craft a solution involving a special LoRA layer
for each of the modules that use nn.Parameter directly (e.g. lora.MHA),
but that doesn't scale. This PR implements a direct way to target
nn.Parameter, making use of torch.nn.utils.parametrize.

Using the feature requires passing target_parameters to the LoraConfig.
During the forward pass, when the parameter is accessed, the LoRA
weights are added to the weights while still ensuring that gradients
flow correctly to the LoRA weights.

Right now, only LoRA supports this feature. Moreover, it is not possible
to target multiple parameters of the same module with the same adapter.
A workaround is to use multiple adapters (i.e. with different names).
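
A short sketch of the new option; the parameter path below is a placeholder for
whichever nn.Parameter the model exposes (e.g. an MoE expert weight):

```
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
config = LoraConfig(
    r=8,
    # Placeholder parameter name; in an MoE model this could be something like
    # "mlp.experts.gate_up_proj".
    target_parameters=["model.decoder.layers.0.self_attn.q_proj.weight"],
)
model = get_peft_model(base, config)
```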

---------

Co-authored-by: githubnemo <githubnemo@users.noreply.github.com>
2025-07-15 16:18:46 +02:00
a4f9334f12 FEAT Add SHiRA Adapters (#2584)
Implements: Sparse High Rank Adapters

Paper: https://arxiv.org/abs/2406.13175
2025-07-14 11:16:10 +02:00
e6577076bf FEAT Add C3A (Circular Convolution Adaptation) (#2577)
Add new PEFT method C³A (Circular Convolution Adaptation).

From "Parameter-Efficient Fine-Tuning via Circular Convolution":
https://arxiv.org/abs/2407.19342
2025-06-30 14:17:11 +02:00
d936478f07 ENH Make OFT faster and more memory efficient (#2575)
Make OFT faster and more memory efficient. This new version of OFT is
not backwards compatible with older checkpoints and vice versa. To load
older checkpoints, downgrade PEFT to 0.15.2 or lower.
2025-06-26 14:27:03 +02:00
1f4143a7ca DOC Update README, contributing.md, GH templates (#2588)
- Use a more up to date example code in the README
- A section on transformers integration
- Update devs to tag
- Simplify issue template (did not seem useful in practice)
- Update contribution guideline

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-18 18:11:59 +02:00
b3130c9edb Use HF Papers (#2542)
Replaced all arxiv.org/pdf links with HF papers.
2025-05-27 13:48:53 +02:00
d5776f605d fix typos (#2544) 2025-05-26 17:35:55 +02:00
6c48949930 Randlora documentation and some example usage (#2524)
This is a follow up to #2464 and issue #2441.

Entails documentation for RandLora and slightly updated example usage in the model.py docstring.

Also adds RandLoRA to method comparison.

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2025-05-07 14:40:55 +02:00
003cf20bcd FEAT Add LoRA INC support (#2499)
Adds LoRA support for Intel Neural Compressor (INC).

---------

Signed-off-by: Daniel Socek <daniel.socek@intel.com>
2025-04-28 18:39:37 +02:00
0c2bdbb11a FEAT Add LoRA-FA to PEFT (#2468)
Adds LoRA with frozen A (LoRA-FA) to PEFT.

Paper: https://arxiv.org/abs/2308.03303
2025-04-10 10:53:19 +02:00
dfd82f73f0 Fix: Multiple PEFT methods have issues with models loaded in float16 or bfloat16 (#2433)
As a user, it should be possible to manually cast the base model to a
lower precision dtype, float16 or bfloat16, and still have the different
PEFT methods work correctly. Currently, this is not the case for many
PEFT methods, as can be replicated by the added tests.

To understand the problem, it helps to take a step back. By default,
PEFT will treat the adapter weights with high precision, i.e. with
float32. When the base model is lower precision, the user needs to pass
inputs in lower precision too, as otherwise self.base_layer(x) would
fail. However, this low precision input clashes with the high precision
adapter weights.

The solution implemented in this PR is to cast the input to a higher
dtype [1]. That way, the whole adapter operation is conducted in high
precision. Only once that has finished will the final result be cast to
the original dtype. This should lead to better results, but it may
require more memory. Note that this is how LoRA is implemented, so the
changes in this PR bring the other methods more in line with what LoRA
does.

If the user does not want the adapter to be in float32, they can always
pass autocast_adapter_dtype=False when calling get_peft_model or
PeftModel.from_pretrained. This is also tested.
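
For example, to keep the adapter weights in the base model's low precision
rather than upcasting them to float32:

```
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m", torch_dtype=torch.bfloat16)
config = LoraConfig(target_modules=["q_proj", "v_proj"])
# autocast_adapter_dtype=False keeps the LoRA weights in bfloat16; the default (True)
# upcasts them to float32 as described above.
model = get_peft_model(base, config, autocast_adapter_dtype=False)
```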

Besides adjusting the forward method to account for these changes, the
merge and unmerge methods also often had to be adjusted, as they did not
correctly account for the base model dtype. Now, those methods should
always conserve the original dtype of the base model.

Note that if, for whatever reason, the input casting in [1] is not
desired, users can use the disable_input_dtype_casting context manager
to disable it (more context information on this feature can be found in
PR #2353). I updated the corresponding code to be agnostic to the
specific PEFT method (beforehand, it was only for LoRA).

Note that model.merge_adapter(safe_merge=True) did not work so far:
even though the argument was documented, it was not actually there. This
is now fixed.
2025-04-04 12:06:17 +02:00
7dcdf7b311 DOC Update of Bone/Bat/DiSHA docs (#2312) 2025-04-02 12:18:52 +02:00
e5e7b73fcf Fix typos (#2447) 2025-03-24 11:36:32 +01:00
42bb6b55cc DOC Fix incorrect link in DeepSpeed docs (#2444) 2025-03-24 11:23:37 +01:00
e79fdd78f6 DOC: Tip on how to merge with DeepSpeed ZeRO-3 (#2446)
---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2025-03-21 13:58:23 +01:00
2f063e6342 ENH: Extend the regex for rank/alpha pattern (#2419)
Supersedes #2382

Right now, the regex used to match the keys passed for rank_pattern and
alpha_pattern requires that either:

1. The module name is identical to the key
2. The module name has a prefix and ends with the key

This is restrictive, since it doesn't allow disambiguating all
cases. E.g. if we have a model with these attributes:

- model.foo
- model.bar.foo

We cannot currently target just model.foo. (We can already target only
model.bar.foo by passing "bar.foo" as a key to the rank_pattern /
alpha_pattern dict).

This PR makes it possible to pass "^foo" as a key. This way,
model.bar.foo is not targeted, as the key does not start with "foo".

As a general rule for users, if they intend to have a full match, they
should pass the full name of the module preceded by a ^. This is the
least ambiguous way.
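
For example, with a model that has both model.foo and model.bar.foo (module
names are placeholders for illustration):

```
from peft import LoraConfig

config = LoraConfig(
    r=8,
    target_modules=["foo"],
    # Anchored key: matches model.foo but not model.bar.foo.
    rank_pattern={"^foo": 16},
    # Unanchored keys keep their old behavior (prefix + ends-with match).
    alpha_pattern={"bar.foo": 32},
)
```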

When running the test case with the old code, all the test cases with ^
will fail, which is fine, since ^ was not working anyway. At the same
time, all test cases not using ^ pass, which means they are backwards
compatible.
2025-03-13 12:53:27 +01:00
461f6426ef Trainable Tokens: Support for Weight Tying (#2399)
This is a follow-up PR of #2376 to add support for weight-tying.

Some models, such as gpt2, tie the weights between the LM head and the input embeddings for various reasons. If we use the trainable tokens adapter, we're changing the result of the forward() of the input embeddings but we do not change the weights (unless we merge()). This means that the changes are not reflected in the tied weights, such as the LM head, leading to wrong results when training.

The current approach is to search for tied layers and put TrainableTokensLayer adapters on them as well, but initialized to use the parameters from the embedding layer's TrainableTokensLayer. This is done via the tied_adapter argument of TrainableTokensLayer.__init__().

Notable other changes:

* Implement weight-tying for encoder-decoder models

Notably we are removing the duplication filter of `named_modules` when searching for
the (tied) target modules since tied weights are by definition duplicates.

* Implement embedding name inference

It's now possible to let the adapter decide which is the input embedding layer based on the output
of `model.get_input_embeddings()`. If that fails, the default is still `embed_tokens`.

* Refactor getattr in AuxiliaryTrainingWrapper

Before this change, only the selection of the module that was supposed to have the queried
attribute was given to the wrapper implementation (via `_{has,get}attr_wrapped`). Now the full
`getattr()` call is done by the implementation.

This change is motivated by the need for access to `embedding.weight` at certain times, which
for `ModulesToSaveWrapper` is not a problem - but it is for `TrainableTokensWrapper`, since
the original module's weights differ, at least potentially, from the current weights.

What we do now is to merge the weights and return those when `embedding.weight` is accessed.
No other attributes are currently forwarded.

* initialization from buffers was broken since `persistent` flag was set too late
  (update() is called before setting the flag)

* update from other BufferDict was broken since it was assumed that BufferDict was
  a mapping collection object. we cannot simply change it to a Mapping since it
  then will break pytorch code which assumes that modules are hashable.

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2025-03-06 14:09:01 +01:00
f51203f3e4 Standalone Custom Tokens Tuner and integrated into LoRA (#2376)
This change is based on the nifty addition of @marcusinthesky from #1541.

When adding tokens or fine-tuning the representation of specific tokens we currently have little choice but to retrain the whole embedding matrix which can be huge and adds to the memory footprint (in RAM but also on disk). This method creates a sparse matrix of shape (n, embed_dim) where n is the number of tokens to be customized and only trains these few values.

This change introduces two ways of using it:

```
peft_config = TrainableTokensConfig(target_modules=['embed_tokens'], token_indices=[0, 1, 2])
peft_model = get_peft_model(model, peft_config)
```

and with LoRA

```
peft_config = LoraConfig(
    target_modules='all-linear',
    trainable_token_indices={'embed_tokens': [0, 1, 2]},
)
peft_model = get_peft_model(model, peft_config)
```

Adding this feature to adapters other than LoRA should be relatively easy, mostly adding the `trainable_token_indices` config option and some debugging.

To make this change it was necessary to change the `modules_to_save` infrastructure, as combining this feature with LoRA is quite similar. This refactoring entailed moving most of the basic functionality of `ModulesToSave` to the `AuxiliaryTrainingWrapper` class. This also changes the logic of how `modules_to_save` is loaded/saved from the state dict, so there could still be bugs here.

This implementation does not entail support for weight-tied layers yet. This will follow in a future change.

---

Notable commits in this squash:

* Use unload_and_optionally_merge_module protocol

With `AuxiliaryTrainingWrapper` as abstraction it is probably a good idea to
have support for `unload_and_optionally_merge_module`.

Since the wrapper is more akin to a PEFT layer than a model the name semantics
are fine and it does basically the same job.

* trainable tokens is also trained in certain adapters

Before, the assumption was that modules_to_save was the only thing that
is trained alongside an adapter's parameters. Now there's also the
token_adapter delta tokens via `NewTokensWrapper`.

* Remove old modules_to_save handling

This is now all handled via the `AuxiliaryTrainingWrapper`.

* Fix modules_to_save module overwriting

The state dict implementation of ModulesToSaveWrapper was incorrect in that
it did not include its own parameters, just the parameters it needs to overwrite
in the end. I.e. if layer `lin1` is modules-to-save wrapped,
`lin1.{weight,bias}` is saved and overwritten but `lin1.modules_to_save.<adapter_name>.[...]`
is not saved.

* Introduce a load key map for aux. train wrapper

Before this change it was only possible to remove a key prefix from the wrapper's
state dict (e.g., `modules_to_save.default.weight` -> `weight`); now it is possible
to restore such reduced value by mapping the key back
(i.e., `weight` -> `modules_to_save.default.weight`).

* Replace sparse matrix with dense + index_copy

This change is mostly because sparse matrices are not that beneficial in this case
(at least not from what we can see right now) and they do not solve the problem
of having to change the new tokens in-place to avoid outdated deltas when new token
vectors are initialized randomly after loading the deltas.

* Make peft_config.layers_to_transform optional

Before this change the base tuner class was forcing this attribute
to be present on the config class even though the attribute is not
specified in the base config.

* Implement missing key logic in `_set_trainable`

Before this it was not checked if the targeted module by `modules_to_save` or `trainable_token_indices` existed
or not (when used in conjunction with a PEFT method). In this case an error message similar to the `inject_adapter`
error is raised when no module is found.

---------

Co-authored-by: Marcus Gawronsky <marcus.g@myrunway.co.za>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2025-02-26 16:51:45 +01:00
5e03d058b8 DOC: Explain uninitialized weights warning (#2369)
Users sometimes get confused by the warning from transformers that some
weights are uninitialized and need to be trained when they use models
for classification. A recent example is #2367.

Even though the warning does not come from PEFT, let's add a section to
the docs to explain this warning, as the situation is a bit different
here.
---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-02-10 12:00:58 +01:00
40fe166446 DOC Fix links to BOFT in docs (#2365)
Fixes #2364
2025-02-07 11:14:42 +01:00
eaab05e18d Hotswap allow different alpha scalings and ranks (#2177)
Hotswapping of LoRA adapters is already implemented, but when alpha
scalings or ranks differ, this triggers recompilation if the model is
compiled, which is inefficient. Users can now call
prepare_model_for_compiled_hotswap to prevent recompilation in many
cases (see the doc update for caveats).
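
A hedged sketch of the intended workflow; the exact keyword arguments of
prepare_model_for_compiled_hotswap (e.g. target_rank) are an assumption, and
the adapter paths are placeholders:

```
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel
from peft.utils.hotswap import hotswap_adapter, prepare_model_for_compiled_hotswap

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
model = PeftModel.from_pretrained(base, "path/to/adapter_a")         # placeholder path
# Pad ranks/scalings up front so later swaps don't change tensor shapes.
prepare_model_for_compiled_hotswap(model, target_rank=64)            # assumed signature
model = torch.compile(model)
# Swap in a second adapter without triggering recompilation.
hotswap_adapter(model, "path/to/adapter_b", adapter_name="default")  # placeholder path
```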
2025-02-05 18:04:06 +01:00
db9dd3f4db ENH Allow disabling input dtype casting for LoRA (#2353)
Provides the disable_input_dtype_casting context manager to prevent the
input dtype from being cast during the forward call of a PEFT layer.

Normally, the dtype of the weight and input need to match, which is why
the dtype is cast. However, in certain circumstances, this is handled
by forward hooks, e.g. when using layerwise casting in diffusers. In
that case, PEFT casting the dtype interferes with the layerwise casting,
which is why the option to disable it is given.
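
A hedged sketch; that the context manager is importable as
peft.helpers.disable_input_dtype_casting is an assumption:

```
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model
from peft.helpers import disable_input_dtype_casting  # import location assumed

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
model = get_peft_model(base, LoraConfig(target_modules=["q_proj", "v_proj"]))
inputs = tokenizer("hello", return_tensors="pt")

with disable_input_dtype_casting(model):
    # Inside this block the LoRA layers leave the input dtype alone, so external
    # hooks (e.g. layerwise casting in diffusers) can manage precision themselves.
    output = model(**inputs)
```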

Right now, this only supports LoRA. LoKr and LoHa don't cast the input
dtype anyway. Therefore, the PEFT methods most relevant for diffusers
are covered.
2025-02-04 17:32:29 +01:00
2825774d2d DOC Rename link to PEFT Quicktour (#2358)
The "Get started" link currently points to the "Quicktour" article,
while "Get started" is also the first title in the TOC, causing
confusion.

Rename the "Get started" link to "Quicktour" to match the article and
ensure consistency.
2025-02-03 17:36:28 +01:00
57126d5bdd DOC Fix links to PEFT guides (#2357) 2025-02-03 12:48:10 +01:00
9c25d9411a Documentation & error checking for AdaLoRA timing (#2341)
The documentation about how AdaLoRA works was a bit unclear, especially
that `tfinal` is not a point in time but a duration.

It was also possible to build schedules that never enter the budgeting
phase and therefore lead to an exception, because the code does not
expect this case (which is OK). We now prevent such a scenario by
treating this configuration as invalid. (Issue #2337)

We also check that `total_step` is not None, since leaving it unset is also a guaranteed error in the code.
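
For reference, a configuration that satisfies these constraints (rank budgeting
runs between step tinit and step total_step - tfinal):

```
from peft import AdaLoraConfig

config = AdaLoraConfig(
    init_r=12,
    target_r=4,
    tinit=200,        # warmup steps before any rank pruning
    tfinal=500,       # duration (not a point in time) spent at the final budget
    deltaT=10,        # prune every deltaT steps in between
    total_step=3000,  # must be set and must satisfy tinit + tfinal < total_step
    target_modules=["q_proj", "v_proj"],
)
```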
2025-01-24 18:54:17 +01:00
6538e56e13 TST: Update torch.compile tests and docs (#2332)
We have tests to check if torch.compile works for various PEFT methods
and "advanced" features (QLoRA, merging, ...). These tests are not run
on a regular basis, but are triggered manually. As such, it was time to
revisit them.

So far, a few of these tests were marked as xfailing. All these tests
are passing now. The reasons for this:

- Presumably: New PyTorch version (I haven't checked older)
- Loosening some tolerances
- Remove a spurious argument added by torch.compile
- Slightly adjust order of when torch.compile is called

The docs have been updated to reflect these new findings.
2025-01-24 15:21:28 +01:00
6e30991e97 FEAT Add gptqmodel support (#2247)
Add support for gptqmodel quantization. This is a replacement for
auto-gptq.

For now, both packages are supported, but since auto-gptq is no longer
being developed, it will be deprecated and removed at some point in the
future.

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: LRL-ModelCloud <165116337+LRL-ModelCloud@users.noreply.github.com>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
Co-authored-by: ZX-ModelCloud <165115237+ZX-ModelCloud@users.noreply.github.com>
Co-authored-by: LRL <lrl@lbx.dev>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-01-23 14:00:11 +01:00
1b9bcb200b DOC Add entry to solve unknown config argument (#2340)
There have been multiple issues and forum posts in the past asking about
errors like:

TypeError: LoraConfig.__init__() got an unexpected keyword argument ...

This error can occur when the adapter that is being loaded is trained
with a more recent PEFT version than the one currently being used. I
thus added a section to the Troubleshooting part of our docs to describe
the solutions.

Note that we already added changes to PEFT in #2038 to make configs
forward compatible. But since users who encounter this problem have, by
definition, older PEFT versions, they don't benefit from this.
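
Besides upgrading PEFT, one possible manual workaround (shown here only as a
hedged sketch, not necessarily the docs' recommendation) is to drop the unknown
key from the adapter's config file before loading; path and key name are
placeholders:

```
import json

path = "path/to/adapter/adapter_config.json"  # placeholder path
with open(path) as f:
    config = json.load(f)
config.pop("some_new_argument", None)         # hypothetical key the old PEFT version rejects
with open(path, "w") as f:
    json.dump(config, f, indent=2)
```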

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-01-23 12:41:51 +01:00
af637acc5b DOC In-place modification through get_peft_model (#2313) 2025-01-09 15:05:41 +01:00