Implements DeLoRA: "Decoupling Angles and Strength in Low-rank
Adaptation" (https://huggingface.co/papers/2503.18225).
Similar to DoRA, DeLoRA decouples angular learning from adaptation
strength, but it additionally makes it possible to bound the norm of the
weight change. This way, DeLoRA promises to reduce the risk of
catastrophic forgetting and to be more robust to hyperparameter settings
such as the learning rate.
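A minimal usage sketch for context; the `DeloraConfig` name is an assumption mirroring how other PEFT methods are exposed, not a verified final API:

```python
# Minimal sketch; `DeloraConfig` is an assumption based on how other PEFT
# config classes are exposed, not the verified final API.
from transformers import AutoModelForCausalLM
from peft import DeloraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
config = DeloraConfig(r=16, target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, config)
model.print_trainable_parameters()
```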
Resolves #2809
Some models like Gemma3 apply a scalar to the embedding output. This scalar
needs to be taken into account when using trainable tokens or when LoRA is
applied to the embedding layer.
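To illustrate the affected setups, a hedged sketch of applying LoRA to the embedding layer of such a model (the module names are examples and not part of this patch):

```python
# Illustrative sketch of the affected setup (not part of this patch): LoRA on
# the embedding layer of a model such as Gemma3, whose embedding output is
# multiplied by a scalar that the adapter path must respect.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("google/gemma-3-1b-it")
config = LoraConfig(r=8, target_modules=["embed_tokens", "q_proj", "v_proj"])
model = get_peft_model(base, config)
```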
Implements the paper "Exploring Sparsity for Parameter Efficient Fine
Tuning Using Wavelets" (https://arxiv.org/abs/2505.12532).
WaveFT enables fine-grained control over the number of trainable
parameters by directly learning a sparse set of coefficients in the
wavelet domain of residual matrices. Experiments show that it works well
in the text-to-image generation space.
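A minimal usage sketch; the `WaveFTConfig` name and the `n_frequency` argument (number of trainable wavelet coefficients per adapted matrix) are assumptions about the exposed API:

```python
# Sketch only: `WaveFTConfig` and `n_frequency` are assumptions about the API.
from transformers import AutoModelForCausalLM
from peft import WaveFTConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
config = WaveFTConfig(n_frequency=2592, target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, config)
model.print_trainable_parameters()
```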
This PR adds support for Arrow, a modular routing mechanism for LoRA experts introduced here, as well as the refinement method GenKnowSub, proposed in our ACL 2025 Main Conference paper. GenKnowSub enhances Arrow by subtracting a general-domain LoRA from task-specific ones prior to routing, leading to improved generalisation and modularity.
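To make the GenKnowSub step concrete, here is a conceptual sketch in plain PyTorch of removing a general-domain LoRA update from a task-specific one before routing; it illustrates the idea only and does not reflect the actual PEFT/Arrow API (the real implementation may operate directly on the LoRA factors rather than the full delta matrices):

```python
# Conceptual sketch of the GenKnowSub idea only (not the PEFT/Arrow API): the
# general-domain update is removed from each task-specific update before the
# Arrow router sees it.
import torch

def genknowsub_delta(task_A, task_B, general_A, general_B):
    """LoRA factors with delta_W = B @ A; A: [r, d_in], B: [d_out, r]."""
    return task_B @ task_A - general_B @ general_A

# toy example
r, d_in, d_out = 4, 16, 16
purified = genknowsub_delta(
    torch.randn(r, d_in), torch.randn(d_out, r),
    torch.randn(r, d_in), torch.randn(d_out, r),
)
print(purified.shape)  # torch.Size([16, 16])
```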
This PR migrates Activated LoRA (aLoRA) support from a standalone GitHub repository (see above) into PEFT itself.
Note there is also an active PR for vLLM inference support for Activated LoRA: vllm-project/vllm#19710. There are also collections of aLoRA models on Hugging Face (in the ibm-granite org); note that these preexisting models run off of the standalone GitHub repo and will be updated to work with this new PEFT feature if merged.
Description of changes: Activated LoRA is a modification of the LoRA architecture that "activates" the adapter weights only on tokens coming after a specified invocation_string. As a result, the KV values for the tokens preceding the activation match the KV values of the base model, so the KV cache for the input is interchangeable between the base model and the adapter model. This allows for major speedups in inference pipelines (e.g. agentic pipelines) that want to use both base models and adapter models. See the paper for a detailed exploration of use cases and further elaboration.
Other notes:
The crux of the changes is in layer.py. Everything else simply manages the alora_offsets quantity, which defines where the weights start to be activated; it is determined by scanning input strings for the invocation_string defined in the aLoraConfig.
I believe that aLoRA really only makes sense for CausalLMs, hence I've only implemented this for that model type.
Merging doesn't make sense for aLoRA adapters since the weights are not universally applied to all tokens.
I used the LoRA code as a starting point, but did not implement various features from that code that did not seem necessary here.
As of now, invocation_string should probably start and end with special tokens, to avoid tokenizer issues at the boundary. Open to suggestions on how to make this more general if needed.
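For intuition, a conceptual sketch (plain PyTorch, not the actual layer.py implementation) of how the adapter delta is masked so that only positions at or after the invocation offset are affected:

```python
# Conceptual sketch (not the actual layer.py code): the LoRA delta is zeroed
# for all positions before `alora_offset`, so tokens that precede the
# invocation string (and their KV cache) match the base model exactly.
import torch

def alora_linear_forward(x, W, lora_A, lora_B, scaling, alora_offset):
    """x: [batch, seq, d_in]; W: [d_out, d_in]; lora_A: [r, d_in]; lora_B: [d_out, r]."""
    base_out = x @ W.T
    delta = (x @ lora_A.T) @ lora_B.T * scaling
    seq_len = x.shape[1]
    # 1.0 where the adapter is active, 0.0 before the invocation offset
    active = torch.arange(seq_len, device=x.device) >= alora_offset
    mask = active.to(x.dtype).view(1, seq_len, 1)
    return base_out + mask * delta
```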
---------
Co-authored-by: githubnemo <githubnemo@users.noreply.github.com>
Add MiSS, an evolution of Bone, from https://arxiv.org/abs/2409.15371.
MiSS will replace Bone, which is now deprecated. A script to convert Bone
checkpoints to MiSS checkpoints is included.
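A minimal usage sketch; the `MissConfig` name and its `r` argument are assumptions mirroring Bone's interface:

```python
# Sketch: `MissConfig` and its `r` argument are assumptions mirroring Bone.
from transformers import AutoModelForCausalLM
from peft import MissConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
config = MissConfig(r=64, target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, config)
```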
* REFAC Update tokenizer parameter to processing_class in SFTTrainer instances across multiple examples
* REFAC Replace tokenizer parameter with processing_class in Trainer instances across documentation and examples
* Refactor tokenizer parameter to processing_class in various examples
- Updated the Trainer initialization in corda_finetuning.py to use processing_class instead of tokenizer.
- Changed the execution_count to null in image_classification_peft_lora.ipynb.
- Modified the tokenizer parameter to processing_class in image_classification_peft_lora.ipynb.
- Adjusted the tokenizer parameter to processing_class in peft_bnb_whisper_large_v2_training.ipynb.
- Updated the README.md in lorafa_finetune to reflect the change from tokenizer to processing_class in Trainer initialization.
* REFAC Update tokenizer parameter to processing_class in Seq2SeqTrainer instantiation
* REFAC Replace tokenizer parameter with processing_class in README and notebook examples
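For reference, the pattern applied across the examples: recent `transformers` releases accept the tokenizer/processor via `processing_class` and deprecate the `tokenizer` keyword on Trainer.

```python
# Illustration of the refactor: `processing_class` replaces the deprecated
# `tokenizer` keyword when instantiating Trainer.
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out"),
    # tokenizer=tokenizer,        # old, deprecated keyword
    processing_class=tokenizer,   # new keyword used throughout the examples
)
```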
Besides fixes, includes an example script that uses
`hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4`
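A hedged sketch of what such a script typically looks like (loading the GPTQ checkpoint and attaching a LoRA adapter); the actual example's contents may differ:

```python
# Sketch (not the exact example script): load the GPTQ-quantized checkpoint and
# attach a LoRA adapter; requires a GPTQ backend (e.g. gptqmodel) to be installed.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

model = prepare_model_for_kbit_training(model)
config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, config)
model.print_trainable_parameters()
```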
---------
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
Add new PEFT method C³A (Circular Convolution Adaptation).
From "Parameter-Efficient Fine-Tuning via Circular Convolution":
https://arxiv.org/abs/2407.19342
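A minimal usage sketch; the `C3AConfig` name and the `block_size` argument (the size of the circular-convolution blocks from the paper) are assumptions about the exposed API:

```python
# Sketch: `C3AConfig` and `block_size` are assumptions about the exposed API.
from transformers import AutoModelForCausalLM
from peft import C3AConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
config = C3AConfig(block_size=64, target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, config)
```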
This is a follow-up to #2464 and issue #2441.
Adds documentation for RandLora and slightly updates the example usage in the model.py docstring.
Also adds RandLoRA to the method comparison.
---------
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>