Mirror of https://github.com/huggingface/peft.git (synced 2025-10-20 23:43:47 +08:00)

Compare commits (22 commits)

32357c2dd2
79298c7c24
b25ce8a0cd
5d84484079
49ddefa834
3af469eeea
5e7e5ad836
9d8287f3e3
2efd02769b
669dd4edeb
b5641cc744
c5d94855cd
face67dfeb
d9094cebea
493ae58beb
ed4ce9fc94
4c48970cb0
46e03602ed
45343a4ccc
276c91b143
cfe35a7878
d47d23aa0e
@ -29,15 +29,6 @@ ENV PATH /opt/conda/envs/peft/bin:$PATH
|
||||
# Activate our bash shell
|
||||
RUN chsh -s /bin/bash
|
||||
SHELL ["/bin/bash", "-c"]
|
||||
# Activate the conda env and install transformers + accelerate from source
|
||||
RUN source activate peft && \
|
||||
python3 -m pip install --no-cache-dir \
|
||||
librosa \
|
||||
"soundfile>=0.12.1" \
|
||||
scipy \
|
||||
git+https://github.com/huggingface/transformers \
|
||||
git+https://github.com/huggingface/accelerate \
|
||||
peft[test]@git+https://github.com/huggingface/peft
|
||||
|
||||
# Stage 2
|
||||
FROM nvidia/cuda:12.2.2-devel-ubuntu22.04 AS build-image
|
||||
@ -49,6 +40,18 @@ SHELL ["/bin/bash", "-c"]
|
||||
RUN source activate peft && \
|
||||
python3 -m pip install --no-cache-dir bitsandbytes optimum auto-gptq
|
||||
|
||||
# Activate the conda env and install transformers + accelerate from source
|
||||
RUN source activate peft && \
|
||||
python3 -m pip install -U --no-cache-dir \
|
||||
librosa \
|
||||
"soundfile>=0.12.1" \
|
||||
scipy \
|
||||
git+https://github.com/huggingface/transformers \
|
||||
git+https://github.com/huggingface/accelerate \
|
||||
peft[test]@git+https://github.com/huggingface/peft
|
||||
|
||||
RUN pip freeze | grep transformers
|
||||
|
||||
# Install apt libs
|
||||
RUN apt-get update && \
|
||||
apt-get install -y curl git wget && \
|
||||
|
@ -28,10 +28,13 @@ Being similar to LoRA, IA3 carries many of the same advantages:
* Performance of models fine-tuned using IA3 is comparable to the performance of fully fine-tuned models.
* IA3 does not add any inference latency because adapter weights can be merged with the base model.

In principle, IA3 can be applied to any subset of weight matrices in a neural network to reduce the number of trainable
parameters. Following the authors' implementation, IA3 weights are added to the key, value and feedforward layers
of a Transformer model. Given the target layers for injecting IA3 parameters, the number of trainable parameters
can be determined based on the size of the weight matrices.
In principle, IA3 can be applied to any subset of weight matrices in a neural network to reduce the number of trainable
parameters. Following the authors' implementation, IA3 weights are added to the key, value and feedforward layers
of a Transformer model. To be specific, for transformer models, IA3 weights are added to the outputs of key and value layers, and to the input of the second feedforward layer
in each transformer block.

Given the target layers for injecting IA3 parameters, the number of trainable parameters
can be determined based on the size of the weight matrices.
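As a rough, hedged illustration of that count (the dimensions and layer count below are assumptions in the style of a Llama-7B model, not values taken from this guide): each targeted attention projection adds a vector the size of its output dimension, and each targeted feedforward layer adds a vector the size of its input dimension.

```py
# Hedged back-of-the-envelope estimate; all sizes are assumed, Llama-7B-style values
hidden_size = 4096         # output dim of the key and value projections
intermediate_size = 11008  # input dim of the second (down) feedforward projection
num_layers = 32

per_block = hidden_size + hidden_size + intermediate_size  # one (IA)^3 vector per targeted layer
total_trainable = per_block * num_layers
print(total_trainable)  # 614400 trainable parameters, a tiny fraction of the base model
```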

## Common IA3 parameters in PEFT
@ -43,10 +46,19 @@ As with other methods supported by PEFT, to fine-tune a model using IA3, you nee
3. Wrap the base model with `get_peft_model()` to get a trainable `PeftModel`.
4. Train the `PeftModel` as you normally would train the base model.

`IA3Config` allows you to control how IA3 is applied to the base model through the following parameters:

- `target_modules`: The modules (for example, attention blocks) to apply the IA3 vectors.
- `feedforward_modules`: The list of modules to be treated as feedforward layers in `target_modules`. While learned vectors are multiplied with
the output activation for attention blocks, the vectors are multiplied with the input for classic feedforward layers.
- `feedforward_modules`: The list of modules to be treated as feedforward layers in `target_modules`. While learned vectors are multiplied with
the output activation for attention blocks, the vectors are multiplied with the input for classic feedforward layers. Note that `feedforward_modules` must be a subset of `target_modules`.
- `modules_to_save`: List of modules apart from IA3 layers to be set as trainable and saved in the final checkpoint. These typically include model's custom head that is randomly initialized for the fine-tuning task.

## Example Usage

For the task of sequence classification, one can initialize the IA3 config for a Llama model as follows:

```py
peft_config = IA3Config(
    task_type=TaskType.SEQ_CLS, target_modules=["k_proj", "v_proj", "down_proj"], feedforward_modules=["down_proj"]
)
```
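Continuing steps 3 and 4 above, the config can then be used to create a trainable `PeftModel`. The following is a hedged sketch; the checkpoint name is a placeholder and not part of this guide:

```py
from transformers import AutoModelForSequenceClassification
from peft import IA3Config, TaskType, get_peft_model

base_model = AutoModelForSequenceClassification.from_pretrained("path-or-id-of-a-llama-checkpoint", num_labels=2)
peft_config = IA3Config(
    task_type=TaskType.SEQ_CLS, target_modules=["k_proj", "v_proj", "down_proj"], feedforward_modules=["down_proj"]
)
model = get_peft_model(base_model, peft_config)
model.print_trainable_parameters()  # only the (IA)^3 vectors and the classification head are trainable
```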
|
@ -83,6 +83,7 @@ accelerate launch train_dreambooth.py \
--output_dir=$OUTPUT_DIR \
--train_text_encoder \
--with_prior_preservation --prior_loss_weight=1.0 \
--num_dataloader_workers=1 \
--instance_prompt="a photo of sks dog" \
--class_prompt="a photo of dog" \
--resolution=512 \
@ -101,6 +102,8 @@ accelerate launch train_dreambooth.py \
--max_train_steps=800
```

If you are running this script on Windows, you may need to set the `--num_dataloader_workers` to 0.

## Inference with a single adapter

To run inference with the fine-tuned model, first specify the base model with which the fine-tuned LoRA weights will be combined:
@ -171,7 +174,7 @@ image.save("DESTINATION_PATH_FOR_THE_IMAGE")
## Multi-adapter inference

With PEFT you can combine multiple adapters for inference. In the previous example you have fine-tuned Stable Diffusion on
some dog images. The pipeline created based on these weights got a name - `adapter_name="dog`. Now, suppose you also fine-tuned
some dog images. The pipeline created based on these weights got a name - `adapter_name="dog"`. Now, suppose you also fine-tuned
this base model on images of a crochet toy. Let's see how we can use both adapters.

First, you'll need to perform all the steps as in the single adapter inference example:
|
@ -213,6 +213,10 @@ def parse_args(input_args=None):
|
||||
help="Bias type for Lora. Can be 'none', 'all' or 'lora_only', only used if use_lora and `train_text_encoder` are True",
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
"--num_dataloader_workers", type=int, default=1, help="Num of workers for the training dataloader."
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
"--train_batch_size", type=int, default=4, help="Batch size (per device) for the training dataloader."
|
||||
)
|
||||
@ -799,7 +803,7 @@ def main(args):
|
||||
batch_size=args.train_batch_size,
|
||||
shuffle=True,
|
||||
collate_fn=lambda examples: collate_fn(examples, args.with_prior_preservation),
|
||||
num_workers=1,
|
||||
num_workers=args.num_dataloader_workers,
|
||||
)
|
||||
|
||||
# Scheduler and math around the number of training steps.
|
||||
|
setup.py
@ -22,7 +22,7 @@ extras["test"] = extras["dev"] + ["pytest", "pytest-cov", "pytest-xdist", "param
setup(
name="peft",
version="0.6.0",
version="0.6.2",
description="Parameter-Efficient Fine-Tuning (PEFT)",
license_files=["LICENSE"],
long_description=open("README.md", "r", encoding="utf-8").read(),
@ -63,19 +63,23 @@
)

# Release checklist
# 1. Change the version in __init__.py and setup.py.
# 2. Commit these changes with the message: "Release: VERSION"
# 3. Add a tag in git to mark the release: "git tag VERSION -m 'Adds tag VERSION for pypi' "
# Push the tag to git: git push --tags origin main
# 4. Run the following commands in the top-level directory:
# 1. Change the version in __init__.py and setup.py to the release version, e.g. from "0.6.0.dev0" to "0.6.0"
# 2. Check if there are any deprecations that need to be addressed for this release by searching for "# TODO" in the code
# 3. Commit these changes with the message: "Release: VERSION", create a PR and merge it.
# 4. Add a tag in git to mark the release: "git tag -a VERSION -m 'Adds tag VERSION for pypi' "
# Push the tag to git:
# git push --tags origin main
# It is necessary to work on the original repository, not on a fork.
# 5. Run the following commands in the top-level directory:
# python setup.py bdist_wheel
# python setup.py sdist
# 5. Upload the package to the pypi test server first:
# Ensure that you are on the clean and up-to-date main branch (git status --untracked-files=no should not list any
# files and show the main branch)
# 6. Upload the package to the pypi test server first:
# twine upload dist/* -r pypitest
# twine upload dist/* -r pypitest --repository-url=https://test.pypi.org/legacy/
# 6. Check that you can install it in a virtualenv by running:
# 7. Check that you can install it in a virtualenv by running:
# pip install -i https://testpypi.python.org/pypi peft
# 7. Upload the final version to actual pypi:
# 8. Upload the final version to actual pypi:
# twine upload dist/* -r pypi
# 8. Add release notes to the tag in github once everything is looking hunky-dory.
# 9. Update the version in __init__.py, setup.py to the new version "-dev" and push to master
# 9. Add release notes to the tag on https://github.com/huggingface/peft/releases once everything is looking hunky-dory.
# 10. Update the version in __init__.py, setup.py to the bumped minor version + ".dev0" (e.g. from "0.6.0" to "0.7.0.dev0")
|
||||
|
@ -17,7 +17,7 @@
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
__version__ = "0.6.0"
|
||||
__version__ = "0.6.2"
|
||||
|
||||
from .auto import (
|
||||
AutoPeftModel,
|
||||
|
@ -13,6 +13,10 @@
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
import importlib
|
||||
import importlib.metadata as importlib_metadata
|
||||
from functools import lru_cache
|
||||
|
||||
import packaging.version
|
||||
|
||||
|
||||
def is_bnb_available() -> bool:
|
||||
@ -28,9 +32,35 @@ def is_bnb_4bit_available() -> bool:
|
||||
return hasattr(bnb.nn, "Linear4bit")
|
||||
|
||||
|
||||
def is_auto_gptq_available() -> bool:
|
||||
return importlib.util.find_spec("auto_gptq") is not None
|
||||
def is_auto_gptq_available():
|
||||
if importlib.util.find_spec("auto_gptq") is not None:
|
||||
AUTOGPTQ_MINIMUM_VERSION = packaging.version.parse("0.5.0")
|
||||
version_autogptq = packaging.version.parse(importlib_metadata.version("auto_gptq"))
|
||||
if AUTOGPTQ_MINIMUM_VERSION <= version_autogptq:
|
||||
return True
|
||||
else:
|
||||
raise ImportError(
|
||||
f"Found an incompatible version of auto-gptq. Found version {version_autogptq}, "
|
||||
f"but only versions above {AUTOGPTQ_MINIMUM_VERSION} are supported"
|
||||
)
|
||||
|
||||
|
||||
def is_optimum_available() -> bool:
|
||||
return importlib.util.find_spec("optimum") is not None
|
||||
|
||||
|
||||
@lru_cache()
|
||||
def is_torch_tpu_available(check_device=True):
|
||||
"Checks if `torch_xla` is installed and potentially if a TPU is in the environment"
|
||||
if importlib.util.find_spec("torch_xla") is not None:
|
||||
if check_device:
|
||||
# We need to check if `xla_device` can be found, will raise a RuntimeError if not
|
||||
try:
|
||||
import torch_xla.core.xla_model as xm
|
||||
|
||||
_ = xm.xla_device()
|
||||
return True
|
||||
except RuntimeError:
|
||||
return False
|
||||
return True
|
||||
return False
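A hedged usage sketch for these availability helpers (the calling code is illustrative and not part of this diff): optional dependencies are imported only after the helper confirms they are installed, and `is_auto_gptq_available` now also enforces a minimum version.

```py
# Illustrative only: guard optional imports behind the availability helpers
from peft.import_utils import is_auto_gptq_available, is_bnb_available

if is_bnb_available():
    import bitsandbytes as bnb  # safe: the module spec was found

if is_auto_gptq_available():  # raises ImportError if auto-gptq is older than the minimum version
    from auto_gptq.utils.import_utils import dynamically_import_QuantLinear
```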
|
||||
|
@ -15,6 +15,7 @@
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import collections
|
||||
import inspect
|
||||
import os
|
||||
import warnings
|
||||
@ -58,6 +59,7 @@ from .utils import (
|
||||
_set_adapter,
|
||||
_set_trainable,
|
||||
get_peft_model_state_dict,
|
||||
id_tensor_storage,
|
||||
infer_device,
|
||||
load_peft_weights,
|
||||
set_peft_model_state_dict,
|
||||
@ -168,6 +170,8 @@ class PeftModel(PushToHubMixin, torch.nn.Module):
|
||||
save_directory (`str`):
|
||||
Directory where the adapter model and configuration files will be saved (will be created if it does not
|
||||
exist).
|
||||
safe_serialization (`bool`, *optional*):
|
||||
Whether to save the adapter files in safetensors format.
|
||||
kwargs (additional keyword arguments, *optional*):
|
||||
Additional keyword arguments passed along to the `push_to_hub` method.
|
||||
"""
|
||||
@ -199,6 +203,28 @@ class PeftModel(PushToHubMixin, torch.nn.Module):
|
||||
os.makedirs(output_dir, exist_ok=True)
|
||||
|
||||
if safe_serialization:
|
||||
# Section copied from: https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L2111-L2134
|
||||
# Safetensors does not allow tensor aliasing.
|
||||
# We're going to remove aliases before saving
|
||||
ptrs = collections.defaultdict(list)
|
||||
for name, tensor in output_state_dict.items():
|
||||
# Sometimes in the state_dict we have non-tensor objects.
|
||||
# e.g. in bitsandbytes we have some `str` objects in the state_dict
|
||||
if isinstance(tensor, torch.Tensor):
|
||||
ptrs[id_tensor_storage(tensor)].append(name)
|
||||
else:
|
||||
# In the non-tensor case, fall back to the pointer of the object itself
|
||||
ptrs[id(tensor)].append(name)
|
||||
|
||||
# These are all the pointers of shared tensors.
|
||||
shared_ptrs = {ptr: names for ptr, names in ptrs.items() if len(names) > 1}
|
||||
|
||||
for _, names in shared_ptrs.items():
|
||||
# Here we just clone the shared tensors to avoid tensor aliasing which is
|
||||
# not supported in safetensors.
|
||||
for shared_tensor_name in names[1:]:
|
||||
output_state_dict[shared_tensor_name] = output_state_dict[shared_tensor_name].clone()
|
||||
|
||||
safe_save_file(
|
||||
output_state_dict,
|
||||
os.path.join(output_dir, SAFETENSORS_WEIGHTS_NAME),
|
||||
|
@ -26,7 +26,8 @@ from peft.utils import transpose
|
||||
class AdaLoraLayer(LoraLayer):
|
||||
# List all names of layers that may contain adapter weights
|
||||
# Note: ranknum doesn't need to be included as it is not an nn.Module
|
||||
adapter_layer_names = ["lora_A", "lora_B", "lora_E", "lora_embedding_A", "lora_embedding_B"]
|
||||
adapter_layer_names = ("lora_A", "lora_B", "lora_E", "lora_embedding_A", "lora_embedding_B")
|
||||
# other_param_names is defined in LoraLayer
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
|
@ -39,12 +39,20 @@ def llama_apply_rotary_pos_emb(q, cos, sin, position_ids):
|
||||
This function was adapted from:
|
||||
https://github.com/huggingface/transformers/blob/1de8ce9ee1191ba761a593ac15d9ccbf5851bfc5/src/transformers/models/llama/modeling_llama.py#L133
|
||||
|
||||
It was modified to remove unnecessary processing of key states.
|
||||
It was modified to remove unnecessary processing of key states. The method is compatible with transformers <=
|
||||
4.34.2 and also with the latest version (>=4.35).
|
||||
"""
|
||||
gather_indices = position_ids[:, None, :, None] # [bs, 1, seq_len, 1]
|
||||
gather_indices = gather_indices.repeat(1, cos.shape[1], 1, cos.shape[3])
|
||||
cos = torch.gather(cos.repeat(gather_indices.shape[0], 1, 1, 1), 2, gather_indices)
|
||||
sin = torch.gather(sin.repeat(gather_indices.shape[0], 1, 1, 1), 2, gather_indices)
|
||||
# In previous transformers version cos/sin cached had a shape of 4D
|
||||
if len(cos.shape) == 4:
|
||||
gather_indices = position_ids[:, None, :, None] # [bs, 1, seq_len, 1]
|
||||
gather_indices = gather_indices.repeat(1, cos.shape[1], 1, cos.shape[3])
|
||||
cos = torch.gather(cos.repeat(gather_indices.shape[0], 1, 1, 1), 2, gather_indices)
|
||||
sin = torch.gather(sin.repeat(gather_indices.shape[0], 1, 1, 1), 2, gather_indices)
|
||||
# In the new version, it is 2D so we fall back to the new implementation
|
||||
# https://github.com/huggingface/transformers/blame/eef7ea98c31a333bacdc7ae7a2372bde772be8e4/src/transformers/models/llama/modeling_llama.py#L222-L226
|
||||
else:
|
||||
cos = cos[position_ids].unsqueeze(1)
|
||||
sin = sin[position_ids].unsqueeze(1)
|
||||
q_embed = (q * cos) + (llama_rotate_half(q) * sin)
|
||||
return q_embed
|
||||
|
||||
|
@ -29,7 +29,9 @@ class IA3Config(PeftConfig):
|
||||
target_modules (`Union[List[str],str]`):
|
||||
The names of the modules to apply (IA)^3 to.
|
||||
feedforward_modules (`Union[List[str],str]`):
|
||||
The names of the modules to be treated as feedforward modules, as in the original paper.
|
||||
The names of the modules to be treated as feedforward modules, as in the original paper. These modules will
|
||||
have (IA)^3 vectors multiplied to the input, instead of the output. feedforward_modules must be a name or a
|
||||
subset of names present in target_modules.
|
||||
fan_in_fan_out (`bool`):
|
||||
Set this to True if the layer to replace stores weight like (fan_in, fan_out). For example, gpt-2 uses
|
||||
`Conv1D` which stores weights like (fan_in, fan_out) and hence this should be set to `True`.
|
||||
@ -78,3 +80,8 @@ class IA3Config(PeftConfig):
|
||||
self.feedforward_modules = (
|
||||
set(self.feedforward_modules) if isinstance(self.feedforward_modules, list) else self.feedforward_modules
|
||||
)
|
||||
|
||||
# check if feedforward_modules is a subset of target_modules. run the check only if both are sets
|
||||
if isinstance(self.feedforward_modules, set) and isinstance(self.target_modules, set):
|
||||
if not self.feedforward_modules.issubset(self.target_modules):
|
||||
raise ValueError("`feedforward_modules` should be a subset of `target_modules`")
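A hedged sketch of how this validation behaves when both arguments are lists (the module names are illustrative):

```py
from peft import IA3Config

# valid: every feedforward module is also a target module
IA3Config(target_modules=["k_proj", "v_proj", "down_proj"], feedforward_modules=["down_proj"])

# invalid: "down_proj" is not among the target modules, so this raises
# ValueError: `feedforward_modules` should be a subset of `target_modules`
IA3Config(target_modules=["k_proj", "v_proj"], feedforward_modules=["down_proj"])
```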
|
||||
|
@ -25,8 +25,10 @@ from peft.utils import transpose
|
||||
|
||||
|
||||
class IA3Layer(BaseTunerLayer):
|
||||
# List all names of layers that may contain adapter weights
|
||||
adapter_layer_names = ["ia3_l"]
|
||||
# All names of layers that may contain adapter weights
|
||||
adapter_layer_names = ("ia3_l",)
|
||||
# All names of other parameters that may contain adapter-related parameters
|
||||
other_layer_names = ("scaling",)
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
|
@ -206,7 +206,7 @@ class IA3Model(BaseTuner):
|
||||
"New adapter should have the same value for `is_feedforward` as previously added adapter."
|
||||
)
|
||||
if isinstance(target, torch.nn.Conv2d):
|
||||
target.update_layer_conv2d(
|
||||
target.update_layer(
|
||||
adapter_name,
|
||||
ia3_config.init_ia3_weights,
|
||||
)
|
||||
|
@ -24,8 +24,9 @@ from peft.tuners.lycoris_utils import LycorisLayer
|
||||
|
||||
|
||||
class LoHaLayer(LycorisLayer, nn.Module):
|
||||
# List all names of layers that may contain adapter weights
|
||||
adapter_layer_names = ["hada_w1_a", "hada_w1_b", "hada_w2_a", "hada_w2_b", "hada_t1", "hada_t2"]
|
||||
# All names of layers that may contain adapter weights
|
||||
adapter_layer_names = ("hada_w1_a", "hada_w1_b", "hada_w2_a", "hada_w2_b", "hada_t1", "hada_t2")
|
||||
# other_param_names is defined on parent class
|
||||
|
||||
def __init__(self):
|
||||
LycorisLayer.__init__(self)
|
||||
|
@ -24,8 +24,8 @@ from peft.tuners.lycoris_utils import LycorisLayer
|
||||
|
||||
|
||||
class LoKrLayer(LycorisLayer, nn.Module):
|
||||
# List all names of layers that may contain adapter weights
|
||||
adapter_layer_names = [
|
||||
# All names of layers that may contain adapter weights
|
||||
adapter_layer_names = (
|
||||
"lokr_w1",
|
||||
"lokr_w1_a",
|
||||
"lokr_w1_b",
|
||||
@ -33,7 +33,8 @@ class LoKrLayer(LycorisLayer, nn.Module):
|
||||
"lokr_w2_a",
|
||||
"lokr_w2_b",
|
||||
"lokr_t2",
|
||||
]
|
||||
)
|
||||
# other_param_names is defined on parent class
|
||||
|
||||
def __init__(self):
|
||||
LycorisLayer.__init__(self)
|
||||
|
@ -26,8 +26,10 @@ from peft.utils.other import transpose
|
||||
|
||||
|
||||
class LoraLayer(BaseTunerLayer):
|
||||
# List all names of layers that may contain adapter weights
|
||||
adapter_layer_names = ["lora_A", "lora_B", "lora_embedding_A", "lora_embedding_B"]
|
||||
# All names of layers that may contain (trainable) adapter weights
|
||||
adapter_layer_names = ("lora_A", "lora_B", "lora_embedding_A", "lora_embedding_B")
|
||||
# All names of other parameters that may contain adapter-related parameters
|
||||
other_param_names = ("r", "lora_alpha", "scaling", "lora_dropout")
|
||||
|
||||
def __init__(self, in_features: int, out_features: int, **kwargs):
|
||||
self.r = {}
|
||||
|
@ -661,29 +661,15 @@ class LoraModel(BaseTuner):
|
||||
del self.peft_config[adapter_name]
|
||||
|
||||
key_list = [key for key, _ in self.model.named_modules() if "lora" not in key]
|
||||
new_adapter = None
|
||||
for key in key_list:
|
||||
_, target, _ = _get_submodules(self.model, key)
|
||||
if isinstance(target, LoraLayer):
|
||||
for attr in [
|
||||
"r",
|
||||
"lora_alpha",
|
||||
"scaling",
|
||||
"lora_A",
|
||||
"lora_B",
|
||||
"lora_embedding_A",
|
||||
"lora_embedding_B",
|
||||
"lora_dropout",
|
||||
]:
|
||||
if adapter_name in getattr(target, attr):
|
||||
getattr(target, attr).pop(adapter_name)
|
||||
if adapter_name in target.active_adapters:
|
||||
resetting_active_adapter = (
|
||||
list(self.peft_config.keys())[0] if len(self.peft_config) > 0 else "default"
|
||||
)
|
||||
warnings.warn(
|
||||
f"Adapter {adapter_name} was active which is now deleted. Setting active adapter to {resetting_active_adapter}. "
|
||||
)
|
||||
target.set_adapter(resetting_active_adapter)
|
||||
target.delete_adapter(adapter_name)
|
||||
if new_adapter is None:
|
||||
new_adapter = target.active_adapters[:]
|
||||
|
||||
self.active_adapter = new_adapter or []
|
||||
|
||||
def merge_and_unload(self, progressbar: bool = False, safe_merge: bool = False):
|
||||
r"""
|
||||
|
@ -62,6 +62,8 @@ class LycorisLayer(BaseTunerLayer, nn.Module):
|
||||
r"""
|
||||
A base layer for LyCORIS like adapters
|
||||
"""
|
||||
# adapter_layer_names needs to be defined on the child class
|
||||
other_param_names = ("r", "alpha", "scaling", "rank_dropout", "module_dropout")
|
||||
|
||||
def __init__(self):
|
||||
self.r = {}
|
||||
@ -391,17 +393,12 @@ class LycorisTuner(BaseTuner):
|
||||
del self.peft_config[adapter_name]
|
||||
|
||||
key_list = [key for key, _ in self.model.named_modules() if self.prefix not in key]
|
||||
new_adapter = None
|
||||
for key in key_list:
|
||||
_, target, _ = _get_submodules(self.model, key)
|
||||
if isinstance(target, LycorisLayer):
|
||||
for attr in target.adapter_layer_names:
|
||||
if adapter_name in getattr(target, attr):
|
||||
getattr(target, attr).pop(adapter_name)
|
||||
if adapter_name in target.active_adapters:
|
||||
resetting_active_adapter = (
|
||||
list(self.peft_config.keys())[0] if len(self.peft_config) > 0 else "default"
|
||||
)
|
||||
warnings.warn(
|
||||
f"Adapter {adapter_name} was active which is now deleted. Setting active adapter to {resetting_active_adapter}. "
|
||||
)
|
||||
target.set_adapter(resetting_active_adapter)
|
||||
target.delete_adapter(adapter_name)
|
||||
if new_adapter is None:
|
||||
new_adapter = target.active_adapters[:]
|
||||
|
||||
self.active_adapter = new_adapter or []
|
||||
|
@ -16,6 +16,7 @@ from __future__ import annotations
|
||||
|
||||
import logging
|
||||
import re
|
||||
import warnings
|
||||
from abc import ABC, abstractmethod
|
||||
from typing import Any, Union
|
||||
|
||||
@ -24,7 +25,7 @@ from torch import nn
|
||||
from peft.utils import COMMON_LAYERS_PATTERN
|
||||
|
||||
from ..config import PeftConfig
|
||||
from ..utils import _get_submodules
|
||||
from ..utils import ModulesToSaveWrapper, _get_submodules
|
||||
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
@ -210,6 +211,9 @@ class BaseTuner(nn.Module, ABC):
|
||||
is_target_modules_in_base_model = False
|
||||
key_list = [key for key, _ in model.named_modules()]
|
||||
|
||||
_check_for_modules_to_save = getattr(peft_config, "modules_to_save", None) is not None
|
||||
_has_modules_to_save = False
|
||||
|
||||
model_config = getattr(model, "config", {"model_type": "custom"})
|
||||
if hasattr(model_config, "to_dict"):
|
||||
model_config = model_config.to_dict()
|
||||
@ -217,6 +221,22 @@ class BaseTuner(nn.Module, ABC):
|
||||
peft_config = self._prepare_adapter_config(peft_config, model_config)
|
||||
|
||||
for key in key_list:
|
||||
# Check for modules_to_save in case
|
||||
if _check_for_modules_to_save and any(
|
||||
key.endswith(f"{module_to_save}") for module_to_save in peft_config.modules_to_save
|
||||
):
|
||||
# Optionally set the modules to save
|
||||
parent, target, target_name = _get_submodules(model, key)
|
||||
|
||||
if not isinstance(target, ModulesToSaveWrapper):
|
||||
new_module = ModulesToSaveWrapper(target, adapter_name)
|
||||
setattr(parent, target_name, new_module)
|
||||
else:
|
||||
target.update(adapter_name)
|
||||
|
||||
_has_modules_to_save = True
|
||||
continue
|
||||
|
||||
if not self._check_target_module_exists(peft_config, key):
|
||||
continue
|
||||
|
||||
@ -243,6 +263,12 @@ class BaseTuner(nn.Module, ABC):
|
||||
if adapter_name in n:
|
||||
p.requires_grad = False
|
||||
|
||||
if _has_modules_to_save:
|
||||
if not hasattr(model, "modules_to_save"):
|
||||
model.modules_to_save = set(peft_config.modules_to_save)
|
||||
else:
|
||||
model.modules_to_save.update(set(peft_config.modules_to_save))
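A hedged sketch of what this wiring enables (the module names and base model are illustrative): any module listed in `modules_to_save` is wrapped in `ModulesToSaveWrapper`, kept trainable, and included in the saved adapter checkpoint, which the low-level test added further down in this diff also exercises.

```py
from peft import LoraConfig, get_peft_model

# assumes `base_model` defines submodules named "linear" and "classifier"
config = LoraConfig(target_modules=["linear"], modules_to_save=["classifier"])
peft_model = get_peft_model(base_model, config)
# base_model.classifier is now a ModulesToSaveWrapper; its weights stay trainable and are saved with the adapter
```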
|
||||
|
||||
def merge_adapter(self):
|
||||
"""
|
||||
This method merges the LoRa layers into the base model.
|
||||
@ -272,8 +298,10 @@ class BaseTunerLayer(ABC):
|
||||
"""
|
||||
active_adapter = None
|
||||
|
||||
# List all names of layers that may contain adapter weights
|
||||
adapter_layer_names: list[str] = []
|
||||
# All names of layers that may contain adapter (trainable) weights
|
||||
adapter_layer_names: tuple[str] = ()
|
||||
# All names of other parameters that may contain adapter-related parameters
|
||||
other_param_names: tuple[str] = ()
|
||||
|
||||
# indicates whether all adapters should be disabled
|
||||
_disable_adapters: bool = False
|
||||
@ -351,6 +379,54 @@ class BaseTunerLayer(ABC):
|
||||
|
||||
self._active_adapter = adapter_names
|
||||
|
||||
def _all_available_adapter_names(self) -> list[str]:
|
||||
"""Return a sorted list of all available adapter names"""
|
||||
adapter_names = set()
|
||||
for name in self.adapter_layer_names + self.other_param_names:
|
||||
# we check each possible attribute and if it's a dict or ModuleDict, we assume that the keys are the adapter
|
||||
# names
|
||||
attr = getattr(self, name)
|
||||
if hasattr(attr, "keys"):
|
||||
adapter_names.update(attr.keys())
|
||||
return sorted(adapter_names)
|
||||
|
||||
def delete_adapter(self, adapter_name: str) -> None:
|
||||
"""
|
||||
Delete an adapter from the layer
|
||||
|
||||
This should be called on all adapter layers, or else we will get an inconsistent state.
|
||||
|
||||
This method will also set a new active adapter if the deleted adapter was an active adapter. It is important
|
||||
that the new adapter is chosen in a deterministic way, so that the same adapter is chosen on all layers.
|
||||
|
||||
Args:
|
||||
adapter_name (`str`): The name of the adapter to delete
|
||||
|
||||
"""
|
||||
for attr in self.adapter_layer_names + self.other_param_names:
|
||||
if adapter_name in getattr(self, attr):
|
||||
del getattr(self, attr)[adapter_name]
|
||||
|
||||
if adapter_name in self.active_adapters:
|
||||
# choose a new active adapter
|
||||
active_adapters = self.active_adapters[:]
|
||||
active_adapters.remove(adapter_name)
|
||||
if active_adapters:
|
||||
self.set_adapter(active_adapters)
|
||||
else:
|
||||
# no active adapters left, set a new default adapter
|
||||
# here we get the list of all existing adapter names and choose the first one
|
||||
remaining_adapters = self._all_available_adapter_names()
|
||||
if not remaining_adapters:
|
||||
self.set_adapter([])
|
||||
else:
|
||||
new_active_adapter = remaining_adapters[0]
|
||||
warnings.warn(
|
||||
f"Adapter {adapter_name} was active which is now deleted. Setting active adapter to "
|
||||
f"{new_active_adapter}."
|
||||
)
|
||||
self.set_adapter(remaining_adapters[0])
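A hedged sketch of the resulting model-level behavior (adapter names and the config are illustrative), mirroring the tests added later in this diff:

```py
# assumes `base_model` and `lora_config` were created beforehand
model = get_peft_model(base_model, lora_config)   # registers the "default" adapter
model.add_adapter("delete_me", lora_config)
model.set_adapter("delete_me")

model.delete_adapter("delete_me")   # removed from every layer; a remaining adapter becomes active again
assert "delete_me" not in model.peft_config
assert model.active_adapters == ["default"]

model.delete_adapter("default")     # deleting the last remaining adapter is now allowed
assert model.active_adapters == []
```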
|
||||
|
||||
|
||||
def check_target_module_exists(config, key: str) -> bool | re.Match[str] | None:
|
||||
"""A helper method to check if the passed module's key name matches any of the target modules in the adapter_config.
|
||||
|
@ -45,6 +45,7 @@ from .other import (
|
||||
infer_device,
|
||||
get_auto_gptq_quant_linear,
|
||||
get_quantization_config,
|
||||
id_tensor_storage,
|
||||
)
|
||||
from .hub_utils import hub_file_exists
|
||||
from .save_and_load import get_peft_model_state_dict, set_peft_model_state_dict, load_peft_weights
|
||||
|
@ -15,14 +15,15 @@
|
||||
import copy
|
||||
import inspect
|
||||
import warnings
|
||||
from typing import Optional
|
||||
from typing import Optional, Tuple
|
||||
|
||||
import accelerate
|
||||
import torch
|
||||
from accelerate.hooks import add_hook_to_module, remove_hook_from_module
|
||||
from accelerate.utils import is_npu_available, is_xpu_available
|
||||
from safetensors.torch import storage_ptr, storage_size
|
||||
|
||||
from ..import_utils import is_auto_gptq_available
|
||||
from ..import_utils import is_auto_gptq_available, is_torch_tpu_available
|
||||
|
||||
|
||||
# Get current device name based on available devices
|
||||
@ -412,25 +413,57 @@ def get_auto_gptq_quant_linear(gptq_quantization_config):
|
||||
"""
|
||||
Get the right AutoGPTQQuantLinear class based on the quantization config file
|
||||
"""
|
||||
if is_auto_gptq_available():
|
||||
if gptq_quantization_config is not None and is_auto_gptq_available():
|
||||
from auto_gptq.utils.import_utils import dynamically_import_QuantLinear
|
||||
|
||||
if gptq_quantization_config is not None:
|
||||
desc_act = gptq_quantization_config.desc_act
|
||||
group_size = gptq_quantization_config.group_size
|
||||
bits = gptq_quantization_config.bits
|
||||
disable_exllama = gptq_quantization_config.disable_exllama
|
||||
AutoGPTQQuantLinear = dynamically_import_QuantLinear(
|
||||
use_triton=False,
|
||||
desc_act=desc_act,
|
||||
group_size=group_size,
|
||||
bits=bits,
|
||||
disable_exllama=disable_exllama,
|
||||
)
|
||||
return AutoGPTQQuantLinear
|
||||
desc_act = gptq_quantization_config.desc_act
|
||||
group_size = gptq_quantization_config.group_size
|
||||
bits = gptq_quantization_config.bits
|
||||
if hasattr(gptq_quantization_config, "use_exllama"):
|
||||
use_exllama = gptq_quantization_config.use_exllama
|
||||
else:
|
||||
use_exllama = not gptq_quantization_config.disable_exllama
|
||||
if hasattr(gptq_quantization_config, "exllama_config"):
|
||||
exllama_version = gptq_quantization_config.exllama_config["version"]
|
||||
else:
|
||||
exllama_version = 1
|
||||
AutoGPTQQuantLinear = dynamically_import_QuantLinear(
|
||||
use_triton=False,
|
||||
desc_act=desc_act,
|
||||
group_size=group_size,
|
||||
bits=bits,
|
||||
disable_exllama=not (use_exllama and exllama_version == 1),
|
||||
disable_exllamav2=not (use_exllama and exllama_version == 2),
|
||||
)
|
||||
return AutoGPTQQuantLinear
|
||||
return None
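A hedged sketch of how this helper is typically consumed (the surrounding calls are illustrative): the GPTQ quantization config is read from the loaded model and, when auto-gptq is available, the matching QuantLinear class is returned so quantized layers can be detected and wrapped.

```py
from peft.utils.other import get_auto_gptq_quant_linear, get_quantization_config

# `model` is assumed to be a transformers model loaded with a GPTQ quantization config
gptq_config = get_quantization_config(model, method="gptq")
QuantLinear = get_auto_gptq_quant_linear(gptq_config)
if QuantLinear is not None:
    # layers that are instances of QuantLinear get the GPTQ-aware adapter implementation
    pass
```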
|
||||
|
||||
|
||||
def id_tensor_storage(tensor: torch.Tensor) -> Tuple[torch.device, int, int]:
|
||||
"""
|
||||
Unique identifier to a tensor storage. Multiple different tensors can share the same underlying storage. For
|
||||
example, "meta" tensors all share the same storage, and thus their identifier will all be equal. This identifier is
|
||||
guaranteed to be unique and constant for this tensor's storage during its lifetime. Two tensor storages with
|
||||
non-overlapping lifetimes may have the same id.
|
||||
|
||||
This method is the exact same copy of
|
||||
https://github.com/huggingface/transformers/blob/main/src/transformers/pytorch_utils.py#L282C1-L300C58 but we added
|
||||
it here manually to avoid import issue with old versions of transformers.
|
||||
"""
|
||||
if tensor.device.type == "xla" and is_torch_tpu_available():
|
||||
# NOTE: xla tensors dont have storage
|
||||
# use some other unique id to distinguish.
|
||||
# this is a XLA tensor, it must be created using torch_xla's
|
||||
# device. So the following import is safe:
|
||||
import torch_xla
|
||||
|
||||
unique_id = torch_xla._XLAC._xla_get_tensor_id(tensor)
|
||||
else:
|
||||
unique_id = storage_ptr(tensor)
|
||||
|
||||
return tensor.device, unique_id, storage_size(tensor)
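A hedged sketch of how this identifier is used by the safe-serialization path shown earlier in this diff: state-dict entries whose storages collide are grouped, and all but one entry per group is cloned before the dict is handed to safetensors. The helper below is illustrative, not part of the library.

```py
import collections

import torch


def find_shared_tensors(state_dict):
    # group parameter names by underlying storage; groups with more than one name are aliases
    ptrs = collections.defaultdict(list)
    for name, tensor in state_dict.items():
        if isinstance(tensor, torch.Tensor):
            ptrs[id_tensor_storage(tensor)].append(name)
    return {ptr: names for ptr, names in ptrs.items() if len(names) > 1}
```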
|
||||
|
||||
|
||||
TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING = {
|
||||
"t5": ["q", "v"],
|
||||
"mt5": ["q", "v"],
|
||||
@ -478,9 +511,9 @@ TRANSFORMERS_MODELS_TO_IA3_TARGET_MODULES_MAPPING = {
|
||||
"bert": ["key", "value", "output.dense"],
|
||||
"deberta-v2": ["key_proj", "value_proj", "output.dense"],
|
||||
"deberta": ["in_proj", "output.dense"],
|
||||
"RefinedWebModel": ["query_key_value"],
|
||||
"RefinedWeb": ["query_key_value"],
|
||||
"falcon": ["query_key_value"],
|
||||
"RefinedWebModel": ["query_key_value", "dense_4h_to_h"],
|
||||
"RefinedWeb": ["query_key_value", "dense_4h_to_h"],
|
||||
"falcon": ["query_key_value", "dense_4h_to_h"],
|
||||
}
|
||||
|
||||
TRANSFORMERS_MODELS_TO_IA3_FEEDFORWARD_MODULES_MAPPING = {
|
||||
@ -499,9 +532,9 @@ TRANSFORMERS_MODELS_TO_IA3_FEEDFORWARD_MODULES_MAPPING = {
|
||||
"bert": ["output.dense"],
|
||||
"deberta-v2": ["output.dense"],
|
||||
"deberta": ["output.dense"],
|
||||
"RefinedWeb": ["query_key_value"],
|
||||
"RefinedWebModel": ["query_key_value"],
|
||||
"falcon": ["query_key_value"],
|
||||
"RefinedWeb": ["dense_4h_to_h"],
|
||||
"RefinedWebModel": ["dense_4h_to_h"],
|
||||
"falcon": ["dense_4h_to_h"],
|
||||
}
|
||||
|
||||
COMMON_LAYERS_PATTERN = ["layers", "h", "block", "blocks", "layer"]
|
||||
|
@ -53,7 +53,7 @@ class AdaptionPromptTester(TestCase, PeftCommonTester):
|
||||
"""
|
||||
|
||||
def setUp(self):
|
||||
"""Check that llama is available in transformers package before running each test."""
|
||||
# Check that llama is available in transformers package before running each test.
|
||||
if not is_llama_available():
|
||||
self.skipTest("Llama not available in transformers. Skipping test.")
|
||||
|
||||
|
@ -13,6 +13,7 @@
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
import gc
|
||||
import tempfile
|
||||
import unittest
|
||||
|
||||
import pytest
|
||||
@ -22,6 +23,7 @@ from transformers import (
|
||||
AutoModelForCausalLM,
|
||||
AutoModelForSeq2SeqLM,
|
||||
AutoModelForSequenceClassification,
|
||||
AutoModelForTokenClassification,
|
||||
AutoTokenizer,
|
||||
BitsAndBytesConfig,
|
||||
LlamaForCausalLM,
|
||||
@ -33,6 +35,7 @@ from peft import (
|
||||
IA3Config,
|
||||
LoraConfig,
|
||||
PeftModel,
|
||||
TaskType,
|
||||
get_peft_model,
|
||||
prepare_model_for_kbit_training,
|
||||
)
|
||||
@ -158,12 +161,12 @@ class PeftGPUCommonTests(unittest.TestCase):
|
||||
flan_ia3_config = IA3Config(target_modules=["q", "v"], task_type="SEQ_2_SEQ_LM")
|
||||
|
||||
opt_ia3_config = IA3Config(
|
||||
target_modules=["q_proj", "v_proj"],
|
||||
feedforward_modules=["down_proj"],
|
||||
target_modules=["q_proj", "v_proj", "fc2"],
|
||||
feedforward_modules=["fc2"],
|
||||
task_type="CAUSAL_LM",
|
||||
)
|
||||
|
||||
config = IA3Config(target_modules=["q_proj", "v_proj"], feedforward_modules=["down_proj"])
|
||||
config = IA3Config(target_modules=["q_proj", "v_proj", "fc2"], feedforward_modules=["fc2"])
|
||||
|
||||
flan_8bit = get_peft_model(flan_8bit, flan_ia3_config)
|
||||
self.assertTrue(
|
||||
@ -276,12 +279,12 @@ class PeftGPUCommonTests(unittest.TestCase):
|
||||
flan_ia3_config = IA3Config(target_modules=["q", "v"], task_type="SEQ_2_SEQ_LM")
|
||||
|
||||
opt_ia3_config = IA3Config(
|
||||
target_modules=["q_proj", "v_proj"],
|
||||
feedforward_modules=["down_proj"],
|
||||
target_modules=["q_proj", "v_proj", "fc2"],
|
||||
feedforward_modules=["fc2"],
|
||||
task_type="CAUSAL_LM",
|
||||
)
|
||||
|
||||
config = IA3Config(target_modules=["q_proj", "v_proj"], feedforward_modules=["down_proj"])
|
||||
config = IA3Config(target_modules=["q_proj", "v_proj", "fc2"], feedforward_modules=["fc2"])
|
||||
|
||||
flan_4bit = get_peft_model(flan_4bit, flan_ia3_config)
|
||||
self.assertTrue(
|
||||
@ -631,3 +634,16 @@ class PeftGPUCommonTests(unittest.TestCase):
|
||||
self.assertTrue(isinstance(model, PeftModel))
|
||||
self.assertTrue(isinstance(model.base_model.model.model.decoder.layers[0].self_attn.q_proj, LoraLinear4bit))
|
||||
self.assertTrue(isinstance(model.base_model.model.model.decoder.layers[0].self_attn.v_proj, LoraLinear4bit))
|
||||
|
||||
@require_torch_gpu
|
||||
@pytest.mark.single_gpu_tests
|
||||
def test_serialization_shared_tensors(self):
|
||||
model_checkpoint = "roberta-base"
|
||||
peft_config = LoraConfig(
|
||||
task_type=TaskType.TOKEN_CLS, inference_mode=False, r=16, lora_alpha=16, lora_dropout=0.1, bias="all"
|
||||
)
|
||||
model = AutoModelForTokenClassification.from_pretrained(model_checkpoint, num_labels=11).to("cuda")
|
||||
model = get_peft_model(model, peft_config)
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmp_dir:
|
||||
model.save_pretrained(tmp_dir, safe_serialization=True)
|
||||
|
@ -24,6 +24,7 @@ from parameterized import parameterized
|
||||
|
||||
from peft import (
|
||||
AdaLoraConfig,
|
||||
# TODO: uncomment once PEFT works again with transformers
|
||||
AdaptionPromptConfig,
|
||||
IA3Config,
|
||||
LoHaConfig,
|
||||
@ -40,6 +41,7 @@ from peft import (
|
||||
PEFT_MODELS_TO_TEST = [("lewtun/tiny-random-OPTForCausalLM-delta", "v1")]
|
||||
|
||||
ALL_CONFIG_CLASSES = (
|
||||
# TODO: uncomment once PEFT works again with transformers
|
||||
AdaptionPromptConfig,
|
||||
AdaLoraConfig,
|
||||
IA3Config,
|
||||
@ -221,3 +223,31 @@ class PeftConfigTester(unittest.TestCase):
|
||||
|
||||
# should run without errors
|
||||
LoraConfig(**valid_config)
|
||||
|
||||
def test_ia3_is_feedforward_subset_invalid_config(self):
|
||||
# This test checks that the IA3 config raises a value error if the feedforward_modules argument
|
||||
# is not a subset of the target_modules argument
|
||||
|
||||
# an example invalid config
|
||||
invalid_config = {"target_modules": ["k", "v"], "feedforward_modules": ["q"]}
|
||||
|
||||
with self.assertRaisesRegex(
|
||||
ValueError, expected_regex="^`feedforward_modules` should be a subset of `target_modules`$"
|
||||
):
|
||||
IA3Config(**invalid_config)
|
||||
|
||||
def test_ia3_is_feedforward_subset_valid_config(self):
|
||||
# This test checks that the IA3 config is created without errors with valid arguments.
|
||||
# feedforward_modules should be a subset of target_modules if both are lists
|
||||
|
||||
# an example valid config with regex expressions.
|
||||
valid_config_regex_exp = {
|
||||
"target_modules": ".*.(SelfAttention|EncDecAttention|DenseReluDense).*(q|v|wo)$",
|
||||
"feedforward_modules": ".*.DenseReluDense.wo$",
|
||||
}
|
||||
# an example valid config with module lists.
|
||||
valid_config_list = {"target_modules": ["k", "v", "wo"], "feedforward_modules": ["wo"]}
|
||||
|
||||
# should run without errors
|
||||
IA3Config(**valid_config_regex_exp)
|
||||
IA3Config(**valid_config_list)
|
||||
|
@ -681,6 +681,14 @@ class PeftCustomModelTester(unittest.TestCase, PeftCommonTester):
|
||||
# This is bad, there was a warning about the bias when there should not have been any.
|
||||
self.fail("There should be no warning when bias is set to 'none'")
|
||||
|
||||
@parameterized.expand(TEST_CASES)
|
||||
def test_delete_adapter(self, test_name, model_id, config_cls, config_kwargs):
|
||||
self._test_delete_adapter(model_id, config_cls, config_kwargs)
|
||||
|
||||
@parameterized.expand(TEST_CASES)
|
||||
def test_delete_inactive_adapter(self, test_name, model_id, config_cls, config_kwargs):
|
||||
self._test_delete_inactive_adapter(model_id, config_cls, config_kwargs)
|
||||
|
||||
@parameterized.expand(TEST_CASES)
|
||||
def test_adding_multiple_adapters_with_bias_raises(self, test_name, model_id, config_cls, config_kwargs):
|
||||
self._test_adding_multiple_adapters_with_bias_raises(model_id, config_cls, config_kwargs)
|
||||
|
@ -154,6 +154,10 @@ class PeftDecoderModelTester(unittest.TestCase, PeftCommonTester):
|
||||
def test_delete_adapter(self, test_name, model_id, config_cls, config_kwargs):
|
||||
self._test_delete_adapter(model_id, config_cls, config_kwargs)
|
||||
|
||||
@parameterized.expand(PeftTestConfigManager.get_grid_parameters(FULL_GRID))
|
||||
def test_delete_inactive_adapter(self, test_name, model_id, config_cls, config_kwargs):
|
||||
self._test_delete_inactive_adapter(model_id, config_cls, config_kwargs)
|
||||
|
||||
@parameterized.expand(PeftTestConfigManager.get_grid_parameters(FULL_GRID))
|
||||
def test_adding_multiple_adapters_with_bias_raises(self, test_name, model_id, config_cls, config_kwargs):
|
||||
self._test_adding_multiple_adapters_with_bias_raises(model_id, config_cls, config_kwargs)
|
||||
|
@ -12,11 +12,14 @@
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
import tempfile
|
||||
import unittest
|
||||
|
||||
import torch
|
||||
from parameterized import parameterized
|
||||
from transformers import AutoModelForSeq2SeqLM
|
||||
from transformers import AutoModelForSeq2SeqLM, AutoModelForTokenClassification
|
||||
|
||||
from peft import LoraConfig, TaskType, get_peft_model
|
||||
|
||||
from .testing_common import PeftCommonTester, PeftTestConfigManager
|
||||
|
||||
@ -125,6 +128,10 @@ class PeftEncoderDecoderModelTester(unittest.TestCase, PeftCommonTester):
|
||||
def test_delete_adapter(self, test_name, model_id, config_cls, config_kwargs):
|
||||
self._test_delete_adapter(model_id, config_cls, config_kwargs)
|
||||
|
||||
@parameterized.expand(PeftTestConfigManager.get_grid_parameters(FULL_GRID))
|
||||
def test_delete_inactive_adapter(self, test_name, model_id, config_cls, config_kwargs):
|
||||
self._test_delete_inactive_adapter(model_id, config_cls, config_kwargs)
|
||||
|
||||
@parameterized.expand(PeftTestConfigManager.get_grid_parameters(FULL_GRID))
|
||||
def test_adding_multiple_adapters_with_bias_raises(self, test_name, model_id, config_cls, config_kwargs):
|
||||
self._test_adding_multiple_adapters_with_bias_raises(model_id, config_cls, config_kwargs)
|
||||
@ -172,3 +179,20 @@ class PeftEncoderDecoderModelTester(unittest.TestCase, PeftCommonTester):
|
||||
)
|
||||
def test_disable_adapter(self, test_name, model_id, config_cls, config_kwargs):
|
||||
self._test_disable_adapter(model_id, config_cls, config_kwargs)
|
||||
|
||||
|
||||
class PeftEncoderDecoderCustomModelTester(unittest.TestCase):
|
||||
"""
|
||||
A custom class to write any custom test related with Enc-Dec models
|
||||
"""
|
||||
|
||||
def test_save_shared_tensors(self):
|
||||
model_id = "hf-internal-testing/tiny-random-RobertaModel"
|
||||
peft_config = LoraConfig(
|
||||
task_type=TaskType.TOKEN_CLS, inference_mode=False, r=16, lora_alpha=16, lora_dropout=0.1, bias="all"
|
||||
)
|
||||
model = AutoModelForTokenClassification.from_pretrained(model_id, num_labels=11)
|
||||
model = get_peft_model(model, peft_config)
|
||||
with tempfile.TemporaryDirectory() as tmp_dir:
|
||||
# This should work fine
|
||||
model.save_pretrained(tmp_dir, safe_serialization=True)
|
||||
|
@ -146,6 +146,10 @@ class PeftFeatureExtractionModelTester(unittest.TestCase, PeftCommonTester):
|
||||
def test_delete_adapter(self, test_name, model_id, config_cls, config_kwargs):
|
||||
self._test_delete_adapter(model_id, config_cls, config_kwargs)
|
||||
|
||||
@parameterized.expand(PeftTestConfigManager.get_grid_parameters(FULL_GRID))
|
||||
def test_delete_inactive_adapter(self, test_name, model_id, config_cls, config_kwargs):
|
||||
self._test_delete_inactive_adapter(model_id, config_cls, config_kwargs)
|
||||
|
||||
@parameterized.expand(
|
||||
PeftTestConfigManager.get_grid_parameters(
|
||||
{
|
||||
|
@ -658,7 +658,8 @@ class PeftGPTQGPUTests(unittest.TestCase):
|
||||
from transformers import GPTQConfig
|
||||
|
||||
self.causal_lm_model_id = "marcsun13/opt-350m-gptq-4bit"
|
||||
self.quantization_config = GPTQConfig(bits=4, disable_exllama=True)
|
||||
# TODO : check if it works for Exllamav2 kernels
|
||||
self.quantization_config = GPTQConfig(bits=4, use_exllama=False)
|
||||
self.tokenizer = AutoTokenizer.from_pretrained(self.causal_lm_model_id)
|
||||
|
||||
def tearDown(self):
|
||||
|
@ -19,6 +19,7 @@ import unittest
|
||||
import torch
|
||||
|
||||
from peft import LoraConfig, get_peft_model_state_dict, inject_adapter_in_model
|
||||
from peft.utils import ModulesToSaveWrapper
|
||||
|
||||
|
||||
class DummyModel(torch.nn.Module):
|
||||
@ -63,3 +64,28 @@ class TestPeft(unittest.TestCase):
|
||||
|
||||
for key in peft_state_dict.keys():
|
||||
self.assertTrue("lora" in key)
|
||||
|
||||
def test_modules_to_save(self):
|
||||
self.model = DummyModel()
|
||||
|
||||
lora_config = LoraConfig(
|
||||
lora_alpha=16,
|
||||
lora_dropout=0.1,
|
||||
r=64,
|
||||
bias="none",
|
||||
target_modules=["linear"],
|
||||
modules_to_save=["embedding"],
|
||||
)
|
||||
|
||||
self.model = inject_adapter_in_model(lora_config, self.model)
|
||||
|
||||
for name, module in self.model.named_modules():
|
||||
if name == "linear":
|
||||
self.assertTrue(hasattr(module, "lora_A"))
|
||||
self.assertTrue(hasattr(module, "lora_B"))
|
||||
elif name == "embedding":
|
||||
self.assertTrue(isinstance(module, ModulesToSaveWrapper))
|
||||
|
||||
state_dict = get_peft_model_state_dict(self.model)
|
||||
|
||||
self.assertTrue("embedding.weight" in state_dict.keys())
|
||||
|
@ -29,6 +29,7 @@ from peft import (
|
||||
IA3Config,
|
||||
LoraConfig,
|
||||
PeftModel,
|
||||
PeftType,
|
||||
PrefixTuningConfig,
|
||||
PromptEncoderConfig,
|
||||
PromptLearningConfig,
|
||||
@ -815,42 +816,87 @@ class PeftCommonTester:
|
||||
self.assertIsNotNone(param.grad)
|
||||
|
||||
def _test_delete_adapter(self, model_id, config_cls, config_kwargs):
|
||||
if issubclass(config_cls, AdaLoraConfig):
|
||||
# AdaLora does not support adding more than 1 adapter
|
||||
return
|
||||
|
||||
model = self.transformers_class.from_pretrained(model_id)
|
||||
supported_peft_types = [PeftType.LORA, PeftType.LOHA, PeftType.LOKR]
|
||||
# IA3 does not support deleting adapters yet, but it just needs to be added
|
||||
# AdaLora does not support multiple adapters
|
||||
config = config_cls(
|
||||
base_model_name_or_path=model_id,
|
||||
**config_kwargs,
|
||||
)
|
||||
if config.peft_type not in supported_peft_types:
|
||||
return
|
||||
|
||||
model = self.transformers_class.from_pretrained(model_id)
|
||||
if isinstance(config.target_modules, str):
|
||||
# TODO this should be doable
|
||||
self.skipTest("Multiple adapters cannot currently be added when target_modules is a string.")
|
||||
|
||||
adapter_to_delete = "delete_me"
|
||||
model = get_peft_model(model, config)
|
||||
model.add_adapter(adapter_to_delete, config)
|
||||
model.set_adapter(adapter_to_delete)
|
||||
model = model.to(self.torch_device)
|
||||
model.delete_adapter(adapter_to_delete)
|
||||
self.assertFalse(adapter_to_delete in model.peft_config)
|
||||
self.assertEqual(model.active_adapters, ["default"])
|
||||
|
||||
if config.peft_type not in ("LORA"):
|
||||
with self.assertRaises(AttributeError):
|
||||
model.delete_adapter(adapter_to_delete)
|
||||
else:
|
||||
model.delete_adapter(adapter_to_delete)
|
||||
self.assertFalse(adapter_to_delete in model.peft_config)
|
||||
key_list = [key for key, _ in model.named_modules() if "lora" not in key]
|
||||
for key in key_list:
|
||||
_, target, _ = _get_submodules(model, key)
|
||||
if isinstance(target, LoraLayer):
|
||||
for attr in [
|
||||
"r",
|
||||
"lora_alpha",
|
||||
"scaling",
|
||||
"lora_A",
|
||||
"lora_B",
|
||||
"lora_embedding_A",
|
||||
"lora_embedding_B",
|
||||
"lora_dropout",
|
||||
]:
|
||||
self.assertFalse(adapter_to_delete in getattr(target, attr))
|
||||
key_list = [key for key, _ in model.named_modules() if "lora" not in key]
|
||||
for key in key_list:
|
||||
_, target, _ = _get_submodules(model, key)
|
||||
attributes_to_check = getattr(target, "adapter_layer_names", []) + getattr(target, "other_param_names", [])
|
||||
for attr in attributes_to_check:
|
||||
self.assertFalse(adapter_to_delete in getattr(target, attr))
|
||||
|
||||
# check that we can also delete the last remaining adapter
|
||||
model.delete_adapter("default")
|
||||
self.assertFalse("default" in model.peft_config)
|
||||
self.assertEqual(model.active_adapters, [])
|
||||
|
||||
input = self.prepare_inputs_for_testing()
|
||||
# note: we cannot call model(**input) because PeftModel always expects there to be at least one adapter
|
||||
model.base_model(**input) # should not raise an error
|
||||
|
||||
def _test_delete_inactive_adapter(self, model_id, config_cls, config_kwargs):
|
||||
# same as test_delete_adapter, but this time an inactive adapter is deleted
|
||||
supported_peft_types = [PeftType.LORA, PeftType.LOHA, PeftType.LOKR]
|
||||
# IA3 does not support deleting adapters yet, but it just needs to be added
|
||||
# AdaLora does not support multiple adapters
|
||||
config = config_cls(
|
||||
base_model_name_or_path=model_id,
|
||||
**config_kwargs,
|
||||
)
|
||||
if config.peft_type not in supported_peft_types:
|
||||
return
|
||||
|
||||
model = self.transformers_class.from_pretrained(model_id)
|
||||
if isinstance(config.target_modules, str):
|
||||
# TODO this should be doable
|
||||
self.skipTest("Multiple adapters cannot currently be added when target_modules is a string.")
|
||||
|
||||
adapter_to_delete = "delete_me"
|
||||
model = get_peft_model(model, config)
|
||||
model.add_adapter(adapter_to_delete, config)
|
||||
# "delete_me" is added but not activated
|
||||
model = model.to(self.torch_device)
|
||||
model.delete_adapter(adapter_to_delete)
|
||||
self.assertFalse(adapter_to_delete in model.peft_config)
|
||||
self.assertEqual(model.active_adapters, ["default"])
|
||||
|
||||
key_list = [key for key, _ in model.named_modules() if "lora" not in key]
|
||||
for key in key_list:
|
||||
_, target, _ = _get_submodules(model, key)
|
||||
attributes_to_check = getattr(target, "adapter_layer_names", []) + getattr(target, "other_param_names", [])
|
||||
for attr in attributes_to_check:
|
||||
self.assertFalse(adapter_to_delete in getattr(target, attr))
|
||||
|
||||
# check that we can also delete the last remaining adapter
|
||||
model.delete_adapter("default")
|
||||
self.assertFalse("default" in model.peft_config)
|
||||
self.assertEqual(model.active_adapters, [])
|
||||
|
||||
input = self.prepare_inputs_for_testing()
|
||||
# note: we cannot call model(**input) because PeftModel always expects there to be at least one adapter
|
||||
model.base_model(**input) # should not raise an error
|
||||
|
||||
def _test_unload_adapter(self, model_id, config_cls, config_kwargs):
|
||||
model = self.transformers_class.from_pretrained(model_id)
|
||||
|