mirror of
https://github.com/huggingface/peft.git
synced 2025-10-20 23:43:47 +08:00
Compare commits
83 Commits
Author | SHA1 | Date | |
---|---|---|---|
29357d41eb | |||
f8e737648a | |||
b1af297707 | |||
85c7b98307 | |||
e41152e5f1 | |||
9f19ce6729 | |||
ae85e185ad | |||
93762cc658 | |||
ed608025eb | |||
14a293a6b3 | |||
c7b744db79 | |||
250edccdda | |||
1daf087682 | |||
d3d601d5c3 | |||
8083c9515f | |||
73cd16b7b5 | |||
65112b75bb | |||
3cf0b7a2d4 | |||
afb171eefb | |||
b07ea17f49 | |||
83ded43ee7 | |||
537c971a47 | |||
ed0c962ff5 | |||
eec0b9329d | |||
1929a84e1e | |||
522a6b6c17 | |||
462b65fe45 | |||
2b89fbf963 | |||
b5c97f2039 | |||
64d2d19598 | |||
a7dd034710 | |||
ed0bcdac4f | |||
bdeb3778d0 | |||
185c852088 | |||
a1b7e42783 | |||
3c4b64785f | |||
ab43d6aa5c | |||
3cf7034e9c | |||
ddb37c353c | |||
dbe3b9b99e | |||
5bc815e2e2 | |||
5a43a3a321 | |||
7ae63299a8 | |||
57de1d2677 | |||
383b5abb33 | |||
d8ccd7d84c | |||
df5b201c6b | |||
44d8e72ca8 | |||
c37ee25be7 | |||
c884daf96a | |||
fcd213708d | |||
915a5db0c6 | |||
d53a631608 | |||
b4d0885203 | |||
d04f6661ee | |||
80e1b262e5 | |||
dd518985ff | |||
a17cea104e | |||
3f9b310c6a | |||
06e49c0a87 | |||
6cf2cf5dae | |||
3faaf0916a | |||
6c9534e660 | |||
22295c4278 | |||
16182ea972 | |||
ad69958e52 | |||
f8a2829318 | |||
634f3692d8 | |||
2cc7f2cbac | |||
2896cf05fb | |||
776a28f053 | |||
d75746be70 | |||
1dbe7fc0db | |||
ff8a5b9a69 | |||
36267af51b | |||
fef162cff8 | |||
a8587916c8 | |||
77670ead76 | |||
360fb2f816 | |||
a40f20ad6c | |||
407482eb37 | |||
d9e7d6cd22 | |||
dbf438f99d |
Makefile (6)
@ -1,6 +1,6 @@
.PHONY: quality style test docs

check_dirs := src examples
check_dirs := src tests examples

# Check that source code meets quality standards

@ -9,11 +9,11 @@ quality:
black --check $(check_dirs)
isort --check-only $(check_dirs)
flake8 $(check_dirs)
doc-builder style src --max_len 119 --check_only
doc-builder style src tests --max_len 119 --check_only

# Format source code automatically and check is there are any problems left that need manual fixing
style:
black $(check_dirs)
isort $(check_dirs)
doc-builder style src --max_len 119
doc-builder style src tests --max_len 119
README.md (60)
@ -21,7 +21,7 @@ limitations under the License.
|
||||
|
||||
Parameter-Efficient Fine-Tuning (PEFT) methods enable efficient adaptation of pre-trained language models (PLMs) to various downstream applications without fine-tuning all the model's parameters. Fine-tuning large-scale PLMs is often prohibitively costly. In this regard, PEFT methods only fine-tune a small number of (extra) model parameters, thereby greatly decreasing the computational and storage costs. Recent State-of-the-Art PEFT techniques achieve performance comparable to that of full fine-tuning.
|
||||
|
||||
Seamlessly integrated with 🤗 Accelerate for large scale models leveraging PyTorch FSDP.
|
||||
Seamlessly integrated with 🤗 Accelerate for large scale models leveraging DeepSpeed and Big Model Inference.
|
||||
|
||||
Supported methods:
|
||||
|
||||
@ -34,11 +34,11 @@ Supported methods:
|
||||
|
||||
```python
|
||||
from transformers import AutoModelForSeq2SeqLM
|
||||
from peft import get_peft_config, get_peft_model, LoRAConfig, TaskType
|
||||
from peft import get_peft_config, get_peft_model, LoraConfig, TaskType
|
||||
model_name_or_path = "bigscience/mt0-large"
|
||||
tokenizer_name_or_path = "bigscience/mt0-large"
|
||||
|
||||
peft_config = LoRAConfig(
|
||||
peft_config = LoraConfig(
|
||||
task_type=TaskType.SEQ_2_SEQ_LM, inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1
|
||||
)
|
||||
|
||||
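For readers skimming this compare view, the hunk above shows only the configuration half of the README quickstart; a minimal, self-contained sketch of how it continues (the wrapped model and printed counts are illustrative, not part of this diff):

```python
# Minimal sketch of the rest of the quickstart shown above (illustrative)
from transformers import AutoModelForSeq2SeqLM
from peft import get_peft_model, LoraConfig, TaskType

model_name_or_path = "bigscience/mt0-large"
peft_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM, inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1
)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name_or_path)
model = get_peft_model(model, peft_config)  # wrap the base model with trainable LoRA adapters
model.print_trainable_parameters()          # reports trainable vs. total parameter counts
```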
@ -65,7 +65,7 @@ Hardware: Single A100 80GB GPU with CPU RAM above 64GB
|
||||
|
||||
Performance of PEFT-LoRA tuned `bigscience/T0_3B` on `ought/raft/twitter_complaints` leaderboard.
|
||||
A point to note is that we didn't try to squeeze performance by playing around with input instruction templates, LoRA hyperparams and other training-related hyperparams. Also, we didn't use the larger 13B mt0-xxl model.
|
||||
So, we are already seeing comparable performance to SoTA with parameter effcient tuning. Also, the final checkpoint size is just `19MB` in comparison to `11GB` size of the backbone `bigscience/T0_3B` model.
|
||||
So, we are already seeing comparable performance to SoTA with parameter efficient tuning. Also, the final checkpoint size is just `19MB` in comparison to `11GB` size of the backbone `bigscience/T0_3B` model.
|
||||
|
||||
| Submission Name | Accuracy |
|
||||
| --------- | ---- |
|
||||
@ -77,7 +77,7 @@ So, we are already seeing comparable performance to SoTA with parameter effcient
|
||||
|
||||
### Parameter Efficient Tuning of Diffusion Models
|
||||
|
||||
GPU memory required by different settings during training are given below. The final checkpoint size being `8.8 MB`.
|
||||
GPU memory required by different settings during training is given below. The final checkpoint size is `8.8 MB`.
|
||||
|
||||
Hardware: Single A100 80GB GPU with CPU RAM above 64G
|
||||
|
||||
@ -127,6 +127,12 @@ Try out the 🤗 Gradio Space which should run seamlessly on a T4 instance:
|
||||
|
||||
### Parameter Efficient Tuning of LLMs for RLHF components such as Ranker and Policy [ToDo]
|
||||
|
||||
### INT8 training of large models in Colab using PEFT LoRA and bitsandbytes
|
||||
|
||||
Here is a demo of how to fine-tune OPT-6.7b (14GB in fp16) in a Google Colab: [Open in Colab](https://colab.research.google.com/drive/1jCkpikz0J2o20FBQmYmAGdiKmJGOMo-o?usp=sharing)
|
||||
|
||||
Here is a demo of how to fine-tune whisper-large (1.5B params) (14GB in fp16) in a Google Colab: [ToDo]
|
||||
|
||||
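To make the INT8 section above concrete, a hedged sketch of the usual loading pattern follows; `prepare_model_for_int8_training`, the `load_in_8bit` flag, and the `target_modules` choice are assumptions based on the linked Colab-style workflow rather than code taken from this diff:

```python
# Hedged sketch of INT8 + LoRA setup (API names are assumptions, see note above)
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_int8_training

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-6.7b", load_in_8bit=True, device_map="auto"  # requires bitsandbytes
)
model = prepare_model_for_int8_training(model)  # prepare the quantized model for training
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM, r=8, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumption: typical projections targeted for OPT
)
model = get_peft_model(model, lora_config)
```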
### Save compute and storage even for medium and small models
|
||||
|
||||
Save storage by avoiding full finetuning of models on each of the downstream tasks/datasets,
|
||||
@ -143,10 +149,10 @@ Another example is fine-tuning `roberta-large` on `MRPC` GLUE dataset suing diff
|
||||
PEFT models work with 🤗 Accelerate out of the box. Use 🤗 Accelerate for distributed training on various hardware such as GPUs, Apple Silicon devices, etc. during training.
|
||||
Use 🤗 Accelerate for inference on consumer hardware with limited resources.
|
||||
|
||||
### Example of PEFT model training using 🤗 Accelerate's DeepSpeed integation
|
||||
### Example of PEFT model training using 🤗 Accelerate's DeepSpeed integration
|
||||
|
||||
Currently DeepSpeed requires PR [ZeRO3 handling frozen weights](https://github.com/microsoft/DeepSpeed/pull/2653) to fix [[REQUEST] efficiently deal with frozen weights during training](https://github.com/microsoft/DeepSpeed/issues/2615) issue. Example is provided in `~examples/conditional_generation/peft_lora_seq2seq_accelerate_ds_zero3_offload.py`.
|
||||
a. First run `accelerate config --config_file ds_zero3_cpu.yaml` and answer the questionaire.
|
||||
Currently DeepSpeed requires PR [ZeRO3 handling frozen weights](https://github.com/microsoft/DeepSpeed/pull/2653) to fix [[REQUEST] efficiently deal with frozen weights during training](https://github.com/microsoft/DeepSpeed/issues/2615) issue. An example is provided in `~examples/conditional_generation/peft_lora_seq2seq_accelerate_ds_zero3_offload.py`.
|
||||
a. First, run `accelerate config --config_file ds_zero3_cpu.yaml` and answer the questionnaire.
|
||||
Below are the contents of the config file.
|
||||
```
|
||||
compute_environment: LOCAL_MACHINE
|
||||
@ -172,7 +178,7 @@ Use 🤗 Accelerate for inferencing on consumer hardware with small resources.
|
||||
same_network: true
|
||||
use_cpu: false
|
||||
```
|
||||
b. run the below command to launch example script
|
||||
b. run the below command to launch the example script
|
||||
```
|
||||
accelerate launch --config_file ds_zero3_cpu.yaml examples/peft_lora_seq2seq_accelerate_ds_zero3_offload.py
|
||||
```
|
||||
@ -203,8 +209,7 @@ Use 🤗 Accelerate for inferencing on consumer hardware with small resources.
|
||||
```
|
||||
|
||||
### Example of PEFT model inference using 🤗 Accelerate's Big Model Inferencing capabilities
|
||||
|
||||
Example is provided in `~examples/causal_language_modeling/peft_lora_clm_accelerate_big_model_inference.ipynb`.
|
||||
An example is provided in `~examples/causal_language_modeling/peft_lora_clm_accelerate_big_model_inference.ipynb`.
|
||||
|
||||
|
||||
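That notebook boils down to dispatching the base model and the adapter across the available devices; a condensed sketch, mirroring a cell that appears further down in this compare view (the `max_memory` budget is illustrative):

```python
# Condensed from the big-model-inference notebook shown later in this compare view
from transformers import AutoModelForSeq2SeqLM
from peft import PeftConfig, PeftModel

peft_model_id = "smangrul/twitter_complaints_bigscience_T0_3B_LORA_SEQ_2_SEQ_LM"
max_memory = {0: "6GIB", "cpu": "30GB"}  # illustrative per-device memory budget
config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(
    config.base_model_name_or_path, device_map="auto", max_memory=max_memory
)
model = PeftModel.from_pretrained(model, peft_model_id, device_map="auto", max_memory=max_memory)
```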
## Models support matrix
|
||||
@ -250,7 +255,30 @@ Example is provided in `~examples/causal_language_modeling/peft_lora_clm_acceler
|
||||
| Deberta | ✅ | | | |
|
||||
| Deberta-v2 | ✅ | | | |
|
||||
|
||||
### Text-to-Image Generation
|
||||
|
||||
| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning |
|
||||
| --------- | ---- | ---- | ---- | ---- |
|
||||
| Stable Diffusion | ✅ | | | |
|
||||
|
||||
|
||||
### Image Classification
|
||||
|
||||
| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning |
|
||||
| --------- | ---- | ---- | ---- | ---- |
|
||||
| ViT | ✅ | | | |
|
||||
| Swin | ✅ | | | |
|
||||
|
||||
___Note that we have tested LoRA for https://huggingface.co/docs/transformers/model_doc/vit and https://huggingface.co/docs/transformers/model_doc/swin for fine-tuning on image classification. However, it should be possible to use LoRA for any compatible model [provided](https://huggingface.co/models?pipeline_tag=image-classification&sort=downloads&search=vit) by 🤗 Transformers. Check out the respective
|
||||
examples to learn more. If you run into problems, please open an issue.___
|
||||
|
||||
The same principle applies to our [segmentation models](https://huggingface.co/models?pipeline_tag=image-segmentation&sort=downloads) as well.
|
||||
|
||||
### Semantic Segmentation
|
||||
|
||||
| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning |
|
||||
| --------- | ---- | ---- | ---- | ---- |
|
||||
| SegFormer | ✅ | | | |
|
||||
## Caveats:
|
||||
|
||||
1. Below is an example of using PyTorch FSDP for training. However, it doesn't lead to
|
||||
@ -268,7 +296,7 @@ any GPU memory savings. Please refer issue [[FSDP] FSDP with CPU offload consume
|
||||
```
|
||||
|
||||
Example of parameter efficient tuning with `mt0-xxl` base model using 🤗 Accelerate is provided in `~examples/conditional_generation/peft_lora_seq2seq_accelerate_fsdp.py`.
|
||||
a. First run `accelerate config --config_file fsdp_config.yaml` and answer the questionaire.
|
||||
a. First, run `accelerate config --config_file fsdp_config.yaml` and answer the questionnaire.
|
||||
Below are the contents of the config file.
|
||||
```
|
||||
command_file: null
|
||||
@ -300,19 +328,19 @@ any GPU memory savings. Please refer issue [[FSDP] FSDP with CPU offload consume
|
||||
tpu_zone: null
|
||||
use_cpu: false
|
||||
```
|
||||
b. run the below command to launch example script
|
||||
b. run the below command to launch the example script
|
||||
```
|
||||
accelerate launch --config_file fsdp_config.yaml examples/peft_lora_seq2seq_accelerate_fsdp.py
|
||||
```
|
||||
|
||||
2. When using `P_TUNING` or `PROMPT_TUNING` with `SEQ_2_SEQ` task, remember to remove the `num_virtual_token` virtual prompt predictions from the left side of the model outputs during evaluations.
|
||||
|
||||
3. `P_TUNING` or `PROMPT_TUNING` doesn't support `generate` functionality of transformers bcause `generate` strictly requires `input_ids`/`decoder_input_ids` but
|
||||
3. For encoder-decoder models, `P_TUNING` or `PROMPT_TUNING` doesn't support `generate` functionality of transformers because `generate` strictly requires `decoder_input_ids` but
|
||||
`P_TUNING`/`PROMPT_TUNING` appends soft prompt embeddings to `input_embeds` to create
|
||||
new `input_embeds` to be given to the model. Therefore, `generate` doesn't support this yet.
|
||||
|
||||
## Backlog:
|
||||
1. Explore and possibly integrate `(IA)^3` and `UniPELT`
|
||||
1. Explore and possibly integrate `(IA)^3`
|
||||
2. Add tests
|
||||
3. Add more use cases and examples
|
||||
|
||||
@ -323,7 +351,7 @@ If you use 🤗 PEFT in your publication, please cite it by using the following
|
||||
```bibtex
|
||||
@Misc{peft,
|
||||
title = {PEFT: State-of-the-art Parameter-Efficient Fine-Tuning methods},
|
||||
author = {Sourab Mangrulkar, Sylvain Gugger},
|
||||
author = {Sourab Mangrulkar, Sylvain Gugger, Lysandre Debut, Younes Belkada, Sayak Paul},
|
||||
howpublished = {\url{https://github.com/huggingface/peft}},
|
||||
year = {2022}
|
||||
}
|
||||
|
@ -0,0 +1,22 @@
compute_environment: LOCAL_MACHINE
deepspeed_config:
  gradient_accumulation_steps: 1
  gradient_clipping: 1.0
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
machine_rank: 0
main_training_function: main
megatron_lm_config: {}
mixed_precision: 'no'
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
use_cpu: false
File diff suppressed because it is too large
@ -17,7 +17,7 @@ from transformers import (
|
||||
|
||||
import psutil
|
||||
from datasets import load_dataset
|
||||
from peft import LoraConfig, TaskType, get_peft_model, get_peft_model_state_dict
|
||||
from peft import LoraConfig, TaskType, get_peft_model
|
||||
from tqdm import tqdm
|
||||
|
||||
|
||||
@ -111,9 +111,6 @@ def main():
|
||||
model_name_or_path = "bigscience/bloomz-7b1"
|
||||
dataset_name = "twitter_complaints"
|
||||
peft_config = LoraConfig(task_type=TaskType.CAUSAL_LM, inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1)
|
||||
checkpoint_name = (
|
||||
f"{dataset_name}_{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}_v1.pt".replace("/", "_")
|
||||
)
|
||||
text_column = "Tweet text"
|
||||
label_column = "text_label"
|
||||
lr = 3e-3
|
||||
@ -121,6 +118,7 @@ def main():
|
||||
batch_size = 8
|
||||
seed = 42
|
||||
max_length = 64
|
||||
do_test = False
|
||||
set_seed(seed)
|
||||
|
||||
dataset = load_dataset("ought/raft", dataset_name)
|
||||
@ -315,35 +313,41 @@ def main():
|
||||
accelerator.print(f"{eval_preds[:10]=}")
|
||||
accelerator.print(f"{dataset['train'][label_column][:10]=}")
|
||||
|
||||
model.eval()
|
||||
test_preds = []
|
||||
for _, batch in enumerate(tqdm(test_dataloader)):
|
||||
batch = {k: v for k, v in batch.items() if k != "labels"}
|
||||
with torch.no_grad():
|
||||
outputs = accelerator.unwrap_model(model).generate(
|
||||
**batch, synced_gpus=is_ds_zero_3, max_new_tokens=10
|
||||
) # synced_gpus=True for DS-stage 3
|
||||
test_preds.extend(
|
||||
tokenizer.batch_decode(outputs[:, max_length:].detach().cpu().numpy(), skip_special_tokens=True)
|
||||
)
|
||||
if do_test:
|
||||
model.eval()
|
||||
test_preds = []
|
||||
for _, batch in enumerate(tqdm(test_dataloader)):
|
||||
batch = {k: v for k, v in batch.items() if k != "labels"}
|
||||
with torch.no_grad():
|
||||
outputs = accelerator.unwrap_model(model).generate(
|
||||
**batch, synced_gpus=is_ds_zero_3, max_new_tokens=10
|
||||
) # synced_gpus=True for DS-stage 3
|
||||
test_preds.extend(
|
||||
tokenizer.batch_decode(outputs[:, max_length:].detach().cpu().numpy(), skip_special_tokens=True)
|
||||
)
|
||||
|
||||
test_preds_cleaned = []
|
||||
for _, pred in enumerate(test_preds):
|
||||
test_preds_cleaned.append(get_closest_label(pred, classes))
|
||||
test_preds_cleaned = []
|
||||
for _, pred in enumerate(test_preds):
|
||||
test_preds_cleaned.append(get_closest_label(pred, classes))
|
||||
|
||||
test_df = dataset["test"].to_pandas()
|
||||
test_df[label_column] = test_preds_cleaned
|
||||
test_df["text_labels_orig"] = test_preds
|
||||
accelerator.print(test_df[[text_column, label_column]].sample(20))
|
||||
test_df = dataset["test"].to_pandas()
|
||||
test_df[label_column] = test_preds_cleaned
|
||||
test_df["text_labels_orig"] = test_preds
|
||||
accelerator.print(test_df[[text_column, label_column]].sample(20))
|
||||
|
||||
pred_df = test_df[["ID", label_column]]
|
||||
pred_df.columns = ["ID", "Label"]
|
||||
pred_df = test_df[["ID", label_column]]
|
||||
pred_df.columns = ["ID", "Label"]
|
||||
|
||||
os.makedirs(f"data/{dataset_name}", exist_ok=True)
|
||||
pred_df.to_csv(f"data/{dataset_name}/predictions.csv", index=False)
|
||||
os.makedirs(f"data/{dataset_name}", exist_ok=True)
|
||||
pred_df.to_csv(f"data/{dataset_name}/predictions.csv", index=False)
|
||||
|
||||
accelerator.wait_for_everyone()
|
||||
accelerator.save(get_peft_model_state_dict(model, state_dict=accelerator.get_state_dict(model)), checkpoint_name)
|
||||
model.push_to_hub(
|
||||
"smangrul/"
|
||||
+ f"{dataset_name}_{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}".replace("/", "_"),
|
||||
state_dict=accelerator.get_state_dict(model),
|
||||
use_auth_token=True,
|
||||
)
|
||||
accelerator.wait_for_everyone()
|
||||
|
||||
|
||||
|
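As a usage note (not part of the diff): an adapter pushed to the Hub this way can later be loaded back on top of the base model. A minimal sketch, where the repo id is an assumption mirroring the name constructed in the script above:

```python
# Sketch: load the pushed LoRA adapter back onto its base model
# (the repo id below is an assumption mirroring the name built in the script)
from transformers import AutoModelForCausalLM
from peft import PeftConfig, PeftModel

peft_model_id = "smangrul/twitter_complaints_bigscience_bloomz-7b1_LORA_CAUSAL_LM"
config = PeftConfig.from_pretrained(peft_model_id)
base_model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(base_model, peft_model_id)
model.eval()
```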
File diff suppressed because it is too large
examples/causal_language_modeling/peft_prompt_tuning_clm.ipynb (1190, Normal file)
File diff suppressed because it is too large
@ -0,0 +1,22 @@
compute_environment: LOCAL_MACHINE
deepspeed_config:
  gradient_accumulation_steps: 1
  gradient_clipping: 1.0
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
machine_rank: 0
main_training_function: main
megatron_lm_config: {}
mixed_precision: 'no'
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
use_cpu: false
@ -2,10 +2,26 @@
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"execution_count": 1,
|
||||
"id": "5f93b7d1",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"===================================BUG REPORT===================================\n",
|
||||
"Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues\n",
|
||||
"For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link\n",
|
||||
"================================================================================\n",
|
||||
"CUDA SETUP: CUDA runtime path found: /home/sourab/miniconda3/envs/ml/lib/libcudart.so\n",
|
||||
"CUDA SETUP: Highest compute capability among GPUs detected: 7.5\n",
|
||||
"CUDA SETUP: Detected CUDA version 117\n",
|
||||
"CUDA SETUP: Loading binary /home/sourab/miniconda3/envs/ml/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from transformers import AutoModelForSeq2SeqLM\n",
|
||||
"from peft import get_peft_config,get_peft_model, get_peft_model_state_dict, LoraConfig, TaskType\n",
|
||||
@ -60,15 +76,13 @@
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"/home/sourab/miniconda3/envs/ml/lib/python3.10/site-packages/huggingface_hub/utils/_deprecation.py:97: FutureWarning: Deprecated argument(s) used in 'dataset_info': token. Will not be supported from version '0.12'.\n",
|
||||
" warnings.warn(message, FutureWarning)\n",
|
||||
"Found cached dataset financial_phrasebank (/home/sourab/.cache/huggingface/datasets/financial_phrasebank/sentences_allagree/1.0.0/550bde12e6c30e2674da973a55f57edde5181d53f5a5a34c1531c53f93b7e141)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "6de075f8208349108291ac5ab7f5c980",
|
||||
"model_id": "3403bf3d718042018b0531848cc30209",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
@ -82,7 +96,7 @@
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "4b0e67b6d93f43e4b0f6a2f8978e4b0c",
|
||||
"model_id": "d3d5c45e3776469f9560b6eaa9346f8f",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
@ -96,7 +110,7 @@
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "a9551029c9884529bda7421a99170b51",
|
||||
"model_id": "e9736f26e9aa450b8d65f95c0b9c81cc",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
@ -110,7 +124,7 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'sentence': 'The order was valued at USD12 .2 m.',\n",
|
||||
"{'sentence': \"The 10,000-odd square metre plot that Stockmann has bought for the Nevsky Center shopping center is located on Nevsky Prospect , St Petersburg 's high street , next to the Vosstaniya Square underground station , in the immediate vicinity of Moscow Station .\",\n",
|
||||
" 'label': 1,\n",
|
||||
" 'text_label': 'neutral'}"
|
||||
]
|
||||
@ -147,7 +161,7 @@
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "4421971232434db1b6141e91fda2f6d7",
|
||||
"model_id": "c460989d4ab24e3f97d81ef040b1d1b4",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
@ -161,7 +175,7 @@
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "9b2ef793d93443949f4a5d5874d4bc05",
|
||||
"model_id": "1acc389b08b94f8a87900b9fbdbccce4",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
@ -234,45 +248,52 @@
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"100%|█████████████████████████████████████████████████████████████| 255/255 [00:53<00:00, 4.80it/s]\n",
|
||||
"100%|███████████████████████████████████████████████████████████████| 29/29 [00:02<00:00, 14.16it/s]\n"
|
||||
"100%|████████████████████████████████████████████████████████████████████████████████████████| 255/255 [02:21<00:00, 1.81it/s]\n",
|
||||
"100%|██████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:07<00:00, 4.13it/s]\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"epoch=0: train_ppl=tensor(13.6966, device='cuda:0') train_epoch_loss=tensor(2.6171, device='cuda:0') eval_ppl=tensor(1.0046, device='cuda:0') eval_epoch_loss=tensor(0.0046, device='cuda:0')\n"
|
||||
"epoch=0: train_ppl=tensor(14.6341, device='cuda:0') train_epoch_loss=tensor(2.6834, device='cuda:0') eval_ppl=tensor(1.0057, device='cuda:0') eval_epoch_loss=tensor(0.0057, device='cuda:0')\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"100%|█████████████████████████████████████████████████████████████| 255/255 [00:52<00:00, 4.88it/s]\n",
|
||||
"100%|███████████████████████████████████████████████████████████████| 29/29 [00:02<00:00, 14.20it/s]\n"
|
||||
"100%|████████████████████████████████████████████████████████████████████████████████████████| 255/255 [02:00<00:00, 2.11it/s]\n",
|
||||
"100%|██████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:05<00:00, 5.66it/s]\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"epoch=1: train_ppl=tensor(1.5893, device='cuda:0') train_epoch_loss=tensor(0.4633, device='cuda:0') eval_ppl=tensor(1.0020, device='cuda:0') eval_epoch_loss=tensor(0.0020, device='cuda:0')\n"
|
||||
"epoch=1: train_ppl=tensor(1.7576, device='cuda:0') train_epoch_loss=tensor(0.5640, device='cuda:0') eval_ppl=tensor(1.0052, device='cuda:0') eval_epoch_loss=tensor(0.0052, device='cuda:0')\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"100%|█████████████████████████████████████████████████████████████| 255/255 [00:52<00:00, 4.87it/s]\n",
|
||||
"100%|███████████████████████████████████████████████████████████████| 29/29 [00:02<00:00, 14.18it/s]\n"
|
||||
"100%|████████████████████████████████████████████████████████████████████████████████████████| 255/255 [01:33<00:00, 2.74it/s]\n",
|
||||
"100%|██████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:04<00:00, 6.23it/s]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"epoch=2: train_ppl=tensor(1.3210, device='cuda:0') train_epoch_loss=tensor(0.2784, device='cuda:0') eval_ppl=tensor(1.0026, device='cuda:0') eval_epoch_loss=tensor(0.0026, device='cuda:0')\n"
|
||||
"epoch=2: train_ppl=tensor(1.3830, device='cuda:0') train_epoch_loss=tensor(0.3243, device='cuda:0') eval_ppl=tensor(1.0035, device='cuda:0') eval_epoch_loss=tensor(0.0035, device='cuda:0')\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
@ -313,7 +334,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"execution_count": 7,
|
||||
"id": "6cafa67b",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@ -321,9 +342,9 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"accuracy=98.23788546255507 % on the evaluation dataset\n",
|
||||
"eval_preds[:10]=['neutral', 'neutral', 'positive', 'positive', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral']\n",
|
||||
"dataset['validation']['text_label'][:10]=['neutral', 'neutral', 'positive', 'positive', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral']\n"
|
||||
"accuracy=97.3568281938326 % on the evaluation dataset\n",
|
||||
"eval_preds[:10]=['neutral', 'neutral', 'neutral', 'positive', 'neutral', 'positive', 'positive', 'neutral', 'neutral', 'neutral']\n",
|
||||
"dataset['validation']['text_label'][:10]=['neutral', 'neutral', 'neutral', 'positive', 'neutral', 'positive', 'positive', 'neutral', 'neutral', 'neutral']\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
@ -343,20 +364,19 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 8,
|
||||
"id": "a8de6005",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# saving model\n",
|
||||
"state_dict = get_peft_model_state_dict(model)\n",
|
||||
"torch.save(state_dict, checkpoint_name)\n",
|
||||
"print(state_dict)"
|
||||
"peft_model_id = f\"{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}\"\n",
|
||||
"model.save_pretrained(peft_model_id)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"execution_count": 9,
|
||||
"id": "bd20cd4c",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@ -364,18 +384,74 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"19M\tfinancial_sentiment_analysis_lora_v1.pt\r\n"
|
||||
"9,2M\tbigscience/mt0-large_LORA_SEQ_2_SEQ_LM/adapter_model.bin\r\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"!du -h $checkpoint_name"
|
||||
"ckpt = f\"{peft_model_id}/adapter_model.bin\"\n",
|
||||
"!du -h $ckpt"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "76c2fc29",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from peft import PeftModel, PeftConfig\n",
|
||||
"peft_model_id = f\"{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}\"\n",
|
||||
"\n",
|
||||
"config = PeftConfig.from_pretrained(peft_model_id)\n",
|
||||
"model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path)\n",
|
||||
"model = PeftModel.from_pretrained(model, peft_model_id)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"id": "37d712ce",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"- Demand for fireplace products was lower than expected , especially in Germany .\n",
|
||||
"{'input_ids': tensor([[ 259, 264, 259, 82903, 332, 1090, 10040, 10371, 639, 259,\n",
|
||||
" 19540, 2421, 259, 25505, 259, 261, 259, 21230, 281, 17052,\n",
|
||||
" 259, 260, 1]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}\n",
|
||||
"tensor([[ 0, 259, 32588, 1]])\n",
|
||||
"['negative']\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"model.eval()\n",
|
||||
"i = 13\n",
|
||||
"inputs = tokenizer(dataset[\"validation\"][text_column][i], return_tensors=\"pt\")\n",
|
||||
"print(dataset[\"validation\"][text_column][i])\n",
|
||||
"print(inputs)\n",
|
||||
"\n",
|
||||
"with torch.no_grad():\n",
|
||||
" outputs = model.generate(input_ids=inputs[\"input_ids\"], max_new_tokens=10)\n",
|
||||
" print(outputs)\n",
|
||||
" print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "76c2fc29",
|
||||
"id": "66c65ea4",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "65e71f78",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
@ -383,7 +459,7 @@
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.10.5 64-bit",
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
@ -397,7 +473,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.5 (v3.10.5:f377153967, Jun 6 2022, 12:36:10) [Clang 13.0.0 (clang-1300.0.29.30)]"
|
||||
"version": "3.10.4"
|
||||
},
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
|
@ -0,0 +1,255 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "71fbfca2",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from transformers import AutoModelForSeq2SeqLM\n",
|
||||
"from peft import PeftModel, PeftConfig\n",
|
||||
"import torch\n",
|
||||
"from datasets import load_dataset\n",
|
||||
"import os\n",
|
||||
"from transformers import AutoTokenizer\n",
|
||||
"from torch.utils.data import DataLoader\n",
|
||||
"from transformers import default_data_collator,get_linear_schedule_with_warmup\n",
|
||||
"from tqdm import tqdm\n",
|
||||
"from datasets import load_dataset\n",
|
||||
"\n",
|
||||
"dataset_name = \"twitter_complaints\"\n",
|
||||
"text_column = \"Tweet text\"\n",
|
||||
"label_column = \"text_label\"\n",
|
||||
"batch_size=8\n",
|
||||
"\n",
|
||||
"peft_model_id = \"smangrul/twitter_complaints_bigscience_T0_3B_LORA_SEQ_2_SEQ_LM\"\n",
|
||||
"config = PeftConfig.from_pretrained(peft_model_id)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "cc55820a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"peft_model_id = \"smangrul/twitter_complaints_bigscience_T0_3B_LORA_SEQ_2_SEQ_LM\"\n",
|
||||
"max_memory={0: \"6GIB\", 1: \"0GIB\", 2: \"0GIB\", 3: \"0GIB\", 4: \"0GIB\", \"cpu\":\"30GB\"}\n",
|
||||
"config = PeftConfig.from_pretrained(peft_model_id)\n",
|
||||
"model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path, device_map=\"auto\", max_memory=max_memory)\n",
|
||||
"model = PeftModel.from_pretrained(model, peft_model_id, device_map=\"auto\", max_memory=max_memory)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "e1a3648b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from datasets import load_dataset\n",
|
||||
"\n",
|
||||
"dataset = load_dataset(\"ought/raft\", dataset_name)\n",
|
||||
"\n",
|
||||
"classes = [k.replace(\"_\", \" \") for k in dataset[\"train\"].features[\"Label\"].names]\n",
|
||||
"print(classes)\n",
|
||||
"dataset = dataset.map(\n",
|
||||
" lambda x: {\"text_label\": [classes[label] for label in x[\"Label\"]]},\n",
|
||||
" batched=True,\n",
|
||||
" num_proc=1,\n",
|
||||
" \n",
|
||||
")\n",
|
||||
"print(dataset)\n",
|
||||
"dataset[\"train\"][0]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "fe12d4d3",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)\n",
|
||||
"target_max_length = max([len(tokenizer(class_label)[\"input_ids\"]) for class_label in classes])\n",
|
||||
"def preprocess_function(examples):\n",
|
||||
" inputs = examples[text_column]\n",
|
||||
" targets = examples[label_column]\n",
|
||||
" model_inputs = tokenizer(inputs, truncation=True)\n",
|
||||
" labels = tokenizer(\n",
|
||||
" targets, max_length=target_max_length, padding=\"max_length\", truncation=True, return_tensors=\"pt\"\n",
|
||||
" )\n",
|
||||
" labels = labels[\"input_ids\"]\n",
|
||||
" labels[labels == tokenizer.pad_token_id] = -100\n",
|
||||
" model_inputs[\"labels\"] = labels\n",
|
||||
" return model_inputs\n",
|
||||
"\n",
|
||||
"processed_datasets = dataset.map(\n",
|
||||
" preprocess_function,\n",
|
||||
" batched=True,\n",
|
||||
" num_proc=1,\n",
|
||||
" remove_columns=dataset[\"train\"].column_names,\n",
|
||||
" load_from_cache_file=True,\n",
|
||||
" desc=\"Running tokenizer on dataset\",\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"train_dataset = processed_datasets[\"train\"]\n",
|
||||
"eval_dataset = processed_datasets[\"train\"]\n",
|
||||
"test_dataset = processed_datasets[\"test\"]\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def collate_fn(examples):\n",
|
||||
" return tokenizer.pad(examples, padding=\"longest\", return_tensors=\"pt\")\n",
|
||||
"\n",
|
||||
"train_dataloader = DataLoader(\n",
|
||||
" train_dataset, shuffle=True, collate_fn=collate_fn, batch_size=batch_size, pin_memory=True\n",
|
||||
")\n",
|
||||
"eval_dataloader = DataLoader(eval_dataset, collate_fn=collate_fn, batch_size=batch_size, pin_memory=True)\n",
|
||||
"test_dataloader = DataLoader(test_dataset, collate_fn=collate_fn, batch_size=batch_size, pin_memory=True)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
" "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "b33be5e6",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"@NYTsupport i have complained a dozen times & yet my papers are still thrown FAR from my door. Why is this so hard to resolve?\n",
|
||||
"{'input_ids': tensor([[25335, 1499, 3, 10, 3320, 12056, 382, 20390, 3, 23,\n",
|
||||
" 43, 25932, 3, 9, 9611, 648, 3, 184, 4624, 117,\n",
|
||||
" 780, 82, 5778, 33, 341, 3, 12618, 377, 4280, 45,\n",
|
||||
" 82, 1365, 5, 1615, 19, 48, 78, 614, 12, 7785,\n",
|
||||
" 58, 16229, 3, 10, 3, 1]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n",
|
||||
" 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}\n",
|
||||
"tensor([[ 0, 10394, 1]], device='cuda:0')\n",
|
||||
"['complaint']\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"model.eval()\n",
|
||||
"i = 15\n",
|
||||
"inputs = tokenizer(f'{text_column} : {dataset[\"test\"][i][\"Tweet text\"]} Label : ', return_tensors=\"pt\")\n",
|
||||
"print(dataset[\"test\"][i][\"Tweet text\"])\n",
|
||||
"print(inputs)\n",
|
||||
"\n",
|
||||
"with torch.no_grad():\n",
|
||||
" outputs = model.generate(input_ids=inputs[\"input_ids\"].to(\"cuda\"), max_new_tokens=10)\n",
|
||||
" print(outputs)\n",
|
||||
" print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "b6d6cd5b",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
" 0%| | 0/7 [00:00<?, ?it/s]You're using a T5TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.\n",
|
||||
"100%|████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:10<00:00, 1.48s/it]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"model.eval()\n",
|
||||
"eval_preds = []\n",
|
||||
"for _, batch in enumerate(tqdm(eval_dataloader)):\n",
|
||||
" batch = {k: v.to(\"cuda\") for k, v in batch.items() if k != \"labels\"}\n",
|
||||
" with torch.no_grad():\n",
|
||||
" outputs = model.generate(**batch, max_new_tokens=10)\n",
|
||||
" preds = outputs.detach().cpu().numpy()\n",
|
||||
" eval_preds.extend(tokenizer.batch_decode(preds, skip_special_tokens=True))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "61264abe",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"accuracy=100.0\n",
|
||||
"eval_preds[:10]=['no complaint', 'no complaint', 'complaint', 'complaint', 'no complaint', 'no complaint', 'no complaint', 'complaint', 'complaint', 'no complaint']\n",
|
||||
"dataset['train'][label_column][:10]=['no complaint', 'no complaint', 'complaint', 'complaint', 'no complaint', 'no complaint', 'no complaint', 'complaint', 'complaint', 'no complaint']\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"correct = 0\n",
|
||||
"total = 0\n",
|
||||
"for pred, true in zip(eval_preds, dataset[\"train\"][label_column]):\n",
|
||||
" if pred.strip() == true.strip():\n",
|
||||
" correct += 1\n",
|
||||
" total += 1\n",
|
||||
"accuracy = correct / total * 100\n",
|
||||
"print(f\"{accuracy=}\")\n",
|
||||
"print(f\"{eval_preds[:10]=}\")\n",
|
||||
"print(f\"{dataset['train'][label_column][:10]=}\")\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "a70802a3",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"model.eval()\n",
|
||||
"test_preds = []\n",
|
||||
"\n",
|
||||
"for _, batch in enumerate(tqdm(test_dataloader)):\n",
|
||||
" batch = {k: v for k, v in batch.items() if k != \"labels\"}\n",
|
||||
" with torch.no_grad():\n",
|
||||
" outputs = model.generate(**batch, max_new_tokens=10)\n",
|
||||
" preds = outputs.detach().cpu().numpy()\n",
|
||||
" test_preds.extend(tokenizer.batch_decode(preds, skip_special_tokens=True))\n",
|
||||
" if len(test_preds)>100:\n",
|
||||
" break\n",
|
||||
"test_preds"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.5 (v3.10.5:f377153967, Jun 6 2022, 12:36:10) [Clang 13.0.0 (clang-1300.0.29.30)]"
|
||||
},
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
"hash": "aee8b7b246df8f9039afb4144a1f6fd8d2ca17a180786b69acc140d282b71a49"
|
||||
}
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
@ -11,7 +11,7 @@ from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, get_linear_schedu
|
||||
|
||||
import psutil
|
||||
from datasets import load_dataset
|
||||
from peft import LoraConfig, TaskType, get_peft_model, get_peft_model_state_dict
|
||||
from peft import LoraConfig, TaskType, get_peft_model
|
||||
from tqdm import tqdm
|
||||
|
||||
|
||||
@ -107,15 +107,13 @@ def main():
|
||||
peft_config = LoraConfig(
|
||||
task_type=TaskType.SEQ_2_SEQ_LM, inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1
|
||||
)
|
||||
checkpoint_name = (
|
||||
f"{dataset_name}_{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}_v1.pt".replace("/", "_")
|
||||
)
|
||||
text_column = "Tweet text"
|
||||
label_column = "text_label"
|
||||
lr = 3e-3
|
||||
num_epochs = 5
|
||||
batch_size = 8
|
||||
seed = 42
|
||||
do_test = False
|
||||
set_seed(seed)
|
||||
|
||||
dataset = load_dataset("ought/raft", dataset_name)
|
||||
@ -265,33 +263,39 @@ def main():
|
||||
accelerator.print(f"{eval_preds[:10]=}")
|
||||
accelerator.print(f"{dataset['train'][label_column][:10]=}")
|
||||
|
||||
model.eval()
|
||||
test_preds = []
|
||||
for _, batch in enumerate(tqdm(test_dataloader)):
|
||||
batch = {k: v for k, v in batch.items() if k != "labels"}
|
||||
with torch.no_grad():
|
||||
outputs = accelerator.unwrap_model(model).generate(
|
||||
**batch, synced_gpus=is_ds_zero_3
|
||||
) # synced_gpus=True for DS-stage 3
|
||||
test_preds.extend(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))
|
||||
if do_test:
|
||||
model.eval()
|
||||
test_preds = []
|
||||
for _, batch in enumerate(tqdm(test_dataloader)):
|
||||
batch = {k: v for k, v in batch.items() if k != "labels"}
|
||||
with torch.no_grad():
|
||||
outputs = accelerator.unwrap_model(model).generate(
|
||||
**batch, synced_gpus=is_ds_zero_3
|
||||
) # synced_gpus=True for DS-stage 3
|
||||
test_preds.extend(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))
|
||||
|
||||
test_preds_cleaned = []
|
||||
for _, pred in enumerate(test_preds):
|
||||
test_preds_cleaned.append(get_closest_label(pred, classes))
|
||||
test_preds_cleaned = []
|
||||
for _, pred in enumerate(test_preds):
|
||||
test_preds_cleaned.append(get_closest_label(pred, classes))
|
||||
|
||||
test_df = dataset["test"].to_pandas()
|
||||
test_df[label_column] = test_preds_cleaned
|
||||
test_df["text_labels_orig"] = test_preds
|
||||
accelerator.print(test_df[[text_column, label_column]].sample(20))
|
||||
test_df = dataset["test"].to_pandas()
|
||||
test_df[label_column] = test_preds_cleaned
|
||||
test_df["text_labels_orig"] = test_preds
|
||||
accelerator.print(test_df[[text_column, label_column]].sample(20))
|
||||
|
||||
pred_df = test_df[["ID", label_column]]
|
||||
pred_df.columns = ["ID", "Label"]
|
||||
pred_df = test_df[["ID", label_column]]
|
||||
pred_df.columns = ["ID", "Label"]
|
||||
|
||||
os.makedirs(f"data/{dataset_name}", exist_ok=True)
|
||||
pred_df.to_csv(f"data/{dataset_name}/predictions.csv", index=False)
|
||||
os.makedirs(f"data/{dataset_name}", exist_ok=True)
|
||||
pred_df.to_csv(f"data/{dataset_name}/predictions.csv", index=False)
|
||||
|
||||
accelerator.wait_for_everyone()
|
||||
accelerator.save(get_peft_model_state_dict(model, state_dict=accelerator.get_state_dict(model)), checkpoint_name)
|
||||
model.push_to_hub(
|
||||
"smangrul/"
|
||||
+ f"{dataset_name}_{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}".replace("/", "_"),
|
||||
state_dict=accelerator.get_state_dict(model),
|
||||
use_auth_token=True,
|
||||
)
|
||||
accelerator.wait_for_everyone()
|
||||
|
||||
|
||||
|
@ -6,7 +6,7 @@ from torch.utils.data import DataLoader
|
||||
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, default_data_collator, get_linear_schedule_with_warmup
|
||||
|
||||
from datasets import load_dataset
|
||||
from peft import LoraConfig, TaskType, get_peft_model, get_peft_model_state_dict
|
||||
from peft import LoraConfig, TaskType, get_peft_model
|
||||
from peft.utils.other import fsdp_auto_wrap_policy
|
||||
from tqdm import tqdm
|
||||
|
||||
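For context on the newly imported `fsdp_auto_wrap_policy`: it tells FSDP how to wrap the mix of frozen base weights and trainable LoRA modules. Below is a minimal sketch of how it is typically attached to 🤗 Accelerate's FSDP plugin before `prepare`; the base model name and the exact attribute path are assumptions based on the PEFT FSDP example, not code shown in this hunk:

```python
# Sketch (assumptions: an FSDP-enabled Accelerator; "t5-base" is a placeholder base model)
from accelerate import Accelerator
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, TaskType, get_peft_model
from peft.utils.other import fsdp_auto_wrap_policy

accelerator = Accelerator()
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
model = get_peft_model(
    model, LoraConfig(task_type=TaskType.SEQ_2_SEQ_LM, r=8, lora_alpha=32, lora_dropout=0.1)
)
if getattr(accelerator.state, "fsdp_plugin", None) is not None:
    # let FSDP wrap modules according to the PEFT-aware policy
    accelerator.state.fsdp_plugin.auto_wrap_policy = fsdp_auto_wrap_policy(model)
model = accelerator.prepare(model)
```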
@ -25,7 +25,6 @@ def main():
|
||||
peft_config = LoraConfig(
|
||||
task_type=TaskType.SEQ_2_SEQ_LM, inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1
|
||||
)
|
||||
checkpoint_name = "financial_sentiment_analysis_lora_fsdp_v1.pt"
|
||||
model = AutoModelForSeq2SeqLM.from_pretrained(model_name_or_path)
|
||||
model = get_peft_model(model, peft_config)
|
||||
accelerator.print(model.print_trainable_parameters())
|
||||
@ -126,8 +125,10 @@ def main():
|
||||
accelerator.print(f"{eval_preds[:10]=}")
|
||||
accelerator.print(f"{dataset['validation'][label_column][:10]=}")
|
||||
accelerator.wait_for_everyone()
|
||||
accelerator.save(
|
||||
get_peft_model_state_dict(model, state_dict=accelerator.get_state_dict(model)), checkpoint_name
|
||||
model.push_to_hub(
|
||||
"smangrul/" + f"{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}".replace("/", "_"),
|
||||
state_dict=accelerator.get_state_dict(model),
|
||||
use_auth_token=True,
|
||||
)
|
||||
accelerator.wait_for_everyone()
|
||||
|
||||
|
@ -5,7 +5,23 @@
|
||||
"execution_count": 1,
|
||||
"id": "5f93b7d1",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"===================================BUG REPORT===================================\n",
|
||||
"Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues\n",
|
||||
"For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link\n",
|
||||
"================================================================================\n",
|
||||
"CUDA SETUP: CUDA runtime path found: /home/sourab/miniconda3/envs/ml/lib/libcudart.so\n",
|
||||
"CUDA SETUP: Highest compute capability among GPUs detected: 7.5\n",
|
||||
"CUDA SETUP: Detected CUDA version 117\n",
|
||||
"CUDA SETUP: Loading binary /home/sourab/miniconda3/envs/ml/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from transformers import AutoModelForSeq2SeqLM\n",
|
||||
"from peft import get_peft_config,get_peft_model, get_peft_model_state_dict, PrefixTuningConfig, TaskType\n",
|
||||
@ -61,15 +77,13 @@
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"/home/sourab/miniconda3/envs/ml/lib/python3.10/site-packages/huggingface_hub/utils/_deprecation.py:97: FutureWarning: Deprecated argument(s) used in 'dataset_info': token. Will not be supported from version '0.12'.\n",
|
||||
" warnings.warn(message, FutureWarning)\n",
|
||||
"Found cached dataset financial_phrasebank (/home/sourab/.cache/huggingface/datasets/financial_phrasebank/sentences_allagree/1.0.0/550bde12e6c30e2674da973a55f57edde5181d53f5a5a34c1531c53f93b7e141)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "e3f8b8faca0a4112b2c3499faee9544b",
|
||||
"model_id": "ec4be98991b84181bfa75f8846422b8b",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
@ -83,7 +97,7 @@
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "935c8aebde284a5784348588e0bb013a",
|
||||
"model_id": "82a6bd694c4f4751a23c370ab51f01a4",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
@ -97,7 +111,7 @@
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "e3487cd55f6847588492bf7fa51348ca",
|
||||
"model_id": "3844878631534468a1495e435563e4b0",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
@ -111,9 +125,9 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'sentence': 'ADPnews - Feb 5 , 2010 - Finnish real estate investor Sponda Oyj HEL : SDA1V said today that it slipped to a net loss of EUR 81.5 million USD 11.8 m in 2009 from a profit of EUR 29.3 million in 2008 .',\n",
|
||||
" 'label': 0,\n",
|
||||
" 'text_label': 'negative'}"
|
||||
"{'sentence': 'Finnish elevators and escalators maker KONE Corporation said on Tuesday ( 18 March ) that it has received a major order from Sir Robert McAlpine to supply all elevators and escalators for the Watermark Place project in the City of London .',\n",
|
||||
" 'label': 2,\n",
|
||||
" 'text_label': 'positive'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
@ -145,39 +159,11 @@
|
||||
"id": "adf9608c",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "2ce088f4437d4e2c80c267332a5b84e5",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
"Downloading: 0%| | 0.00/792k [00:00<?, ?B/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "4e5f69b61f194220b39336e48edd2f9e",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
"Downloading: 0%| | 0.00/1.39M [00:00<?, ?B/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"/home/sourab/transformers/src/transformers/models/t5/tokenization_t5_fast.py:156: FutureWarning: This tokenizer was incorrectly instantiated with a model max length of 512 which will be corrected in Transformers v5.\n",
|
||||
"/home/sourab/transformers/src/transformers/models/t5/tokenization_t5_fast.py:155: FutureWarning: This tokenizer was incorrectly instantiated with a model max length of 512 which will be corrected in Transformers v5.\n",
|
||||
"For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.\n",
|
||||
"- Be aware that you SHOULD NOT rely on t5-large automatically truncating your input to 512 when padding/encoding.\n",
|
||||
"- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.\n",
|
||||
@ -188,7 +174,7 @@
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "230c5631891e4ea8ac7a1b39f315a4f0",
|
||||
"model_id": "4af8c12efb5643659573347509079f3a",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
@ -202,7 +188,7 @@
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "b581e5677d2a45459ceb725534ed0891",
|
||||
"model_id": "86033b6257384584afd034075af808cb",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
@ -275,82 +261,75 @@
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"100%|█████████████████████████████████████████████████████████████| 255/255 [00:20<00:00, 12.27it/s]\n",
|
||||
"100%|███████████████████████████████████████████████████████████████| 29/29 [00:01<00:00, 17.32it/s]\n"
|
||||
"100%|████████████████████████████████████████████████████████████████████████████████████████| 255/255 [00:49<00:00, 5.15it/s]\n",
|
||||
"100%|██████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:03<00:00, 7.56it/s]\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"epoch=0: train_ppl=tensor(2697769., device='cuda:0') train_epoch_loss=tensor(14.8079, device='cuda:0') eval_ppl=tensor(1.0089, device='cuda:0') eval_epoch_loss=tensor(0.0089, device='cuda:0')\n"
|
||||
"epoch=0: train_ppl=tensor(2760654.5000, device='cuda:0') train_epoch_loss=tensor(14.8310, device='cuda:0') eval_ppl=tensor(1.0124, device='cuda:0') eval_epoch_loss=tensor(0.0124, device='cuda:0')\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"100%|█████████████████████████████████████████████████████████████| 255/255 [00:19<00:00, 12.75it/s]\n",
|
||||
"100%|███████████████████████████████████████████████████████████████| 29/29 [00:01<00:00, 17.33it/s]\n"
|
||||
"100%|████████████████████████████████████████████████████████████████████████████████████████| 255/255 [00:40<00:00, 6.22it/s]\n",
|
||||
"100%|██████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:05<00:00, 5.05it/s]\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"epoch=1: train_ppl=tensor(2.9475, device='cuda:0') train_epoch_loss=tensor(1.0809, device='cuda:0') eval_ppl=tensor(1.0072, device='cuda:0') eval_epoch_loss=tensor(0.0072, device='cuda:0')\n"
|
||||
"epoch=1: train_ppl=tensor(2.7329, device='cuda:0') train_epoch_loss=tensor(1.0054, device='cuda:0') eval_ppl=tensor(1.0081, device='cuda:0') eval_epoch_loss=tensor(0.0080, device='cuda:0')\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"100%|█████████████████████████████████████████████████████████████| 255/255 [00:20<00:00, 12.71it/s]\n",
|
||||
"100%|███████████████████████████████████████████████████████████████| 29/29 [00:01<00:00, 17.31it/s]\n"
|
||||
"100%|████████████████████████████████████████████████████████████████████████████████████████| 255/255 [00:58<00:00, 4.36it/s]\n",
|
||||
"100%|██████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:05<00:00, 5.05it/s]\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"epoch=2: train_ppl=tensor(2.0588, device='cuda:0') train_epoch_loss=tensor(0.7221, device='cuda:0') eval_ppl=tensor(1.0055, device='cuda:0') eval_epoch_loss=tensor(0.0054, device='cuda:0')\n"
|
||||
"epoch=2: train_ppl=tensor(2.1698, device='cuda:0') train_epoch_loss=tensor(0.7747, device='cuda:0') eval_ppl=tensor(1.0057, device='cuda:0') eval_epoch_loss=tensor(0.0057, device='cuda:0')\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"100%|█████████████████████████████████████████████████████████████| 255/255 [00:20<00:00, 12.70it/s]\n",
|
||||
"100%|███████████████████████████████████████████████████████████████| 29/29 [00:01<00:00, 17.32it/s]\n"
|
||||
"100%|████████████████████████████████████████████████████████████████████████████████████████| 255/255 [00:58<00:00, 4.35it/s]\n",
|
||||
"100%|██████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:05<00:00, 5.06it/s]\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"epoch=3: train_ppl=tensor(1.7939, device='cuda:0') train_epoch_loss=tensor(0.5844, device='cuda:0') eval_ppl=tensor(1.0063, device='cuda:0') eval_epoch_loss=tensor(0.0063, device='cuda:0')\n"
|
||||
"epoch=3: train_ppl=tensor(2.0724, device='cuda:0') train_epoch_loss=tensor(0.7287, device='cuda:0') eval_ppl=tensor(1.0051, device='cuda:0') eval_epoch_loss=tensor(0.0051, device='cuda:0')\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"100%|█████████████████████████████████████████████████████████████| 255/255 [00:19<00:00, 13.01it/s]\n",
|
||||
"100%|███████████████████████████████████████████████████████████████| 29/29 [00:01<00:00, 17.33it/s]"
|
||||
"100%|████████████████████████████████████████████████████████████████████████████████████████| 255/255 [01:02<00:00, 4.10it/s]\n",
|
||||
"100%|██████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:06<00:00, 4.74it/s]\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"epoch=4: train_ppl=tensor(1.7740, device='cuda:0') train_epoch_loss=tensor(0.5732, device='cuda:0') eval_ppl=tensor(1.0062, device='cuda:0') eval_epoch_loss=tensor(0.0061, device='cuda:0')\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n"
|
||||
"epoch=4: train_ppl=tensor(1.7598, device='cuda:0') train_epoch_loss=tensor(0.5652, device='cuda:0') eval_ppl=tensor(1.0047, device='cuda:0') eval_epoch_loss=tensor(0.0047, device='cuda:0')\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
@ -399,9 +378,9 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"accuracy=96.47577092511013 % on the evaluation dataset\n",
|
||||
"eval_preds[:10]=['neutral', 'neutral', 'neutral', 'negative', 'neutral', 'neutral', 'neutral', 'neutral', 'positive', 'positive']\n",
|
||||
"dataset['validation']['text_label'][:10]=['neutral', 'neutral', 'neutral', 'negative', 'neutral', 'neutral', 'neutral', 'neutral', 'positive', 'positive']\n"
|
||||
"accuracy=96.91629955947137 % on the evaluation dataset\n",
|
||||
"eval_preds[:10]=['negative', 'positive', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral']\n",
|
||||
"dataset['validation']['text_label'][:10]=['negative', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral']\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
@ -424,26 +403,11 @@
|
||||
"execution_count": 8,
|
||||
"id": "a8de6005",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"{'prompt_embeddings': tensor([[-0.3165, -0.8389, 0.3262, ..., -1.5049, -1.6963, 0.3444],\n",
|
||||
" [-1.8359, 1.1936, 1.0483, ..., 0.6197, -0.4452, 0.5844],\n",
|
||||
" [-0.6027, 0.3246, -1.5601, ..., -0.3645, 0.2329, 0.3402],\n",
|
||||
" ...,\n",
|
||||
" [-1.9525, -0.5035, 0.8474, ..., 0.4793, -0.0789, -0.9305],\n",
|
||||
" [-1.9741, 0.5242, -2.0594, ..., -0.7970, -0.4889, 2.7323],\n",
|
||||
" [ 0.9355, -0.2714, 0.4610, ..., 0.2692, -1.5801, -1.6405]])}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# saving model\n",
|
||||
"state_dict = get_peft_model_state_dict(model)\n",
|
||||
"torch.save(state_dict, checkpoint_name)\n",
|
||||
"print(state_dict)"
|
||||
"peft_model_id = f\"{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}\"\n",
|
||||
"model.save_pretrained(peft_model_id)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -456,18 +420,68 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"3,8M\tfinancial_sentiment_analysis_prefix_tuning_v1.pt\r\n"
|
||||
"3,8M\tt5-large_PREFIX_TUNING_SEQ_2_SEQ_LM/adapter_model.bin\r\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"!du -h $checkpoint_name"
|
||||
"ckpt = f\"{peft_model_id}/adapter_model.bin\"\n",
|
||||
"!du -h $ckpt"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "76c2fc29",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from peft import PeftModel, PeftConfig\n",
|
||||
"peft_model_id = f\"{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}\"\n",
|
||||
"\n",
|
||||
"config = PeftConfig.from_pretrained(peft_model_id)\n",
|
||||
"model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path)\n",
|
||||
"model = PeftModel.from_pretrained(model, peft_model_id)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 27,
|
||||
"id": "d997f1cc",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Acando AB ( ACANB SS ) fell 8.9 percent to 13.35 kronor , the lowest close since Dec. 11 .\n",
|
||||
"{'input_ids': tensor([[ 4292, 232, 32, 3, 5359, 41, 3, 22029, 14972, 3,\n",
|
||||
" 4256, 3, 61, 4728, 4848, 1298, 1093, 12, 8808, 2469,\n",
|
||||
" 3, 22318, 29, 127, 3, 6, 8, 7402, 885, 437,\n",
|
||||
" 4451, 5, 850, 3, 5, 1]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n",
|
||||
" 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}\n",
|
||||
"tensor([[ 0, 2841, 1]])\n",
|
||||
"['negative']\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"model.eval()\n",
|
||||
"i = 107\n",
|
||||
"inputs = tokenizer(dataset[\"validation\"][text_column][i], return_tensors=\"pt\")\n",
|
||||
"print(dataset[\"validation\"][text_column][i])\n",
|
||||
"print(inputs)\n",
|
||||
"\n",
|
||||
"with torch.no_grad():\n",
|
||||
" outputs = model.generate(input_ids=inputs[\"input_ids\"], max_new_tokens=10)\n",
|
||||
" print(outputs)\n",
|
||||
" print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "76c2fc29",
|
||||
"id": "fb746c1e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
@ -475,7 +489,7 @@
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.10.5 64-bit",
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
|
7
examples/image_classification/README.md
Normal file
@ -0,0 +1,7 @@
|
||||
# Fine-tuning for image classification using LoRA and 🤗 PEFT
|
||||
|
||||
[](https://colab.research.google.com/github/huggingface/peft/blob/main/examples/image_classification/image_classification_peft_lora.ipynb)
|
||||
|
||||
We provide a notebook (`image_classification_peft_lora.ipynb`) where we learn how to use [LoRA](https://arxiv.org/abs/2106.09685) from 🤗 PEFT to fine-tune an image classification model by ONLY using **0.7%** of the original trainable parameters of the model.
|
||||
|
||||
LoRA adds low-rank "update matrices" to certain blocks in the underlying model (in this case the attention blocks) and ONLY trains those matrices during fine-tuning. During inference, these update matrices are _merged_ with the original model parameters. For more details, check out the [original LoRA paper](https://arxiv.org/abs/2106.09685).
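A minimal sketch of the recipe, assuming a ViT checkpoint whose attention projections are named `query`/`value` (the notebook has the exact checkpoint and hyperparameters):

```python
from transformers import AutoModelForImageClassification
from peft import LoraConfig, get_peft_model

# assumption: a ViT backbone; its attention projections are named "query" and "value"
model = AutoModelForImageClassification.from_pretrained("google/vit-base-patch16-224-in21k")

config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["query", "value"],  # inject LoRA only into the attention blocks
    lora_dropout=0.1,
    bias="none",
    modules_to_save=["classifier"],  # the freshly initialized head must also be trained and saved
)
lora_model = get_peft_model(model, config)
lora_model.print_trainable_parameters()  # reports the small trainable fraction (~0.7% in the notebook)
```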
14955
examples/image_classification/image_classification_peft_lora.ipynb
Normal file
File diff suppressed because one or more lines are too long
9440
examples/int8_training/Finetune_opt_bnb_peft.ipynb
Normal file
File diff suppressed because it is too large
54
examples/lora_dreambooth/colab_notebook.ipynb
Normal file
@ -0,0 +1,54 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "kdOhtpergLCQ"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!git clone https://huggingface.co/spaces/smangrul/peft-lora-sd-dreambooth"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "_LuGk9mihPx7"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%cd \"peft-lora-sd-dreambooth\"\n",
|
||||
"!pip install -r requirements.txt"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "BYKO8e5ElJOX"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!python colab.py"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"accelerator": "GPU",
|
||||
"colab": {
|
||||
"provenance": []
|
||||
},
|
||||
"gpuClass": "premium",
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"name": "python"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0
|
||||
}
|
File diff suppressed because one or more lines are too long
@ -2,7 +2,6 @@ transformers
|
||||
accelerate
|
||||
loralib
|
||||
evaluate
|
||||
deepspeed
|
||||
tqdm
|
||||
datasets
|
||||
diffusers
|
||||
|
@ -11,6 +11,7 @@ import warnings
|
||||
from pathlib import Path
|
||||
from typing import Optional
|
||||
|
||||
import numpy as np
|
||||
import torch
|
||||
import torch.nn.functional as F
|
||||
import torch.utils.checkpoint
|
||||
@ -24,7 +25,13 @@ from transformers import AutoTokenizer, PretrainedConfig
|
||||
import datasets
|
||||
import diffusers
|
||||
import psutil
|
||||
from diffusers import AutoencoderKL, DDPMScheduler, DiffusionPipeline, UNet2DConditionModel
|
||||
from diffusers import (
|
||||
AutoencoderKL,
|
||||
DDPMScheduler,
|
||||
DiffusionPipeline,
|
||||
DPMSolverMultistepScheduler,
|
||||
UNet2DConditionModel,
|
||||
)
|
||||
from diffusers.optimization import get_scheduler
|
||||
from diffusers.utils import check_min_version
|
||||
from diffusers.utils.import_utils import is_xformers_available
|
||||
@ -129,6 +136,27 @@ def parse_args(input_args=None):
|
||||
" class_data_dir, additional images will be sampled with class_prompt."
|
||||
),
|
||||
)
|
||||
parser.add_argument(
|
||||
"--validation_prompt",
|
||||
type=str,
|
||||
default=None,
|
||||
help="A prompt that is used during validation to verify that the model is learning.",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--num_validation_images",
|
||||
type=int,
|
||||
default=4,
|
||||
help="Number of images that should be generated during validation with `validation_prompt`.",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--validation_steps",
|
||||
type=int,
|
||||
default=100,
|
||||
help=(
|
||||
"Run dreambooth validation every X steps. Dreambooth validation consists of running the prompt"
|
||||
" `args.validation_prompt` multiple times: `args.num_validation_images`."
|
||||
),
|
||||
)
|
||||
parser.add_argument(
|
||||
"--output_dir",
|
||||
type=str,
|
||||
@ -948,6 +976,54 @@ def main(args):
|
||||
progress_bar.set_postfix(**logs)
|
||||
accelerator.log(logs, step=global_step)
|
||||
|
||||
if (
|
||||
args.validation_prompt is not None
|
||||
and (step + num_update_steps_per_epoch * epoch) % args.validation_steps == 0
|
||||
):
|
||||
logger.info(
|
||||
f"Running validation... \n Generating {args.num_validation_images} images with prompt:"
|
||||
f" {args.validation_prompt}."
|
||||
)
|
||||
# create pipeline
|
||||
pipeline = DiffusionPipeline.from_pretrained(
|
||||
args.pretrained_model_name_or_path,
|
||||
safety_checker=None,
|
||||
revision=args.revision,
|
||||
)
|
||||
# set `keep_fp32_wrapper` to True because we do not want to remove
|
||||
# mixed precision hooks while we are still training
|
||||
pipeline.unet = accelerator.unwrap_model(unet, keep_fp32_wrapper=True)
|
||||
pipeline.text_encoder = accelerator.unwrap_model(text_encoder, keep_fp32_wrapper=True)
|
||||
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)
|
||||
pipeline = pipeline.to(accelerator.device)
|
||||
pipeline.set_progress_bar_config(disable=True)
|
||||
|
||||
# run inference
|
||||
generator = torch.Generator(device=accelerator.device).manual_seed(args.seed)
|
||||
images = []
|
||||
for _ in range(args.num_validation_images):
|
||||
image = pipeline(args.validation_prompt, num_inference_steps=25, generator=generator).images[0]
|
||||
images.append(image)
|
||||
|
||||
for tracker in accelerator.trackers:
|
||||
if tracker.name == "tensorboard":
|
||||
np_images = np.stack([np.asarray(img) for img in images])
|
||||
tracker.writer.add_images("validation", np_images, epoch, dataformats="NHWC")
|
||||
if tracker.name == "wandb":
|
||||
import wandb
|
||||
|
||||
tracker.log(
|
||||
{
|
||||
"validation": [
|
||||
wandb.Image(image, caption=f"{i}: {args.validation_prompt}")
|
||||
for i, image in enumerate(images)
|
||||
]
|
||||
}
|
||||
)
|
||||
|
||||
del pipeline
|
||||
torch.cuda.empty_cache()
|
||||
|
||||
if global_step >= args.max_train_steps:
|
||||
break
|
||||
# Printing the GPU memory usage details such as allocated memory, peak memory, and total memory usage
|
||||
|
7
examples/semantic_segmentation/README.md
Normal file
@ -0,0 +1,7 @@
|
||||
# Fine-tuning for semantic segmentation using LoRA and 🤗 PEFT
|
||||
|
||||
[](https://colab.research.google.com/github/huggingface/peft/blob/main/examples/semantic_segmentation/semantic_segmentation_peft_lora.ipynb)
|
||||
|
||||
We provide a notebook (`semantic_segmentation_peft_lora.ipynb`) where we learn how to use [LoRA](https://arxiv.org/abs/2106.09685) from 🤗 PEFT to fine-tune a semantic segmentation model by ONLY using **14%** of the original trainable parameters of the model.
|
||||
|
||||
LoRA adds low-rank "update matrices" to certain blocks in the underlying model (in this case the attention blocks) and ONLY trains those matrices during fine-tuning. During inference, these update matrices are _merged_ with the original model parameters. For more details, check out the [original LoRA paper](https://arxiv.org/abs/2106.09685).
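A minimal sketch, assuming a SegFormer checkpoint (module names and hyperparameters are illustrative; the notebook has the exact values):

```python
from transformers import AutoModelForSemanticSegmentation
from peft import LoraConfig, get_peft_model

# assumption: a SegFormer backbone with attention projections named "query" and "value"
model = AutoModelForSemanticSegmentation.from_pretrained("nvidia/mit-b0")

config = LoraConfig(
    r=32,
    lora_alpha=32,
    target_modules=["query", "value"],
    lora_dropout=0.1,
    bias="lora_only",
    modules_to_save=["decode_head"],  # keep the segmentation head fully trainable
)
lora_model = get_peft_model(model, config)
# the trainable share (~14% here) is larger than for pure LoRA because the decode head is included
lora_model.print_trainable_parameters()
```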
1606
examples/semantic_segmentation/semantic_segmentation_peft_lora.ipynb
Normal file
File diff suppressed because one or more lines are too long
File diff suppressed because it is too large
File diff suppressed because it is too large
File diff suppressed because it is too large
File diff suppressed because it is too large
@ -2,6 +2,5 @@ transformers
|
||||
accelerate
|
||||
loralib
|
||||
evaluate
|
||||
deepspeed
|
||||
tqdm
|
||||
datasets
|
File diff suppressed because one or more lines are too long
@ -2,7 +2,6 @@ transformers
|
||||
accelerate
|
||||
loralib
|
||||
evaluate
|
||||
deepspeed
|
||||
tqdm
|
||||
datasets
|
||||
Pillow
|
||||
|
12
setup.py
@ -1,4 +1,4 @@
|
||||
# Copyright 2021 The HuggingFace Team. All rights reserved.
|
||||
# Copyright 2023 The HuggingFace Team. All rights reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
@ -22,7 +22,7 @@ extras["dev"] = extras["quality"] + extras["docs_specific"]
|
||||
|
||||
setup(
|
||||
name="peft",
|
||||
version="0.0.2",
|
||||
version="0.1.0",
|
||||
description="Parameter-Efficient Fine-Tuning (PEFT)",
|
||||
long_description=open("README.md", "r", encoding="utf-8").read(),
|
||||
long_description_content_type="text/markdown",
|
||||
@ -30,7 +30,7 @@ setup(
|
||||
license="Apache",
|
||||
author="The HuggingFace team",
|
||||
author_email="sourab@huggingface.co",
|
||||
url="https://github.com/huggingface/pets",
|
||||
url="https://github.com/huggingface/peft",
|
||||
package_dir={"": "src"},
|
||||
packages=find_packages("src"),
|
||||
entry_points={},
|
||||
@ -43,7 +43,7 @@ setup(
|
||||
"torch>=1.13.0",
|
||||
"transformers",
|
||||
"accelerate",
|
||||
"loralib",
|
||||
"bitsandbytes",
|
||||
],
|
||||
extras_require=extras,
|
||||
classifiers=[
|
||||
@ -71,9 +71,7 @@ setup(
|
||||
# twine upload dist/* -r pypitest
|
||||
# twine upload dist/* -r pypitest --repository-url=https://test.pypi.org/legacy/
|
||||
# 6. Check that you can install it in a virtualenv by running:
|
||||
# pip install -i https://testpypi.python.org/pypi accelerate
|
||||
# accelerate env
|
||||
# accelerate test
|
||||
# pip install -i https://testpypi.python.org/pypi peft
|
||||
# 7. Upload the final version to actual pypi:
|
||||
# twine upload dist/* -r pypi
|
||||
# 8. Add release notes to the tag in github once everything is looking hunky-dory.
|
||||
|
@ -17,7 +17,7 @@
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
__version__ = "0.0.2"
|
||||
__version__ = "0.1.0"
|
||||
|
||||
from .mapping import MODEL_TYPE_TO_PEFT_MODEL_MAPPING, PEFT_TYPE_TO_CONFIG_MAPPING, get_peft_config, get_peft_model
|
||||
from .peft_model import (
|
||||
@ -40,13 +40,13 @@ from .tuners import (
|
||||
PromptTuningInit,
|
||||
)
|
||||
from .utils import (
|
||||
TRANSFORMERS_MODELS_TO_PREFIX_TUNING_POSTPROCESS_MAPPING,
|
||||
PeftConfig,
|
||||
PeftType,
|
||||
PromptLearningConfig,
|
||||
TaskType,
|
||||
bloom_model_postprocess_past_key_value,
|
||||
get_peft_model_state_dict,
|
||||
peft_model_load_and_dispatch,
|
||||
set_peft_model_state_dict,
|
||||
shift_tokens_right,
|
||||
)
|
||||
|
@ -14,13 +14,14 @@
|
||||
# limitations under the License.
|
||||
|
||||
from .peft_model import (
|
||||
PeftModel,
|
||||
PeftModelForCausalLM,
|
||||
PeftModelForSeq2SeqLM,
|
||||
PeftModelForSequenceClassification,
|
||||
PeftModelForTokenClassification,
|
||||
)
|
||||
from .tuners import LoraConfig, PrefixTuningConfig, PromptEncoderConfig, PromptTuningConfig
|
||||
from .utils import PeftType
|
||||
from .utils import PromptLearningConfig
|
||||
|
||||
|
||||
MODEL_TYPE_TO_PEFT_MODEL_MAPPING = {
|
||||
@ -133,9 +134,12 @@ def get_peft_model(model, peft_config):
|
||||
"""
|
||||
|
||||
model_config = model.config.to_dict()
|
||||
if peft_config.peft_type != PeftType.LORA:
|
||||
peft_config = _prepare_prompt_learning_config(peft_config, model_config)
|
||||
else:
|
||||
peft_config.base_model_name_or_path = model.__dict__.get("name_or_path", None)
|
||||
if peft_config.task_type not in MODEL_TYPE_TO_PEFT_MODEL_MAPPING.keys():
|
||||
peft_config = _prepare_lora_config(peft_config, model_config)
|
||||
|
||||
return PeftModel(model, peft_config)
|
||||
if not isinstance(peft_config, PromptLearningConfig):
|
||||
peft_config = _prepare_lora_config(peft_config, model_config)
|
||||
else:
|
||||
peft_config = _prepare_prompt_learning_config(peft_config, model_config)
|
||||
return MODEL_TYPE_TO_PEFT_MODEL_MAPPING[peft_config.task_type](model, peft_config)
|
||||
|
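Sketched usage of the dispatch above (checkpoint and hyperparameters are illustrative, not part of the diff): a `LoraConfig` takes the `_prepare_lora_config` branch, while prompt-learning configs such as `PrefixTuningConfig` take the `_prepare_prompt_learning_config` branch; both end up wrapped in the task-specific class from `MODEL_TYPE_TO_PEFT_MODEL_MAPPING`.

```python
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, PrefixTuningConfig, get_peft_model

model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")  # illustrative checkpoint

# not a PromptLearningConfig -> LoRA branch
lora_config = LoraConfig(task_type="SEQ_2_SEQ_LM", r=8, lora_alpha=32, target_modules=["q", "v"])

# a PromptLearningConfig subclass -> prompt-learning branch
prefix_config = PrefixTuningConfig(task_type="SEQ_2_SEQ_LM", num_virtual_tokens=20)

# either way the task type selects the wrapper, here PeftModelForSeq2SeqLM
peft_model = get_peft_model(model, prefix_config)
```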
@ -14,18 +14,36 @@
|
||||
# limitations under the License.
|
||||
|
||||
import inspect
|
||||
import os
|
||||
import warnings
|
||||
|
||||
import torch
|
||||
from accelerate import dispatch_model, infer_auto_device_map
|
||||
from accelerate.hooks import AlignDevicesHook, add_hook_to_module, remove_hook_from_submodules
|
||||
from accelerate.utils import get_balanced_memory
|
||||
from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
|
||||
from transformers import PreTrainedModel
|
||||
from transformers.modeling_outputs import SequenceClassifierOutput, TokenClassifierOutput
|
||||
from transformers.utils import PushToHubMixin
|
||||
|
||||
from huggingface_hub import hf_hub_download
|
||||
|
||||
from .tuners import LoraModel, PrefixEncoder, PromptEmbedding, PromptEncoder
|
||||
from .utils import PeftConfig, PeftType, TaskType, _set_trainable, shift_tokens_right
|
||||
from .utils import (
|
||||
TRANSFORMERS_MODELS_TO_PREFIX_TUNING_POSTPROCESS_MAPPING,
|
||||
WEIGHTS_NAME,
|
||||
PeftConfig,
|
||||
PeftType,
|
||||
PromptLearningConfig,
|
||||
TaskType,
|
||||
_set_trainable,
|
||||
get_peft_model_state_dict,
|
||||
set_peft_model_state_dict,
|
||||
shift_tokens_right,
|
||||
)
|
||||
|
||||
|
||||
class PeftModel(torch.nn.Module):
|
||||
class PeftModel(PushToHubMixin, torch.nn.Module):
|
||||
"""
|
||||
Parameter-Efficient Fine-Tuning Model. Base model encompassing various Peft methods.
|
||||
|
||||
@ -39,14 +57,14 @@ class PeftModel(torch.nn.Module):
|
||||
- **peft_config** ([`PeftConfig`]) -- The configuration of the Peft model.
|
||||
- **modules_to_save** (`list` of `str`) -- The list of sub-module names to save when
|
||||
saving the model.
|
||||
- **prompt_encoder** ([`PromptEncoder`]) -- The prompt encoder used for Peft if `peft_config.peft_type
|
||||
!= PeftType.LORA`.
|
||||
- **prompt_encoder** ([`PromptEncoder`]) -- The prompt encoder used for Peft if
|
||||
`isinstance(self.peft_config, PromptLearningConfig)`.
|
||||
- **prompt_tokens** (`torch.Tensor`) -- The virtual prompt tokens used for Peft if
|
||||
`peft_config.peft_type != PeftType.LORA`.
|
||||
`isinstance(self.peft_config, PromptLearningConfig)`.
|
||||
- **transformer_backbone_name** (`str`) -- The name of the transformer
|
||||
backbone in the base model if `peft_config.peft_type != PeftType.LORA`.
|
||||
backbone in the base model if `isinstance(self.peft_config, PromptLearningConfig)`.
|
||||
- **word_embeddings** (`torch.nn.Embedding`) -- The word embeddings of the transformer backbone
|
||||
in the base model if `peft_config.peft_type != PeftType.LORA`.
|
||||
in the base model if `isinstance(self.peft_config, PromptLearningConfig)`.
|
||||
"""
|
||||
|
||||
def __init__(self, model, peft_config: PeftConfig):
|
||||
@ -55,12 +73,114 @@ class PeftModel(torch.nn.Module):
|
||||
self.base_model = model
|
||||
self.config = self.base_model.config
|
||||
self.modules_to_save = None
|
||||
if peft_config.peft_type != PeftType.LORA:
|
||||
if isinstance(self.peft_config, PromptLearningConfig):
|
||||
self._setup_prompt_encoder()
|
||||
else:
|
||||
self.base_model = LoraModel(peft_config, model)
|
||||
if getattr(self.peft_config, "modules_to_save", None) is not None:
|
||||
self.modules_to_save = self.peft_config.modules_to_save
|
||||
_set_trainable(self)
|
||||
self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
|
||||
|
||||
def save_pretrained(self, save_directory, **kwargs):
|
||||
r"""
|
||||
Args:
|
||||
This function saves the adapter model and the adapter configuration files to a directory, so that it can be
|
||||
re-loaded using the `LoraModel.from_pretrained` class method, and also used by the `LoraModel.push_to_hub`
|
||||
method.
|
||||
save_directory (`str`):
|
||||
Directory where the adapter model and configuration files will be saved (will be created if it does not
|
||||
exist).
|
||||
**kwargs:
|
||||
Additional keyword arguments passed along to the `push_to_hub` method.
|
||||
"""
|
||||
if os.path.isfile(save_directory):
|
||||
raise ValueError(f"Provided path ({save_directory}) should be a directory, not a file")
|
||||
os.makedirs(save_directory, exist_ok=True)
|
||||
|
||||
# save only the trainable weights
|
||||
output_state_dict = get_peft_model_state_dict(self, kwargs.get("state_dict", None))
|
||||
torch.save(output_state_dict, os.path.join(save_directory, WEIGHTS_NAME))
|
||||
|
||||
# save the config and change the inference mode to `True`
|
||||
if self.peft_config.base_model_name_or_path is None:
|
||||
self.peft_config.base_model_name_or_path = (
|
||||
self.base_model.__dict__.get("name_or_path", None)
|
||||
if isinstance(self.peft_config, PromptLearningConfig)
|
||||
else self.base_model.model.__dict__.get("name_or_path", None)
|
||||
)
|
||||
inference_mode = self.peft_config.inference_mode
|
||||
self.peft_config.inference_mode = True
|
||||
self.peft_config.save_pretrained(save_directory)
|
||||
self.peft_config.inference_mode = inference_mode
|
||||
|
||||
@classmethod
|
||||
def from_pretrained(cls, model, model_id, **kwargs):
|
||||
r"""
|
||||
Args:
|
||||
Instantiate a `LoraModel` from a pretrained Lora configuration and weights.
|
||||
model (`transformers.PreTrainedModel`):
|
||||
The model to be adapted. The model should be initialized with the `from_pretrained` method from
|
||||
the `transformers` library.
|
||||
model_id (`str`):
|
||||
The name of the Lora configuration to use. Can be either:
|
||||
- A string, the `model id` of a Lora configuration hosted inside a model repo on
|
||||
huggingface Hub
|
||||
- A path to a directory containing a Lora configuration file saved using the
|
||||
`save_pretrained` method, e.g., ``./my_lora_config_directory/``.
|
||||
"""
|
||||
from .mapping import MODEL_TYPE_TO_PEFT_MODEL_MAPPING, PEFT_TYPE_TO_CONFIG_MAPPING
|
||||
|
||||
# load the config
|
||||
config = PEFT_TYPE_TO_CONFIG_MAPPING[PeftConfig.from_pretrained(model_id).peft_type].from_pretrained(model_id)
|
||||
|
||||
if getattr(model, "hf_device_map", None) is not None:
|
||||
remove_hook_from_submodules(model)
|
||||
|
||||
if config.task_type not in MODEL_TYPE_TO_PEFT_MODEL_MAPPING.keys():
|
||||
model = cls(model, config)
|
||||
else:
|
||||
model = MODEL_TYPE_TO_PEFT_MODEL_MAPPING[config.task_type](model, config)
|
||||
|
||||
# load weights if any
|
||||
if os.path.exists(os.path.join(model_id, WEIGHTS_NAME)):
|
||||
filename = os.path.join(model_id, WEIGHTS_NAME)
|
||||
else:
|
||||
try:
|
||||
filename = hf_hub_download(model_id, WEIGHTS_NAME)
|
||||
except: # noqa
|
||||
raise ValueError(
|
||||
f"Can't find weights for {model_id} in {model_id} or in the Hugging Face Hub. "
|
||||
f"Please check that the file {WEIGHTS_NAME} is present at {model_id}."
|
||||
)
|
||||
|
||||
adapters_weights = torch.load(filename)
|
||||
# load the weights into the model
|
||||
model = set_peft_model_state_dict(model, adapters_weights)
|
||||
if getattr(model, "hf_device_map", None) is not None:
|
||||
device_map = kwargs.get("device_map", "auto")
|
||||
max_memory = kwargs.get("max_memory", None)
|
||||
no_split_module_classes = model._no_split_modules
|
||||
if device_map != "sequential":
|
||||
max_memory = get_balanced_memory(
|
||||
model,
|
||||
max_memory=max_memory,
|
||||
no_split_module_classes=no_split_module_classes,
|
||||
low_zero=(device_map == "balanced_low_0"),
|
||||
)
|
||||
if isinstance(device_map, str):
|
||||
device_map = infer_auto_device_map(
|
||||
model, max_memory=max_memory, no_split_module_classes=no_split_module_classes
|
||||
)
|
||||
model = dispatch_model(model, device_map=device_map)
|
||||
hook = AlignDevicesHook(io_same_device=True)
|
||||
if model.peft_config.peft_type == PeftType.LORA:
|
||||
add_hook_to_module(model.base_model.model, hook)
|
||||
else:
|
||||
remove_hook_from_submodules(model.prompt_encoder)
|
||||
add_hook_to_module(model.base_model, hook)
|
||||
return model
|
||||
|
||||
def _setup_prompt_encoder(self):
|
||||
num_transformer_submodules = 0
|
||||
transformer_backbone = None
|
||||
@ -127,8 +247,8 @@ class PeftModel(torch.nn.Module):
|
||||
past_key_values = past_key_values.permute([2, 0, 3, 1, 4]).split(
|
||||
self.peft_config.num_transformer_submodules * 2
|
||||
)
|
||||
if self.peft_config.postprocess_past_key_value_function is not None:
|
||||
post_process_fn = self.peft_config.postprocess_past_key_value_function
|
||||
if TRANSFORMERS_MODELS_TO_PREFIX_TUNING_POSTPROCESS_MAPPING.get(self.config.model_type, None) is not None:
|
||||
post_process_fn = TRANSFORMERS_MODELS_TO_PREFIX_TUNING_POSTPROCESS_MAPPING[self.config.model_type]
|
||||
past_key_values = post_process_fn(past_key_values)
|
||||
return past_key_values
|
||||
else:
|
||||
@ -159,6 +279,15 @@ class PeftModel(torch.nn.Module):
|
||||
except AttributeError:
|
||||
return getattr(self.base_model, name)
|
||||
|
||||
def forward(self, *args, **kwargs):
|
||||
"""
|
||||
Forward pass of the model.
|
||||
"""
|
||||
if isinstance(self.peft_config, PromptLearningConfig):
|
||||
return self.base_model(*args, **kwargs)
|
||||
else:
|
||||
return self.base_model.model(*args, **kwargs)
|
||||
|
||||
|
||||
class PeftModelForSequenceClassification(PeftModel):
|
||||
"""
|
||||
@ -211,7 +340,7 @@ class PeftModelForSequenceClassification(PeftModel):
|
||||
):
|
||||
return_dict = return_dict if return_dict is not None else self.config.use_return_dict
|
||||
|
||||
if self.peft_config.peft_type == PeftType.LORA:
|
||||
if not isinstance(self.peft_config, PromptLearningConfig):
|
||||
return self.base_model(
|
||||
input_ids=input_ids,
|
||||
attention_mask=attention_mask,
|
||||
@ -368,7 +497,7 @@ class PeftModelForCausalLM(PeftModel):
|
||||
return_dict=None,
|
||||
**kwargs,
|
||||
):
|
||||
if self.peft_config.peft_type == PeftType.LORA:
|
||||
if not isinstance(self.peft_config, PromptLearningConfig):
|
||||
return self.base_model(
|
||||
input_ids=input_ids,
|
||||
attention_mask=attention_mask,
|
||||
@ -417,7 +546,7 @@ class PeftModelForCausalLM(PeftModel):
|
||||
return self.base_model(inputs_embeds=inputs_embeds, **kwargs)
|
||||
|
||||
def generate(self, **kwargs):
|
||||
if self.peft_config.peft_type == PeftType.LORA:
|
||||
if not isinstance(self.peft_config, PromptLearningConfig):
|
||||
return self.base_model.generate(**kwargs)
|
||||
else:
|
||||
if "input_ids" not in kwargs:
|
||||
@ -438,17 +567,22 @@ class PeftModelForCausalLM(PeftModel):
|
||||
)
|
||||
kwargs["token_type_ids"] = None
|
||||
|
||||
if self.peft_config.peft_type == PeftType.PREFIX_TUNING:
|
||||
batch_size = kwargs["input_ids"].shape[0]
|
||||
past_key_values = self.get_prompt(batch_size)
|
||||
kwargs["past_key_values"] = past_key_values
|
||||
return self.base_model.generate(**kwargs)
|
||||
else:
|
||||
raise NotImplementedError
|
||||
return self.base_model.generate(**kwargs)
|
||||
|
||||
def prepare_inputs_for_generation(self, *args, **kwargs):
|
||||
model_kwargs = self.base_model_prepare_inputs_for_generation(*args, **kwargs)
|
||||
model_kwargs["past_key_values"] = kwargs.get("past", None) or kwargs.get("past_key_values", None)
|
||||
if isinstance(self.peft_config, PromptLearningConfig):
|
||||
if model_kwargs["past_key_values"] is None and self.peft_config.peft_type == PeftType.PREFIX_TUNING:
|
||||
past_key_values = self.get_prompt(batch_size=model_kwargs["input_ids"].shape[0])
|
||||
model_kwargs["past_key_values"] = past_key_values
|
||||
else:
|
||||
if model_kwargs["past_key_values"] is None:
|
||||
prompts = self.get_prompt(batch_size=model_kwargs["input_ids"].shape[0])
|
||||
model_kwargs["inputs_embeds"] = torch.cat(
|
||||
(prompts, self.word_embeddings(model_kwargs["input_ids"])), dim=1
|
||||
)
|
||||
model_kwargs["input_ids"] = None
|
||||
|
||||
return model_kwargs
|
||||
|
||||
|
||||
@ -499,7 +633,7 @@ class PeftModelForSeq2SeqLM(PeftModel):
|
||||
return_dict=None,
|
||||
**kwargs,
|
||||
):
|
||||
if self.peft_config.peft_type == PeftType.LORA:
|
||||
if not isinstance(self.peft_config, PromptLearningConfig):
|
||||
return self.base_model(
|
||||
input_ids=input_ids,
|
||||
attention_mask=attention_mask,
|
||||
@ -567,7 +701,7 @@ class PeftModelForSeq2SeqLM(PeftModel):
|
||||
return self.base_model(inputs_embeds=inputs_embeds, decoder_inputs_embeds=decoder_inputs_embeds, **kwargs)
|
||||
|
||||
def generate(self, **kwargs):
|
||||
if self.peft_config.peft_type == PeftType.LORA:
|
||||
if not isinstance(self.peft_config, PromptLearningConfig):
|
||||
return self.base_model.generate(**kwargs)
|
||||
else:
|
||||
if "input_ids" not in kwargs:
|
||||
@ -582,25 +716,16 @@ class PeftModelForSeq2SeqLM(PeftModel):
|
||||
kwargs["token_type_ids"] = None
|
||||
|
||||
if self.peft_config.peft_type == PeftType.PREFIX_TUNING:
|
||||
batch_size = kwargs["input_ids"].shape[0]
|
||||
past_key_values = self.get_prompt(batch_size)
|
||||
kwargs["past_key_values"] = past_key_values
|
||||
return self.base_model.generate(**kwargs)
|
||||
else:
|
||||
raise NotImplementedError
|
||||
|
||||
def prepare_inputs_for_generation(self, *args, **kwargs):
|
||||
model_kwargs = self.base_model_prepare_inputs_for_generation(*args, **kwargs)
|
||||
model_kwargs["past_key_values"] = kwargs.get("past", None) or kwargs.get("past_key_values", None)
|
||||
return model_kwargs
|
||||
|
||||
def _prepare_encoder_decoder_kwargs_for_generation(self, inputs_tensor, model_kwargs, model_input_name=None):
|
||||
past_key_values = model_kwargs.get("past_key_values", None)
|
||||
model_kwargs["past_key_values"] = None
|
||||
model_kwargs = self.base_model_prepare_encoder_decoder_kwargs_for_generation(
|
||||
inputs_tensor, model_kwargs, model_input_name
|
||||
)
|
||||
model_kwargs["past_key_values"] = past_key_values
|
||||
if model_kwargs["past_key_values"] is None and self.peft_config.peft_type == PeftType.PREFIX_TUNING:
|
||||
batch_size = model_kwargs["decoder_input_ids"].shape[0]
|
||||
past_key_values = self.get_prompt(batch_size)
|
||||
model_kwargs["past_key_values"] = past_key_values
|
||||
return model_kwargs
|
||||
|
||||
|
||||
@ -655,7 +780,7 @@ class PeftModelForTokenClassification(PeftModel):
|
||||
):
|
||||
return_dict = return_dict if return_dict is not None else self.config.use_return_dict
|
||||
|
||||
if self.peft_config.peft_type == PeftType.LORA:
|
||||
if not isinstance(self.peft_config, PromptLearningConfig):
|
||||
return self.base_model(
|
||||
input_ids=input_ids,
|
||||
attention_mask=attention_mask,
|
||||
|
@ -12,7 +12,6 @@
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import math
|
||||
import warnings
|
||||
from dataclasses import asdict, dataclass, field
|
||||
@ -24,8 +23,7 @@ import torch.nn as nn
|
||||
import torch.nn.functional as F
|
||||
from transformers.pytorch_utils import Conv1D
|
||||
|
||||
import loralib as lora # noqa: F401
|
||||
from loralib import mark_only_lora_as_trainable
|
||||
import bitsandbytes as bnb
|
||||
|
||||
from ..utils import PeftConfig, PeftType, transpose
|
||||
|
||||
@ -45,6 +43,8 @@ class LoraConfig(PeftConfig):
|
||||
fan_in_fan_out (`bool`): Set this to True if the layer to replace stores weight like (fan_in, fan_out)
|
||||
enable_lora ( `List[bool]`): Used with `lora.MergedLinear`.
|
||||
bias (`str`): Bias type for Lora. Can be 'none', 'all' or 'lora_only'
|
||||
modules_to_save (`List[str]`):List of modules apart from LoRA layers to be set as trainable
|
||||
and saved in the final checkpoint.
|
||||
"""
|
||||
|
||||
r: int = field(default=8, metadata={"help": "Lora attention dimension"})
|
||||
@ -60,6 +60,14 @@ class LoraConfig(PeftConfig):
|
||||
)
|
||||
enable_lora: Optional[List[bool]] = field(default=None, metadata={"help": "Used with `lora.MergedLinear`."})
|
||||
bias: str = field(default="none", metadata={"help": "Bias type for Lora. Can be 'none', 'all' or 'lora_only'"})
|
||||
modules_to_save: Optional[List[str]] = field(
|
||||
default=None,
|
||||
metadata={
|
||||
"help": "List of modules apart from LoRA layers to be set as trainable and saved in the final checkpoint. "
|
||||
"For example, in Sequence Classification or Token Classification tasks, "
|
||||
"the final layer `classifier/score` are randomly initialized and as such need to be trainable and saved."
|
||||
},
|
||||
)
|
||||
|
||||
def __post_init__(self):
|
||||
self.peft_type = PeftType.LORA
|
||||
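A hypothetical sequence-classification setup illustrating the new `modules_to_save` field (checkpoint and module names are assumptions, not part of the diff):

```python
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model

model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

config = LoraConfig(
    task_type="SEQ_CLS",
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["query", "value"],
    # the randomly initialized classification head must be trained and stored
    # alongside the LoRA weights, otherwise it is lost on reload
    modules_to_save=["classifier"],
)
model = get_peft_model(model, config)  # returns PeftModelForSequenceClassification
```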
@ -95,8 +103,10 @@ class LoraModel(torch.nn.Module):
|
||||
self.model = model
|
||||
self._find_and_replace()
|
||||
mark_only_lora_as_trainable(self.model, self.peft_config.bias)
|
||||
self.forward = self.model.forward
|
||||
|
||||
def _find_and_replace(self):
|
||||
is_target_modules_in_base_model = False
|
||||
kwargs = {
|
||||
"r": self.peft_config.r,
|
||||
"lora_alpha": self.peft_config.lora_alpha,
|
||||
@ -107,9 +117,21 @@ class LoraModel(torch.nn.Module):
|
||||
key_list = [key for key, _ in self.model.named_modules()]
|
||||
for key in key_list:
|
||||
if any(key.endswith(target_key) for target_key in self.peft_config.target_modules):
|
||||
if not is_target_modules_in_base_model:
|
||||
is_target_modules_in_base_model = True
|
||||
parent, target, target_name = self._get_submodules(key)
|
||||
bias = target.bias is not None
|
||||
if isinstance(target, torch.nn.Linear) and self.peft_config.enable_lora is None:
|
||||
if isinstance(target, bnb.nn.Linear8bitLt) and self.peft_config.enable_lora is None:
|
||||
kwargs.update(
|
||||
{
|
||||
"has_fp16_weights": target.state.has_fp16_weights,
|
||||
"memory_efficient_backward": target.state.memory_efficient_backward,
|
||||
"threshold": target.state.threshold,
|
||||
"index": target.index,
|
||||
}
|
||||
)
|
||||
new_module = Linear8bitLt(target.in_features, target.out_features, bias=bias, **kwargs)
|
||||
elif isinstance(target, torch.nn.Linear) and self.peft_config.enable_lora is None:
|
||||
new_module = Linear(target.in_features, target.out_features, bias=bias, **kwargs)
|
||||
elif self.peft_config.enable_lora is not None:
|
||||
kwargs.update({"enable_lora": self.peft_config.enable_lora})
|
||||
@ -125,6 +147,11 @@ class LoraModel(torch.nn.Module):
|
||||
kwargs["fan_in_fan_out"] = False
|
||||
new_module = MergedLinear(in_features, out_features, bias=bias, **kwargs)
|
||||
self._replace_module(parent, target_name, new_module, target)
|
||||
if not is_target_modules_in_base_model:
|
||||
raise ValueError(
|
||||
f"Target modules {self.peft_config.target_modules} not found in the base model. "
|
||||
f"Please check the target modules and try again."
|
||||
)
|
||||
|
||||
def _get_submodules(self, key):
|
||||
parent = self.model.get_submodule(".".join(key.split(".")[:-1]))
|
||||
@ -137,9 +164,9 @@ class LoraModel(torch.nn.Module):
|
||||
new_module.weight = old_module.weight
|
||||
if old_module.bias is not None:
|
||||
new_module.bias = old_module.bias
|
||||
|
||||
def forward(self, *args, **kwargs):
|
||||
return self.model(*args, **kwargs)
|
||||
if getattr(old_module, "state", None) is not None:
|
||||
new_module.state = old_module.state
|
||||
new_module.to(old_module.weight.device)
|
||||
|
||||
def __getattr__(self, name: str):
|
||||
"""Forward missing attributes to the wrapped module."""
|
||||
@ -349,3 +376,66 @@ class MergedLinear(nn.Linear, LoraLayer):
|
||||
after_B = self.lora_B(after_A.transpose(-2, -1)).transpose(-2, -1)
|
||||
result += self.zero_pad(after_B) * self.scaling
|
||||
return result
|
||||
|
||||
|
||||
class Linear8bitLt(bnb.nn.Linear8bitLt, LoraLayer):
|
||||
# Lora implemented in a dense layer
|
||||
def __init__(
|
||||
self,
|
||||
in_features,
|
||||
out_features,
|
||||
r: int = 0,
|
||||
lora_alpha: int = 1,
|
||||
lora_dropout: float = 0.0,
|
||||
**kwargs,
|
||||
):
|
||||
bnb.nn.Linear8bitLt.__init__(
|
||||
self,
|
||||
in_features,
|
||||
out_features,
|
||||
bias=kwargs.get("bias", True),
|
||||
has_fp16_weights=kwargs.get("has_fp16_weights", True),
|
||||
memory_efficient_backward=kwargs.get("memory_efficient_backward", False),
|
||||
threshold=kwargs.get("threshold", 0.0),
|
||||
index=kwargs.get("index", None),
|
||||
)
|
||||
LoraLayer.__init__(self, r=r, lora_alpha=lora_alpha, lora_dropout=lora_dropout, merge_weights=False)
|
||||
# Actual trainable parameters
|
||||
if r > 0:
|
||||
self.lora_A = nn.Linear(in_features, r, bias=False)
|
||||
self.lora_B = nn.Linear(r, out_features, bias=False)
|
||||
self.scaling = self.lora_alpha / self.r
|
||||
# Freezing the pre-trained weight matrix
|
||||
self.weight.requires_grad = False
|
||||
self.reset_parameters()
|
||||
|
||||
def reset_parameters(self):
|
||||
if hasattr(self, "lora_A"):
|
||||
# initialize A the same way as the default for nn.Linear and B to zero
|
||||
nn.init.kaiming_uniform_(self.lora_A.weight, a=math.sqrt(5))
|
||||
nn.init.zeros_(self.lora_B.weight)
|
||||
|
||||
def forward(self, x: torch.Tensor):
|
||||
result = super().forward(x)
|
||||
if self.r > 0:
|
||||
result += self.lora_B(self.lora_A(self.lora_dropout(x))) * self.scaling
|
||||
return result
|
||||
|
||||
|
||||
# had to adapt it for `lora_only` to work
|
||||
def mark_only_lora_as_trainable(model: nn.Module, bias: str = "none") -> None:
|
||||
for n, p in model.named_parameters():
|
||||
if "lora_" not in n:
|
||||
p.requires_grad = False
|
||||
if bias == "none":
|
||||
return
|
||||
elif bias == "all":
|
||||
for n, p in model.named_parameters():
|
||||
if "bias" in n:
|
||||
p.requires_grad = True
|
||||
elif bias == "lora_only":
|
||||
for m in model.modules():
|
||||
if isinstance(m, LoraLayer) and hasattr(m, "bias") and m.bias is not None:
|
||||
m.bias.requires_grad = True
|
||||
else:
|
||||
raise NotImplementedError
|
||||
|
@ -15,7 +15,6 @@
|
||||
|
||||
|
||||
from dataclasses import dataclass, field
|
||||
from typing import Callable, Optional
|
||||
|
||||
import torch
|
||||
|
||||
@ -30,7 +29,6 @@ class PrefixTuningConfig(PromptLearningConfig):
|
||||
Args:
|
||||
encoder_hidden_size (`int`): The hidden size of the prompt encoder.
|
||||
prefix_projection (`bool`): Whether to project the prefix embeddings.
|
||||
postprocess_past_key_value_function (`Callable`, *optional*): The function to postprocess the past key value.
|
||||
"""
|
||||
|
||||
encoder_hidden_size: int = field(
|
||||
@ -41,10 +39,6 @@ class PrefixTuningConfig(PromptLearningConfig):
|
||||
default=False,
|
||||
metadata={"help": "Whether to project the prefix tokens"},
|
||||
)
|
||||
postprocess_past_key_value_function: Optional[Callable] = field(
|
||||
default=None,
|
||||
metadata={"help": "The function to postprocess the past key value"},
|
||||
)
|
||||
|
||||
def __post_init__(self):
|
||||
self.peft_type = PeftType.PREFIX_TUNING
|
||||
|
@ -17,6 +17,13 @@
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from .adapters_utils import CONFIG_NAME, WEIGHTS_NAME
|
||||
from .config import PeftConfig, PeftType, PromptLearningConfig, TaskType
|
||||
from .other import _set_trainable, bloom_model_postprocess_past_key_value, shift_tokens_right, transpose
|
||||
from .save_and_load import get_peft_model_state_dict, peft_model_load_and_dispatch, set_peft_model_state_dict
|
||||
from .other import (
|
||||
TRANSFORMERS_MODELS_TO_PREFIX_TUNING_POSTPROCESS_MAPPING,
|
||||
_set_trainable,
|
||||
bloom_model_postprocess_past_key_value,
|
||||
shift_tokens_right,
|
||||
transpose,
|
||||
)
|
||||
from .save_and_load import get_peft_model_state_dict, set_peft_model_state_dict
|
||||
|
18
src/peft/utils/adapters_utils.py
Normal file
@ -0,0 +1,18 @@
|
||||
# coding=utf-8
|
||||
# Copyright 2023-present the HuggingFace Inc. team.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
WEIGHTS_NAME = "adapter_model.bin"
|
||||
CONFIG_NAME = "adapter_config.json"
|
||||
|
||||
# TODO: add automapping and superclass here?
|
@ -12,11 +12,18 @@
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import enum
|
||||
from dataclasses import dataclass, field
|
||||
import json
|
||||
import os
|
||||
from dataclasses import asdict, dataclass, field
|
||||
from typing import Optional, Union
|
||||
|
||||
from transformers.utils import PushToHubMixin
|
||||
|
||||
from huggingface_hub import hf_hub_download
|
||||
|
||||
from .adapters_utils import CONFIG_NAME
|
||||
|
||||
|
||||
class PeftType(str, enum.Enum):
|
||||
PROMPT_TUNING = "PROMPT_TUNING"
|
||||
@ -33,7 +40,94 @@ class TaskType(str, enum.Enum):
|
||||
|
||||
|
||||
@dataclass
|
||||
class PeftConfig:
|
||||
class PeftConfigMixin(PushToHubMixin):
|
||||
r"""
|
||||
This is the base configuration class for PEFT adapter models. It contains all the methods that are common to all
|
||||
PEFT adapter models. This class inherits from `transformers.utils.PushToHubMixin` which contains the methods to
|
||||
push your model to the Hub. The method `save_pretrained` will save the configuration of your adapter model in a
|
||||
directory. The method `from_pretrained` will load the configuration of your adapter model from a directory.
|
||||
|
||||
Args:
|
||||
peft_type (Union[[`~peft.utils.config.PeftType`], `str`]): The type of Peft method to use.
|
||||
"""
|
||||
peft_type: Optional[PeftType] = field(default=None, metadata={"help": "The type of PEFT model."})
|
||||
|
||||
@property
|
||||
def __dict__(self):
|
||||
return asdict(self)
|
||||
|
||||
def to_dict(self):
|
||||
return self.__dict__
|
||||
|
||||
def save_pretrained(self, save_directory, **kwargs):
|
||||
r"""
|
||||
This method saves the configuration of your adapter model in a directory.
|
||||
|
||||
Args:
|
||||
save_directory (`str`):
|
||||
The directory where the configuration will be saved.
|
||||
**kwargs:
|
||||
Additional keyword arguments passed along to the `transformers.utils.PushToHubMixin.push_to_hub`
|
||||
method.
|
||||
"""
|
||||
if os.path.isfile(save_directory):
|
||||
raise AssertionError(f"Provided path ({save_directory}) should be a directory, not a file")
|
||||
|
||||
os.makedirs(save_directory, exist_ok=True)
|
||||
|
||||
output_dict = self.__dict__
|
||||
output_path = os.path.join(save_directory, CONFIG_NAME)
|
||||
|
||||
# save it
|
||||
with open(output_path, "w") as writer:
|
||||
writer.write(json.dumps(output_dict, indent=2, sort_keys=True))
|
||||
|
||||
@classmethod
|
||||
def from_pretrained(cls, pretrained_model_name_or_path, **kwargs):
|
||||
r"""
|
||||
This method loads the configuration of your adapter model from a directory.
|
||||
|
||||
Args:
|
||||
pretrained_model_name_or_path (`str`):
|
||||
The directory or the hub-id where the configuration is saved.
|
||||
**kwargs:
|
||||
Additional keyword arguments passed along to the child class initialization.
|
||||
"""
|
||||
if os.path.isfile(os.path.join(pretrained_model_name_or_path, CONFIG_NAME)):
|
||||
config_file = os.path.join(pretrained_model_name_or_path, CONFIG_NAME)
|
||||
else:
|
||||
try:
|
||||
config_file = hf_hub_download(pretrained_model_name_or_path, CONFIG_NAME)
|
||||
except:
|
||||
raise ValueError(f"Can't find config.json at '{pretrained_model_name_or_path}'")
|
||||
|
||||
loaded_attributes = cls.from_json_file(config_file)
|
||||
|
||||
config = cls(**kwargs)
|
||||
|
||||
for key, value in loaded_attributes.items():
|
||||
if hasattr(config, key):
|
||||
setattr(config, key, value)
|
||||
|
||||
return config
|
||||
|
||||
@classmethod
|
||||
def from_json_file(cls, path_json_file, **kwargs):
|
||||
r"""
|
||||
Loads a configuration file from a json file.
|
||||
|
||||
Args:
|
||||
path_json_file (`str`):
|
||||
The path to the json file.
|
||||
"""
|
||||
with open(path_json_file, "r") as file:
|
||||
json_object = json.load(file)
|
||||
|
||||
return json_object
|
||||
|
||||
|
||||
@dataclass
|
||||
class PeftConfig(PeftConfigMixin):
|
||||
"""
|
||||
This is the base configuration class to store the configuration of a :class:`~peft.PeftModel`.
|
||||
|
||||
@ -43,6 +137,7 @@ class PeftConfig:
|
||||
inference_mode (`bool`, defaults to `False`): Whether to use the Peft model in inference mode.
|
||||
"""
|
||||
|
||||
base_model_name_or_path: str = field(default=None, metadata={"help": "The name of the base model to use."})
|
||||
peft_type: Union[str, PeftType] = field(default=None, metadata={"help": "Peft type"})
|
||||
task_type: Union[str, TaskType] = field(default=None, metadata={"help": "Task type"})
|
||||
inference_mode: bool = field(default=False, metadata={"help": "Whether to use inference mode"})
|
||||
|
@ -30,6 +30,11 @@ def bloom_model_postprocess_past_key_value(past_key_values):
|
||||
return tuple(zip(keys, values))
|
||||
|
||||
|
||||
TRANSFORMERS_MODELS_TO_PREFIX_TUNING_POSTPROCESS_MAPPING = {
|
||||
"bloom": bloom_model_postprocess_past_key_value,
|
||||
}
|
||||
|
||||
|
||||
# copied from transformers.models.bart.modeling_bart
|
||||
def shift_tokens_right(input_ids: torch.Tensor, pad_token_id: int, decoder_start_token_id: int):
|
||||
"""
|
||||
|
@ -50,7 +50,10 @@ def get_peft_model_state_dict(model, state_dict=None):
|
||||
raise NotImplementedError
|
||||
else:
|
||||
to_return = {}
|
||||
prompt_embeddings = model.get_prompt_embedding_to_save()
|
||||
if model.peft_config.inference_mode:
|
||||
prompt_embeddings = model.prompt_encoder.embedding.weight
|
||||
else:
|
||||
prompt_embeddings = model.get_prompt_embedding_to_save()
|
||||
to_return["prompt_embeddings"] = prompt_embeddings
|
||||
if model.modules_to_save is not None:
|
||||
for key, value in state_dict.items():
|
||||
@ -74,35 +77,3 @@ def set_peft_model_state_dict(model, peft_model_state_dict):
|
||||
{"weight": peft_model_state_dict["prompt_embeddings"]}, strict=True
|
||||
)
|
||||
return model
|
||||
|
||||
|
||||
def peft_model_load_and_dispatch(model, peft_model_state_dict, peft_config, max_memory=None):
|
||||
"""
|
||||
Load the Peft model state dict and dispatch the model to the correct device.
|
||||
|
||||
Args:
|
||||
model ([`PeftModel`]): The Pre-trained base model which has already been sharded and dispatched
|
||||
using `accelerate` functionalities.
|
||||
peft_model_state_dict (`dict`): The state dict of the Peft model.
|
||||
max_memory (`Dict`, *optional*):
|
||||
A dictionary device identifier to maximum memory. Will default to the maximum memory available for each GPU
|
||||
and the available CPU RAM if unset.
|
||||
"""
|
||||
from accelerate import dispatch_model, infer_auto_device_map
|
||||
from accelerate.hooks import AlignDevicesHook, add_hook_to_module, remove_hook_from_submodules
|
||||
|
||||
from ..mapping import get_peft_model
|
||||
|
||||
remove_hook_from_submodules(model)
|
||||
model = get_peft_model(model, peft_config)
|
||||
model.print_trainable_parameters()
|
||||
set_peft_model_state_dict(model, peft_model_state_dict)
|
||||
device_map = infer_auto_device_map(model, max_memory=max_memory, no_split_module_classes=model._no_split_modules)
|
||||
model = dispatch_model(model, device_map=device_map)
|
||||
hook = AlignDevicesHook(io_same_device=True)
|
||||
if model.peft_config.peft_type == PeftType.LORA:
|
||||
add_hook_to_module(model.base_model.model, hook)
|
||||
else:
|
||||
remove_hook_from_submodules(model.prompt_encoder)
|
||||
add_hook_to_module(model.base_model, hook)
|
||||
return model
|
||||
|
96
tests/test_config.py
Normal file
@ -0,0 +1,96 @@
|
||||
# coding=utf-8
|
||||
# Copyright 2023-present the HuggingFace Inc. team.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
import os
|
||||
import tempfile
|
||||
import unittest
|
||||
|
||||
from peft import LoraConfig, PrefixTuningConfig, PromptEncoderConfig, PromptTuningConfig
|
||||
|
||||
|
||||
class PeftConfigTestMixin:
|
||||
all_config_classes = (
|
||||
LoraConfig,
|
||||
PromptEncoderConfig,
|
||||
PrefixTuningConfig,
|
||||
PromptTuningConfig,
|
||||
)
|
||||
|
||||
|
||||
class PeftConfigTester(unittest.TestCase, PeftConfigTestMixin):
|
||||
def test_methods(self):
|
||||
r"""
|
||||
Test if all configs have the expected methods. Here we test
|
||||
- to_dict
|
||||
- save_pretrained
|
||||
- from_pretrained
|
||||
- from_json_file
|
||||
"""
|
||||
# test if all configs have the expected methods
|
||||
for config_class in self.all_config_classes:
|
||||
config = config_class()
|
||||
self.assertTrue(hasattr(config, "to_dict"))
|
||||
self.assertTrue(hasattr(config, "save_pretrained"))
|
||||
self.assertTrue(hasattr(config, "from_pretrained"))
|
||||
self.assertTrue(hasattr(config, "from_json_file"))
|
||||
|
||||
def test_task_type(self):
|
||||
for config_class in self.all_config_classes:
|
||||
# assert this will not fail
|
||||
_ = config_class(task_type="test")
|
||||
|
||||
def test_save_pretrained(self):
|
||||
r"""
|
||||
Test if the config is correctly saved and loaded using
|
||||
- save_pretrained
|
||||
"""
|
||||
for config_class in self.all_config_classes:
|
||||
config = config_class()
|
||||
with tempfile.TemporaryDirectory() as tmp_dirname:
|
||||
config.save_pretrained(tmp_dirname)
|
||||
|
||||
config_from_pretrained = config_class.from_pretrained(tmp_dirname)
|
||||
self.assertEqual(config.to_dict(), config_from_pretrained.to_dict())
|
||||
|
||||
def test_from_json_file(self):
|
||||
for config_class in self.all_config_classes:
|
||||
config = config_class()
|
||||
with tempfile.TemporaryDirectory() as tmp_dirname:
|
||||
config.save_pretrained(tmp_dirname)
|
||||
|
||||
config_from_json = config_class.from_json_file(os.path.join(tmp_dirname, "adapter_config.json"))
|
||||
self.assertEqual(config.to_dict(), config_from_json)
|
||||
|
||||
def test_to_dict(self):
|
||||
r"""
|
||||
Test if the config can be correctly converted to a dict using:
|
||||
- to_dict
|
||||
- __dict__
|
||||
"""
|
||||
for config_class in self.all_config_classes:
|
||||
config = config_class()
|
||||
self.assertEqual(config.to_dict(), config.__dict__)
|
||||
self.assertTrue(isinstance(config.to_dict(), dict))
|
||||
|
||||
def test_set_attributes(self):
|
||||
# manually set attributes and check if they are correctly written
|
||||
for config_class in self.all_config_classes:
|
||||
config = config_class(peft_type="test")
|
||||
|
||||
# save pretrained
|
||||
with tempfile.TemporaryDirectory() as tmp_dirname:
|
||||
config.save_pretrained(tmp_dirname)
|
||||
|
||||
config_from_pretrained = config_class.from_pretrained(tmp_dirname)
|
||||
self.assertEqual(config.to_dict(), config_from_pretrained.to_dict())
|
136
tests/test_save_and_load.py
Normal file
@ -0,0 +1,136 @@
|
||||
# coding=utf-8
|
||||
# Copyright 2023-present the HuggingFace Inc. team.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
import os
|
||||
import tempfile
|
||||
import unittest
|
||||
|
||||
import torch
|
||||
from transformers import AutoModelForCausalLM
|
||||
|
||||
from peft import (
|
||||
LoraConfig,
|
||||
PeftModel,
|
||||
PrefixTuningConfig,
|
||||
PromptEncoderConfig,
|
||||
PromptTuningConfig,
|
||||
get_peft_model,
|
||||
get_peft_model_state_dict,
|
||||
)
|
||||
|
||||
|
||||
class PeftTestMixin:
|
||||
checkpoints_to_test = [
|
||||
"hf-internal-testing/tiny-random-OPTForCausalLM",
|
||||
]
|
||||
config_classes = (
|
||||
LoraConfig,
|
||||
PrefixTuningConfig,
|
||||
PromptEncoderConfig,
|
||||
PromptTuningConfig,
|
||||
)
|
||||
config_kwargs = (
|
||||
dict(
|
||||
r=8,
|
||||
lora_alpha=32,
|
||||
target_modules=["q_proj", "v_proj"],
|
||||
lora_dropout=0.05,
|
||||
bias="none",
|
||||
task_type="CAUSAL_LM",
|
||||
),
|
||||
dict(
|
||||
num_virtual_tokens=10,
|
||||
task_type="CAUSAL_LM",
|
||||
),
|
||||
dict(
|
||||
num_virtual_tokens=10,
|
||||
encoder_hidden_size=32,
|
||||
task_type="CAUSAL_LM",
|
||||
),
|
||||
dict(
|
||||
num_virtual_tokens=10,
|
||||
task_type="CAUSAL_LM",
|
||||
),
|
||||
)
|
||||
|
||||
|
||||
class PeftModelTester(unittest.TestCase, PeftTestMixin):
|
||||
r"""
|
||||
Test if the PeftModel behaves as expected. This includes:
|
||||
- test if the model has the expected methods
|
||||
"""
|
||||
|
||||
def test_attributes_model(self):
|
||||
for model_id in self.checkpoints_to_test:
|
||||
for i, config_cls in enumerate(self.config_classes):
|
||||
model = AutoModelForCausalLM.from_pretrained(model_id)
|
||||
config = config_cls(
|
||||
base_model_name_or_path=model_id,
|
||||
**self.config_kwargs[i],
|
||||
)
|
||||
model = get_peft_model(model, config)
|
||||
|
||||
self.assertTrue(hasattr(model, "save_pretrained"))
|
||||
self.assertTrue(hasattr(model, "from_pretrained"))
|
||||
self.assertTrue(hasattr(model, "push_to_hub"))
|
||||
|
||||
def test_save_pretrained(self):
|
||||
r"""
|
||||
A test to check if `save_pretrained` behaves as expected. This function should only save the state dict of the
|
||||
adapter model and not the state dict of the base model. Hence inside each saved directory you should have:
|
||||
|
||||
- README.md (that contains an entry `base_model`)
|
||||
- adapter_config.json
|
||||
- adapter_model.bin
|
||||
|
||||
"""
|
||||
for model_id in self.checkpoints_to_test:
|
||||
for i, config_cls in enumerate(self.config_classes):
|
||||
model = AutoModelForCausalLM.from_pretrained(model_id)
|
||||
config = config_cls(
|
||||
base_model_name_or_path=model_id,
|
||||
**self.config_kwargs[i],
|
||||
)
|
||||
model = get_peft_model(model, config)
|
||||
model.to(model.device)
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmp_dirname:
|
||||
model.save_pretrained(tmp_dirname)
|
||||
|
||||
model_from_pretrained = AutoModelForCausalLM.from_pretrained(model_id)
|
||||
model_from_pretrained = PeftModel.from_pretrained(model_from_pretrained, tmp_dirname)
|
||||
model_from_pretrained.to(model.device)
|
||||
|
||||
# check if the state dicts are equal
|
||||
state_dict = get_peft_model_state_dict(model)
|
||||
state_dict_from_pretrained = get_peft_model_state_dict(model_from_pretrained)
|
||||
|
||||
# check if same keys
|
||||
self.assertEqual(state_dict.keys(), state_dict_from_pretrained.keys())
|
||||
|
||||
# check if tensors equal
|
||||
for key in state_dict.keys():
|
||||
self.assertTrue(torch.allclose(state_dict[key], state_dict_from_pretrained[key]))
|
||||
|
||||
# check if `adapter_model.bin` is present
|
||||
self.assertTrue(os.path.exists(os.path.join(tmp_dirname, "adapter_model.bin")))
|
||||
|
||||
# check if `adapter_config.json` is present
|
||||
self.assertTrue(os.path.exists(os.path.join(tmp_dirname, "adapter_config.json")))
|
||||
|
||||
# check if `pytorch_model.bin` is not present
|
||||
self.assertFalse(os.path.exists(os.path.join(tmp_dirname, "pytorch_model.bin")))
|
||||
|
||||
# check if `config.json` is not present
|
||||
self.assertFalse(os.path.exists(os.path.join(tmp_dirname, "config.json")))
|