Compare commits

...

83 Commits

Author SHA1 Message Date
29357d41eb Release: v0.1.0 2023-02-10 15:13:51 +05:30
f8e737648a Merge pull request #67 from huggingface/smangrul/fix-save-pretrained
make `save_pretrained` work in a way training could be resumed
2023-02-10 00:14:39 +05:30
b1af297707 make save_pretrained work in a way training could be resumed 2023-02-10 00:06:25 +05:30
85c7b98307 Merge pull request #66 from huggingface/smangrul/update-bibtex
update bibtex
2023-02-09 17:26:17 +05:30
e41152e5f1 update bibtex 2023-02-09 17:25:52 +05:30
9f19ce6729 Merge pull request #64 from kashif/patch-1
Fix typos in readme
2023-02-09 17:18:36 +05:30
ae85e185ad another typo 2023-02-09 10:59:56 +01:00
93762cc658 Fix typos in readme 2023-02-09 10:38:18 +01:00
ed608025eb Merge pull request #63 from huggingface/vision-examples
add: vision examples to readme.
2023-02-09 13:57:11 +05:30
14a293a6b3 PeftModel => get_pefT_model() 2023-02-09 13:34:21 +05:30
c7b744db79 add: vision examples to readme. 2023-02-09 12:23:48 +05:30
250edccdda Merge pull request #59 from sayakpaul/example/sem-seg
add: example on semantic segmentation.
2023-02-09 12:09:18 +05:30
1daf087682 reword some things. 2023-02-09 11:22:50 +05:30
d3d601d5c3 Merge pull request #55 from huggingface/smangrul/fix-examples-with-hub-utils
many code fixes and updates to examples
2023-02-08 18:56:48 +05:30
8083c9515f update README and fix token_cls example 2023-02-08 18:54:46 +05:30
73cd16b7b5 quality 2023-02-08 18:43:00 +05:30
65112b75bb Merge branch 'main' into smangrul/fix-examples-with-hub-utils 2023-02-08 18:41:19 +05:30
3cf0b7a2d4 fix more examples 2023-02-08 18:40:57 +05:30
afb171eefb fixes and updating examples 2023-02-08 18:07:15 +05:30
b07ea17f49 update examples 2023-02-08 14:55:08 +05:30
83ded43ee7 Update peft_lora_clm_accelerate_ds_zero3_offload.py 2023-02-08 13:35:10 +05:30
537c971a47 fix 2023-02-08 13:05:27 +05:30
ed0c962ff5 fixes 2023-02-08 12:59:29 +05:30
eec0b9329d Update peft_lora_clm_accelerate_ds_zero3_offload.py 2023-02-08 12:41:27 +05:30
1929a84e1e remove peft_model_load_and_dispatch as it is part of PeftModel.from_pretrained 2023-02-08 12:29:03 +05:30
522a6b6c17 add load_and_dispatch to load_pretrained 2023-02-08 12:18:03 +05:30
462b65fe45 fix lora_only 2023-02-08 10:26:56 +05:30
2b89fbf963 add: example on semantic segmentation. 2023-02-08 09:49:13 +05:30
b5c97f2039 Update save_and_load.py 2023-02-08 09:25:21 +05:30
64d2d19598 update peft_model_load_and_dispatch 2023-02-08 09:21:49 +05:30
a7dd034710 fix prefix tuning config to remove function field as it cannot be converted to json 2023-02-08 08:49:15 +05:30
ed0bcdac4f Merge pull request #58 from sayakpaul/patch-1
Update image classification README.md to include the latest Colab Notebook link
2023-02-07 19:00:11 +05:30
bdeb3778d0 add support for generate when using prompt_tuning 2023-02-07 15:07:56 +05:30
185c852088 Update README.md 2023-02-07 12:53:37 +05:30
a1b7e42783 Merge pull request #56 from sayakpaul/example/img-cls
add: example on fine-tuning for image classification.
2023-02-07 12:51:56 +05:30
3c4b64785f Update README.md 2023-02-07 11:11:36 +05:30
ab43d6aa5c fix: inference section 2023-02-07 11:04:39 +05:30
3cf7034e9c Empty commit.
Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
2023-02-07 09:57:57 +05:30
ddb37c353c add: correct Colab link. 2023-02-07 09:55:07 +05:30
dbe3b9b99e add: example on fine-tuning for image classification. 2023-02-07 09:53:34 +05:30
5bc815e2e2 fix generate because of recent transformers release 2023-02-06 23:50:48 +05:30
5a43a3a321 seq cls examples update 2023-02-06 18:57:13 +05:30
7ae63299a8 Merge pull request #53 from younesbelkada/add-int8-example
[ `example`] add bnb example
2023-02-03 21:00:41 +05:30
57de1d2677 add bnb example 2023-02-02 17:45:38 +01:00
383b5abb33 Merge pull request #51 from huggingface/smangrul/lora-raise-error-when-no-target-module-found
for lora, raise error when no target modules in base model
2023-02-02 13:35:47 +05:30
d8ccd7d84c for lora, raise error when no target modules in base model 2023-02-02 13:29:49 +05:30
df5b201c6b Merge pull request #50 from huggingface/smangrul/add-modules-to-save-to-lora-config
add `modules_to_save` to LoraConfig and other fixes
2023-02-02 13:19:37 +05:30
44d8e72ca8 fixes 2023-02-02 13:19:14 +05:30
c37ee25be7 trying diff approaches 2023-02-01 19:35:19 +05:30
c884daf96a getting rid to forward call linking 2023-02-01 19:18:38 +05:30
fcd213708d fixes 2023-02-01 17:17:14 +05:30
915a5db0c6 fixes 2023-02-01 16:25:42 +05:30
d53a631608 fixes 2023-02-01 15:59:24 +05:30
b4d0885203 Merge pull request #49 from orenwang/main
fix validation_steps handling in dreambooth example
2023-02-01 15:42:35 +05:30
d04f6661ee add modules_to_save to LoraConfig and other fixes
1. Add `modules_to_save` to LoraConfig
2. Using PeftModel for LoraConfig instead of task-specific classes because LoRA is task agnostic.
2023-02-01 15:41:35 +05:30
80e1b262e5 fix validation_steps handling in dreambooth example 2023-01-31 10:59:46 +08:00
dd518985ff Merge pull request #47 from orenwang/main
allow validation images for lora training
2023-01-30 16:13:28 +05:30
a17cea104e add validation images for lora training 2023-01-30 17:18:43 +08:00
3f9b310c6a Merge pull request #46 from huggingface/smangrul/fix-hf-hub-utils-tests
fix hf hub util tests
2023-01-30 13:35:06 +05:30
06e49c0a87 fixes 2023-01-30 13:31:01 +05:30
6cf2cf5dae fix hf hub util tests 2023-01-30 12:51:30 +05:30
3faaf0916a Merge pull request #39 from younesbelkada/add-push-to-hub
[`core`] Add hub utils
2023-01-30 12:34:13 +05:30
6c9534e660 adapt for other models 2023-01-29 11:18:31 +00:00
22295c4278 adapt from code review
- remove `README`
- inherit from `dataclass`
- add new test
2023-01-29 10:49:31 +00:00
16182ea972 Apply suggestions from code review
Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
2023-01-29 11:41:38 +01:00
ad69958e52 Merge branch 'main' into add-push-to-hub 2023-01-28 10:17:57 +01:00
f8a2829318 Merge pull request #38 from huggingface/smangrul/fixes
adding support for int8 lora training
2023-01-27 19:20:12 +05:30
634f3692d8 working v1
- push to hub method works
- add tests
- add config super class
- add Lora support for `from_pretrained`
2023-01-26 11:17:24 +00:00
2cc7f2cbac add config tests 2023-01-26 10:12:51 +00:00
2896cf05fb v1 working
- from_pretrained support for config
- from_pretrained support for loramodel
- todo: tests
- todo: push_to_hub
2023-01-25 22:43:22 +00:00
776a28f053 update lora to support int8 training 2023-01-25 12:27:02 +05:30
d75746be70 adding support for int8 lora training 2023-01-25 04:19:23 +05:30
1dbe7fc0db Merge pull request #37 from huggingface/smangrul/fixes
colab notebook example for lora peft application
2023-01-24 21:20:06 +05:30
ff8a5b9a69 colab notebook example for lora peft application 2023-01-24 21:19:47 +05:30
36267af51b Merge pull request #36 from huggingface/smangrul/fixes
correcting requirements.txt in example sub-folders
2023-01-22 11:21:01 +05:30
fef162cff8 correcting requirements.txt in example sub-folders 2023-01-22 11:20:39 +05:30
a8587916c8 Merge pull request #35 from huggingface/smangrul/fixes
fixes and addressing comments from previous PR
2023-01-21 18:31:28 +05:30
77670ead76 fixes and addressing comments from previous PR
1. Minor updates/fixes in README.md and setup.py
2. Make `loralib` optional
2023-01-21 18:17:19 +05:30
360fb2f816 Merge pull request #34 from huggingface/fix-typos
Review & fix typos
2023-01-21 18:00:12 +05:30
a40f20ad6c Fix typos 2023-01-20 11:34:45 -05:00
407482eb37 Merge pull request #33 from huggingface/smangrul/fixes
fixes, docs and version bump up
2023-01-20 15:34:29 +05:30
d9e7d6cd22 fixes, docs and version bump up 2023-01-20 15:34:11 +05:30
dbf438f99d Merge pull request #32 from huggingface/v0.0.1-release
V0.0.1 release
2023-01-20 14:37:57 +05:30
42 changed files with 29919 additions and 9125 deletions

View File

@ -1,6 +1,6 @@
.PHONY: quality style test docs
check_dirs := src examples
check_dirs := src tests examples
# Check that source code meets quality standards
@ -9,11 +9,11 @@ quality:
black --check $(check_dirs)
isort --check-only $(check_dirs)
flake8 $(check_dirs)
doc-builder style src --max_len 119 --check_only
doc-builder style src tests --max_len 119 --check_only
# Format source code automatically and check is there are any problems left that need manual fixing
style:
black $(check_dirs)
isort $(check_dirs)
doc-builder style src --max_len 119
doc-builder style src tests --max_len 119

View File

@ -21,7 +21,7 @@ limitations under the License.
Parameter-Efficient Fine-Tuning (PEFT) methods enable efficient adaptation of pre-trained language models (PLMs) to various downstream applications without fine-tuning all the model's parameters. Fine-tuning large-scale PLMs is often prohibitively costly. In this regard, PEFT methods only fine-tune a small number of (extra) model parameters, thereby greatly decreasing the computational and storage costs. Recent State-of-the-Art PEFT techniques achieve performance comparable to that of full fine-tuning.
Seamlessly integrated with 🤗 Accelerate for large scale models leveraging PyTorch FSDP.
Seamlessly integrated with 🤗 Accelerate for large scale models leveraging DeepSpeed and Big Model Inference.
Supported methods:
@ -34,11 +34,11 @@ Supported methods:
```python
from transformers import AutoModelForSeq2SeqLM
from peft import get_peft_config, get_peft_model, LoRAConfig, TaskType
from peft import get_peft_config, get_peft_model, LoraConfig, TaskType
model_name_or_path = "bigscience/mt0-large"
tokenizer_name_or_path = "bigscience/mt0-large"
peft_config = LoRAConfig(
peft_config = LoraConfig(
task_type=TaskType.SEQ_2_SEQ_LM, inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1
)
@ -65,7 +65,7 @@ Hardware: Single A100 80GB GPU with CPU RAM above 64GB
Performance of PEFT-LoRA tuned `bigscience/T0_3B` on `ought/raft/twitter_complaints` leaderboard.
A point to note is that we didn't try to squeeze performance by playing around with input instruction templates, LoRA hyperparams and other training-related hyperparams. Also, we didn't use the larger 13B mt0-xxl model.
So, we are already seeing comparable performance to SoTA with parameter effcient tuning. Also, the final checkpoint size is just `19MB` in comparison to `11GB` size of the backbone `bigscience/T0_3B` model.
So, we are already seeing comparable performance to SoTA with parameter efficient tuning. Also, the final checkpoint size is just `19MB` in comparison to `11GB` size of the backbone `bigscience/T0_3B` model.
| Submission Name | Accuracy |
| --------- | ---- |
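For illustration only (this snippet is not part of the diff): with the `save_pretrained`/`from_pretrained` flow added in this release, the ~19MB adapter above is all that needs to be stored and shared, and it can be re-attached to the 11GB base model roughly as follows. The adapter id is the one used in the notebooks further down; treat the exact names as placeholders.

```python
from transformers import AutoModelForSeq2SeqLM
from peft import PeftConfig, PeftModel

# ~19MB adapter checkpoint on the Hub (placeholder id, taken from the notebooks below)
peft_model_id = "smangrul/twitter_complaints_bigscience_T0_3B_LORA_SEQ_2_SEQ_LM"

config = PeftConfig.from_pretrained(peft_model_id)                             # small adapter config
base = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path)   # 11GB backbone
model = PeftModel.from_pretrained(base, peft_model_id)                         # attach the LoRA weights
```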
@ -77,7 +77,7 @@ So, we are already seeing comparable performance to SoTA with parameter effcient
### Parameter Efficient Tuning of Diffusion Models
GPU memory required by different settings during training are given below. The final checkpoint size being `8.8 MB`.
GPU memory required by different settings during training is given below. The final checkpoint size is `8.8 MB`.
Hardware: Single A100 80GB GPU with CPU RAM above 64G
@ -127,6 +127,12 @@ Try out the 🤗 Gradio Space which should run seamlessly on a T4 instance:
### Parameter Efficient Tuning of LLMs for RLHF components such as Ranker and Policy [ToDo]
### INT8 training of large models in Colab using PEFT LoRA and bitsandbytes
Here is a demo of how to fine-tune OPT-6.7b (14GB in fp16) in a Google Colab: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1jCkpikz0J2o20FBQmYmAGdiKmJGOMo-o?usp=sharing)
Here is a demo of how to fine-tune whisper-large (1.5B params) (14GB in fp16) in a Google Colab: [ToDo]
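For reference, here is a minimal sketch of what such an INT8 + LoRA setup looks like (this is not the content of the linked notebook; the model id, hyperparameters and `target_modules` below are assumptions for illustration, and `bitsandbytes` must be installed):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

# Load the base model in 8-bit (requires bitsandbytes) and place it automatically.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-6.7b", load_in_8bit=True, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-6.7b")

# Assumed LoRA settings; "q_proj"/"v_proj" are the attention projections in OPT.
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # only the LoRA parameters are trainable
```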
### Save compute and storage even for medium and small models
Save storage by avoiding full finetuning of models on each of the downstream tasks/datasets,
@ -143,10 +149,10 @@ Another example is fine-tuning `roberta-large` on `MRPC` GLUE dataset suing diff
PEFT models work with 🤗 Accelerate out of the box. Use 🤗 Accelerate for distributed training on various hardware such as GPUs and Apple Silicon devices.
Use 🤗 Accelerate for inference on consumer hardware with limited resources.
### Example of PEFT model training using 🤗 Accelerate's DeepSpeed integation
### Example of PEFT model training using 🤗 Accelerate's DeepSpeed integration
Currently DeepSpeed requires PR [ZeRO3 handling frozen weights](https://github.com/microsoft/DeepSpeed/pull/2653) to fix [[REQUEST] efficiently deal with frozen weights during training](https://github.com/microsoft/DeepSpeed/issues/2615) issue. Example is provided in `~examples/conditional_generation/peft_lora_seq2seq_accelerate_ds_zero3_offload.py`.
a. First run `accelerate config --config_file ds_zero3_cpu.yaml` and answer the questionaire.
Currently DeepSpeed requires PR [ZeRO3 handling frozen weights](https://github.com/microsoft/DeepSpeed/pull/2653) to fix [[REQUEST] efficiently deal with frozen weights during training](https://github.com/microsoft/DeepSpeed/issues/2615) issue. An example is provided in `~examples/conditional_generation/peft_lora_seq2seq_accelerate_ds_zero3_offload.py`.
a. First, run `accelerate config --config_file ds_zero3_cpu.yaml` and answer the questionnaire.
Below are the contents of the config file.
```
compute_environment: LOCAL_MACHINE
@ -172,7 +178,7 @@ Use 🤗 Accelerate for inferencing on consumer hardware with small resources.
same_network: true
use_cpu: false
```
b. run the below command to launch example script
b. Run the command below to launch the example script
```
accelerate launch --config_file ds_zero3_cpu.yaml examples/peft_lora_seq2seq_accelerate_ds_zero3_offload.py
```
@ -203,8 +209,7 @@ Use 🤗 Accelerate for inferencing on consumer hardware with small resources.
```
### Example of PEFT model inference using 🤗 Accelerate's Big Model Inferencing capabilities
Example is provided in `~examples/causal_language_modeling/peft_lora_clm_accelerate_big_model_inference.ipynb`.
An example is provided in `~examples/causal_language_modeling/peft_lora_clm_accelerate_big_model_inference.ipynb`.
## Models support matrix
@ -250,7 +255,30 @@ Example is provided in `~examples/causal_language_modeling/peft_lora_clm_acceler
| Deberta | ✅ | | | |
| Deberta-v2 | ✅ | | | |
### Text-to-Image Generation
| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning |
| --------- | ---- | ---- | ---- | ---- |
| Stable Diffusion | ✅ | | | |
### Image Classification
| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning |
| --------- | ---- | ---- | ---- | ---- |
| ViT | ✅ | | | |
| Swin | ✅ | | | |
___Note that we have tested LoRA for [ViT](https://huggingface.co/docs/transformers/model_doc/vit) and [Swin](https://huggingface.co/docs/transformers/model_doc/swin) for fine-tuning on image classification. However, it should be possible to use LoRA for any [compatible model](https://huggingface.co/models?pipeline_tag=image-classification&sort=downloads&search=vit) provided by 🤗 Transformers. Check out the respective
examples to learn more. If you run into problems, please open an issue.___
The same principle applies to our [segmentation models](https://huggingface.co/models?pipeline_tag=image-segmentation&sort=downloads) as well.
### Semantic Segmentation
| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning |
| --------- | ---- | ---- | ---- | ---- |
| SegFormer | ✅ | | | |
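As a rough sketch of how these vision models are wrapped (the ViT checkpoint name, the `"query"`/`"value"` target modules and the label count are illustrative assumptions; `modules_to_save` is the new `LoraConfig` field added in this compare for keeping a freshly initialized head):

```python
from transformers import AutoModelForImageClassification
from peft import LoraConfig, get_peft_model

model = AutoModelForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k",  # placeholder checkpoint
    num_labels=10,                        # placeholder number of classes
    ignore_mismatched_sizes=True,
)

peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["query", "value"],    # attention projections in ViT/Swin
    modules_to_save=["classifier"],       # keep the new classification head trainable
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()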
## Caveats:
1. Below is an example of using PyTorch FSDP for training. However, it doesn't lead to
@ -268,7 +296,7 @@ any GPU memory savings. Please refer issue [[FSDP] FSDP with CPU offload consume
```
Example of parameter efficient tuning with `mt0-xxl` base model using 🤗 Accelerate is provided in `~examples/conditional_generation/peft_lora_seq2seq_accelerate_fsdp.py`.
a. First run `accelerate config --config_file fsdp_config.yaml` and answer the questionaire.
a. First, run `accelerate config --config_file fsdp_config.yaml` and answer the questionnaire.
Below are the contents of the config file.
```
command_file: null
@ -300,19 +328,19 @@ any GPU memory savings. Please refer issue [[FSDP] FSDP with CPU offload consume
tpu_zone: null
use_cpu: false
```
b. run the below command to launch example script
b. Run the command below to launch the example script
```
accelerate launch --config_file fsdp_config.yaml examples/peft_lora_seq2seq_accelerate_fsdp.py
```
2. When using `P_TUNING` or `PROMPT_TUNING` with `SEQ_2_SEQ` task, remember to remove the `num_virtual_token` virtual prompt predictions from the left side of the model outputs during evaluations.
3. `P_TUNING` or `PROMPT_TUNING` doesn't support `generate` functionality of transformers bcause `generate` strictly requires `input_ids`/`decoder_input_ids` but
3. For encoder-decoder models, `P_TUNING` or `PROMPT_TUNING` doesn't support `generate` functionality of transformers because `generate` strictly requires `decoder_input_ids` but
`P_TUNING`/`PROMPT_TUNING` appends soft prompt embeddings to `input_embeds` to create
new `input_embeds` to be given to the model. Therefore, `generate` doesn't support this yet.
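A conceptual sketch of that limitation (this is not PEFT's internal code; `prompt_embeddings` and the decoder-start handling are illustrative): because the learned virtual tokens only exist as embeddings, a manual forward pass with `inputs_embeds` is needed instead of `generate`:

```python
import torch

def forward_with_soft_prompt(model, tokenizer, prompt_embeddings, text):
    # prompt_embeddings: (num_virtual_tokens, hidden_size), learned by prompt/p-tuning
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    token_embeds = model.get_input_embeddings()(input_ids)
    inputs_embeds = torch.cat([prompt_embeddings.unsqueeze(0), token_embeds], dim=1)
    # generate() insists on input_ids/decoder_input_ids, so call the model directly.
    decoder_start = torch.tensor([[model.config.decoder_start_token_id]])
    return model(inputs_embeds=inputs_embeds, decoder_input_ids=decoder_start)
```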
## Backlog:
1. Explore and possibly integrate `(IA)^3` and `UniPELT`
1. Explore and possibly integrate `(IA)^3`
2. Add tests
3. Add more use cases and examples
@ -323,7 +351,7 @@ If you use 🤗 PEFT in your publication, please cite it by using the following
```bibtex
@Misc{peft,
title = {PEFT: State-of-the-art Parameter-Efficient Fine-Tuning methods},
author = {Sourab Mangrulkar, Sylvain Gugger},
author = {Sourab Mangrulkar, Sylvain Gugger, Lysandre Debut, Younes Belkada, Sayak Paul},
howpublished = {\url{https://github.com/huggingface/peft}},
year = {2022}
}

View File

@ -0,0 +1,22 @@
compute_environment: LOCAL_MACHINE
deepspeed_config:
gradient_accumulation_steps: 1
gradient_clipping: 1.0
offload_optimizer_device: none
offload_param_device: none
zero3_init_flag: true
zero3_save_16bit_model: true
zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
machine_rank: 0
main_training_function: main
megatron_lm_config: {}
mixed_precision: 'no'
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
use_cpu: false

View File

@ -17,7 +17,7 @@ from transformers import (
import psutil
from datasets import load_dataset
from peft import LoraConfig, TaskType, get_peft_model, get_peft_model_state_dict
from peft import LoraConfig, TaskType, get_peft_model
from tqdm import tqdm
@ -111,9 +111,6 @@ def main():
model_name_or_path = "bigscience/bloomz-7b1"
dataset_name = "twitter_complaints"
peft_config = LoraConfig(task_type=TaskType.CAUSAL_LM, inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1)
checkpoint_name = (
f"{dataset_name}_{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}_v1.pt".replace("/", "_")
)
text_column = "Tweet text"
label_column = "text_label"
lr = 3e-3
@ -121,6 +118,7 @@ def main():
batch_size = 8
seed = 42
max_length = 64
do_test = False
set_seed(seed)
dataset = load_dataset("ought/raft", dataset_name)
@ -315,35 +313,41 @@ def main():
accelerator.print(f"{eval_preds[:10]=}")
accelerator.print(f"{dataset['train'][label_column][:10]=}")
model.eval()
test_preds = []
for _, batch in enumerate(tqdm(test_dataloader)):
batch = {k: v for k, v in batch.items() if k != "labels"}
with torch.no_grad():
outputs = accelerator.unwrap_model(model).generate(
**batch, synced_gpus=is_ds_zero_3, max_new_tokens=10
) # synced_gpus=True for DS-stage 3
test_preds.extend(
tokenizer.batch_decode(outputs[:, max_length:].detach().cpu().numpy(), skip_special_tokens=True)
)
if do_test:
model.eval()
test_preds = []
for _, batch in enumerate(tqdm(test_dataloader)):
batch = {k: v for k, v in batch.items() if k != "labels"}
with torch.no_grad():
outputs = accelerator.unwrap_model(model).generate(
**batch, synced_gpus=is_ds_zero_3, max_new_tokens=10
) # synced_gpus=True for DS-stage 3
test_preds.extend(
tokenizer.batch_decode(outputs[:, max_length:].detach().cpu().numpy(), skip_special_tokens=True)
)
test_preds_cleaned = []
for _, pred in enumerate(test_preds):
test_preds_cleaned.append(get_closest_label(pred, classes))
test_preds_cleaned = []
for _, pred in enumerate(test_preds):
test_preds_cleaned.append(get_closest_label(pred, classes))
test_df = dataset["test"].to_pandas()
test_df[label_column] = test_preds_cleaned
test_df["text_labels_orig"] = test_preds
accelerator.print(test_df[[text_column, label_column]].sample(20))
test_df = dataset["test"].to_pandas()
test_df[label_column] = test_preds_cleaned
test_df["text_labels_orig"] = test_preds
accelerator.print(test_df[[text_column, label_column]].sample(20))
pred_df = test_df[["ID", label_column]]
pred_df.columns = ["ID", "Label"]
pred_df = test_df[["ID", label_column]]
pred_df.columns = ["ID", "Label"]
os.makedirs(f"data/{dataset_name}", exist_ok=True)
pred_df.to_csv(f"data/{dataset_name}/predictions.csv", index=False)
os.makedirs(f"data/{dataset_name}", exist_ok=True)
pred_df.to_csv(f"data/{dataset_name}/predictions.csv", index=False)
accelerator.wait_for_everyone()
accelerator.save(get_peft_model_state_dict(model, state_dict=accelerator.get_state_dict(model)), checkpoint_name)
model.push_to_hub(
"smangrul/"
+ f"{dataset_name}_{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}".replace("/", "_"),
state_dict=accelerator.get_state_dict(model),
use_auth_token=True,
)
accelerator.wait_for_everyone()
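As a follow-up to the `push_to_hub` change above, the saved adapter can later be loaded for inference roughly as follows (a sketch only; the repo id just mirrors the naming pattern in the script and depends on the run-time values):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftConfig, PeftModel

# Illustrative repo id built from the script's f-string above.
peft_model_id = "smangrul/twitter_complaints_bigscience_bloomz-7b1_LORA_CAUSAL_LM"

config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, device_map="auto")
model = PeftModel.from_pretrained(model, peft_model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
```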

File diff suppressed because it is too large.

File diff suppressed because it is too large.

View File

@ -0,0 +1,22 @@
compute_environment: LOCAL_MACHINE
deepspeed_config:
gradient_accumulation_steps: 1
gradient_clipping: 1.0
offload_optimizer_device: none
offload_param_device: none
zero3_init_flag: true
zero3_save_16bit_model: true
zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
machine_rank: 0
main_training_function: main
megatron_lm_config: {}
mixed_precision: 'no'
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
use_cpu: false

View File

@ -2,10 +2,26 @@
"cells": [
{
"cell_type": "code",
"execution_count": 17,
"execution_count": 1,
"id": "5f93b7d1",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"===================================BUG REPORT===================================\n",
"Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues\n",
"For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link\n",
"================================================================================\n",
"CUDA SETUP: CUDA runtime path found: /home/sourab/miniconda3/envs/ml/lib/libcudart.so\n",
"CUDA SETUP: Highest compute capability among GPUs detected: 7.5\n",
"CUDA SETUP: Detected CUDA version 117\n",
"CUDA SETUP: Loading binary /home/sourab/miniconda3/envs/ml/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...\n"
]
}
],
"source": [
"from transformers import AutoModelForSeq2SeqLM\n",
"from peft import get_peft_config,get_peft_model, get_peft_model_state_dict, LoraConfig, TaskType\n",
@ -60,15 +76,13 @@
"name": "stderr",
"output_type": "stream",
"text": [
"/home/sourab/miniconda3/envs/ml/lib/python3.10/site-packages/huggingface_hub/utils/_deprecation.py:97: FutureWarning: Deprecated argument(s) used in 'dataset_info': token. Will not be supported from version '0.12'.\n",
" warnings.warn(message, FutureWarning)\n",
"Found cached dataset financial_phrasebank (/home/sourab/.cache/huggingface/datasets/financial_phrasebank/sentences_allagree/1.0.0/550bde12e6c30e2674da973a55f57edde5181d53f5a5a34c1531c53f93b7e141)\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "6de075f8208349108291ac5ab7f5c980",
"model_id": "3403bf3d718042018b0531848cc30209",
"version_major": 2,
"version_minor": 0
},
@ -82,7 +96,7 @@
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "4b0e67b6d93f43e4b0f6a2f8978e4b0c",
"model_id": "d3d5c45e3776469f9560b6eaa9346f8f",
"version_major": 2,
"version_minor": 0
},
@ -96,7 +110,7 @@
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "a9551029c9884529bda7421a99170b51",
"model_id": "e9736f26e9aa450b8d65f95c0b9c81cc",
"version_major": 2,
"version_minor": 0
},
@ -110,7 +124,7 @@
{
"data": {
"text/plain": [
"{'sentence': 'The order was valued at USD12 .2 m.',\n",
"{'sentence': \"The 10,000-odd square metre plot that Stockmann has bought for the Nevsky Center shopping center is located on Nevsky Prospect , St Petersburg 's high street , next to the Vosstaniya Square underground station , in the immediate vicinity of Moscow Station .\",\n",
" 'label': 1,\n",
" 'text_label': 'neutral'}"
]
@ -147,7 +161,7 @@
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "4421971232434db1b6141e91fda2f6d7",
"model_id": "c460989d4ab24e3f97d81ef040b1d1b4",
"version_major": 2,
"version_minor": 0
},
@ -161,7 +175,7 @@
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "9b2ef793d93443949f4a5d5874d4bc05",
"model_id": "1acc389b08b94f8a87900b9fbdbccce4",
"version_major": 2,
"version_minor": 0
},
@ -234,45 +248,52 @@
"name": "stderr",
"output_type": "stream",
"text": [
"100%|█████████████████████████████████████████████████████████████| 255/255 [00:53<00:00, 4.80it/s]\n",
"100%|███████████████████████████████████████████████████████████████| 29/29 [00:02<00:00, 14.16it/s]\n"
"100%|████████████████████████████████████████████████████████████████████████████████████████| 255/255 [02:21<00:00, 1.81it/s]\n",
"100%|██████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:07<00:00, 4.13it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch=0: train_ppl=tensor(13.6966, device='cuda:0') train_epoch_loss=tensor(2.6171, device='cuda:0') eval_ppl=tensor(1.0046, device='cuda:0') eval_epoch_loss=tensor(0.0046, device='cuda:0')\n"
"epoch=0: train_ppl=tensor(14.6341, device='cuda:0') train_epoch_loss=tensor(2.6834, device='cuda:0') eval_ppl=tensor(1.0057, device='cuda:0') eval_epoch_loss=tensor(0.0057, device='cuda:0')\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|█████████████████████████████████████████████████████████████| 255/255 [00:52<00:00, 4.88it/s]\n",
"100%|███████████████████████████████████████████████████████████████| 29/29 [00:02<00:00, 14.20it/s]\n"
"100%|████████████████████████████████████████████████████████████████████████████████████████| 255/255 [02:00<00:00, 2.11it/s]\n",
"100%|██████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:05<00:00, 5.66it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch=1: train_ppl=tensor(1.5893, device='cuda:0') train_epoch_loss=tensor(0.4633, device='cuda:0') eval_ppl=tensor(1.0020, device='cuda:0') eval_epoch_loss=tensor(0.0020, device='cuda:0')\n"
"epoch=1: train_ppl=tensor(1.7576, device='cuda:0') train_epoch_loss=tensor(0.5640, device='cuda:0') eval_ppl=tensor(1.0052, device='cuda:0') eval_epoch_loss=tensor(0.0052, device='cuda:0')\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|█████████████████████████████████████████████████████████████| 255/255 [00:52<00:00, 4.87it/s]\n",
"100%|███████████████████████████████████████████████████████████████| 29/29 [00:02<00:00, 14.18it/s]\n"
"100%|████████████████████████████████████████████████████████████████████████████████████████| 255/255 [01:33<00:00, 2.74it/s]\n",
"100%|██████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:04<00:00, 6.23it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch=2: train_ppl=tensor(1.3210, device='cuda:0') train_epoch_loss=tensor(0.2784, device='cuda:0') eval_ppl=tensor(1.0026, device='cuda:0') eval_epoch_loss=tensor(0.0026, device='cuda:0')\n"
"epoch=2: train_ppl=tensor(1.3830, device='cuda:0') train_epoch_loss=tensor(0.3243, device='cuda:0') eval_ppl=tensor(1.0035, device='cuda:0') eval_epoch_loss=tensor(0.0035, device='cuda:0')\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n"
]
}
],
@ -313,7 +334,7 @@
},
{
"cell_type": "code",
"execution_count": 20,
"execution_count": 7,
"id": "6cafa67b",
"metadata": {},
"outputs": [
@ -321,9 +342,9 @@
"name": "stdout",
"output_type": "stream",
"text": [
"accuracy=98.23788546255507 % on the evaluation dataset\n",
"eval_preds[:10]=['neutral', 'neutral', 'positive', 'positive', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral']\n",
"dataset['validation']['text_label'][:10]=['neutral', 'neutral', 'positive', 'positive', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral']\n"
"accuracy=97.3568281938326 % on the evaluation dataset\n",
"eval_preds[:10]=['neutral', 'neutral', 'neutral', 'positive', 'neutral', 'positive', 'positive', 'neutral', 'neutral', 'neutral']\n",
"dataset['validation']['text_label'][:10]=['neutral', 'neutral', 'neutral', 'positive', 'neutral', 'positive', 'positive', 'neutral', 'neutral', 'neutral']\n"
]
}
],
@ -343,20 +364,19 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 8,
"id": "a8de6005",
"metadata": {},
"outputs": [],
"source": [
"# saving model\n",
"state_dict = get_peft_model_state_dict(model)\n",
"torch.save(state_dict, checkpoint_name)\n",
"print(state_dict)"
"peft_model_id = f\"{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}\"\n",
"model.save_pretrained(peft_model_id)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"execution_count": 9,
"id": "bd20cd4c",
"metadata": {},
"outputs": [
@ -364,18 +384,74 @@
"name": "stdout",
"output_type": "stream",
"text": [
"19M\tfinancial_sentiment_analysis_lora_v1.pt\r\n"
"9,2M\tbigscience/mt0-large_LORA_SEQ_2_SEQ_LM/adapter_model.bin\r\n"
]
}
],
"source": [
"!du -h $checkpoint_name"
"ckpt = f\"{peft_model_id}/adapter_model.bin\"\n",
"!du -h $ckpt"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "76c2fc29",
"metadata": {},
"outputs": [],
"source": [
"from peft import PeftModel, PeftConfig\n",
"peft_model_id = f\"{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}\"\n",
"\n",
"config = PeftConfig.from_pretrained(peft_model_id)\n",
"model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path)\n",
"model = PeftModel.from_pretrained(model, peft_model_id)\n"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "37d712ce",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"- Demand for fireplace products was lower than expected , especially in Germany .\n",
"{'input_ids': tensor([[ 259, 264, 259, 82903, 332, 1090, 10040, 10371, 639, 259,\n",
" 19540, 2421, 259, 25505, 259, 261, 259, 21230, 281, 17052,\n",
" 259, 260, 1]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}\n",
"tensor([[ 0, 259, 32588, 1]])\n",
"['negative']\n"
]
}
],
"source": [
"model.eval()\n",
"i = 13\n",
"inputs = tokenizer(dataset[\"validation\"][text_column][i], return_tensors=\"pt\")\n",
"print(dataset[\"validation\"][text_column][i])\n",
"print(inputs)\n",
"\n",
"with torch.no_grad():\n",
" outputs = model.generate(input_ids=inputs[\"input_ids\"], max_new_tokens=10)\n",
" print(outputs)\n",
" print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "76c2fc29",
"id": "66c65ea4",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"id": "65e71f78",
"metadata": {},
"outputs": [],
"source": []
@ -383,7 +459,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.10.5 64-bit",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@ -397,7 +473,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.5 (v3.10.5:f377153967, Jun 6 2022, 12:36:10) [Clang 13.0.0 (clang-1300.0.29.30)]"
"version": "3.10.4"
},
"vscode": {
"interpreter": {

View File

@ -0,0 +1,255 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "71fbfca2",
"metadata": {},
"outputs": [],
"source": [
"from transformers import AutoModelForSeq2SeqLM\n",
"from peft import PeftModel, PeftConfig\n",
"import torch\n",
"from datasets import load_dataset\n",
"import os\n",
"from transformers import AutoTokenizer\n",
"from torch.utils.data import DataLoader\n",
"from transformers import default_data_collator,get_linear_schedule_with_warmup\n",
"from tqdm import tqdm\n",
"from datasets import load_dataset\n",
"\n",
"dataset_name = \"twitter_complaints\"\n",
"text_column = \"Tweet text\"\n",
"label_column = \"text_label\"\n",
"batch_size=8\n",
"\n",
"peft_model_id = \"smangrul/twitter_complaints_bigscience_T0_3B_LORA_SEQ_2_SEQ_LM\"\n",
"config = PeftConfig.from_pretrained(peft_model_id)\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "cc55820a",
"metadata": {},
"outputs": [],
"source": [
"peft_model_id = \"smangrul/twitter_complaints_bigscience_T0_3B_LORA_SEQ_2_SEQ_LM\"\n",
"max_memory={0: \"6GIB\", 1: \"0GIB\", 2: \"0GIB\", 3: \"0GIB\", 4: \"0GIB\", \"cpu\":\"30GB\"}\n",
"config = PeftConfig.from_pretrained(peft_model_id)\n",
"model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path, device_map=\"auto\", max_memory=max_memory)\n",
"model = PeftModel.from_pretrained(model, peft_model_id, device_map=\"auto\", max_memory=max_memory)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e1a3648b",
"metadata": {},
"outputs": [],
"source": [
"from datasets import load_dataset\n",
"\n",
"dataset = load_dataset(\"ought/raft\", dataset_name)\n",
"\n",
"classes = [k.replace(\"_\", \" \") for k in dataset[\"train\"].features[\"Label\"].names]\n",
"print(classes)\n",
"dataset = dataset.map(\n",
" lambda x: {\"text_label\": [classes[label] for label in x[\"Label\"]]},\n",
" batched=True,\n",
" num_proc=1,\n",
" \n",
")\n",
"print(dataset)\n",
"dataset[\"train\"][0]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fe12d4d3",
"metadata": {},
"outputs": [],
"source": [
"tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)\n",
"target_max_length = max([len(tokenizer(class_label)[\"input_ids\"]) for class_label in classes])\n",
"def preprocess_function(examples):\n",
" inputs = examples[text_column]\n",
" targets = examples[label_column]\n",
" model_inputs = tokenizer(inputs, truncation=True)\n",
" labels = tokenizer(\n",
" targets, max_length=target_max_length, padding=\"max_length\", truncation=True, return_tensors=\"pt\"\n",
" )\n",
" labels = labels[\"input_ids\"]\n",
" labels[labels == tokenizer.pad_token_id] = -100\n",
" model_inputs[\"labels\"] = labels\n",
" return model_inputs\n",
"\n",
"processed_datasets = dataset.map(\n",
" preprocess_function,\n",
" batched=True,\n",
" num_proc=1,\n",
" remove_columns=dataset[\"train\"].column_names,\n",
" load_from_cache_file=True,\n",
" desc=\"Running tokenizer on dataset\",\n",
")\n",
"\n",
"train_dataset = processed_datasets[\"train\"]\n",
"eval_dataset = processed_datasets[\"train\"]\n",
"test_dataset = processed_datasets[\"test\"]\n",
"\n",
"\n",
"def collate_fn(examples):\n",
" return tokenizer.pad(examples, padding=\"longest\", return_tensors=\"pt\")\n",
"\n",
"train_dataloader = DataLoader(\n",
" train_dataset, shuffle=True, collate_fn=collate_fn, batch_size=batch_size, pin_memory=True\n",
")\n",
"eval_dataloader = DataLoader(eval_dataset, collate_fn=collate_fn, batch_size=batch_size, pin_memory=True)\n",
"test_dataloader = DataLoader(test_dataset, collate_fn=collate_fn, batch_size=batch_size, pin_memory=True)\n",
"\n",
"\n",
"\n",
"\n",
" "
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "b33be5e6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"@NYTsupport i have complained a dozen times &amp; yet my papers are still thrown FAR from my door. Why is this so hard to resolve?\n",
"{'input_ids': tensor([[25335, 1499, 3, 10, 3320, 12056, 382, 20390, 3, 23,\n",
" 43, 25932, 3, 9, 9611, 648, 3, 184, 4624, 117,\n",
" 780, 82, 5778, 33, 341, 3, 12618, 377, 4280, 45,\n",
" 82, 1365, 5, 1615, 19, 48, 78, 614, 12, 7785,\n",
" 58, 16229, 3, 10, 3, 1]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n",
" 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}\n",
"tensor([[ 0, 10394, 1]], device='cuda:0')\n",
"['complaint']\n"
]
}
],
"source": [
"model.eval()\n",
"i = 15\n",
"inputs = tokenizer(f'{text_column} : {dataset[\"test\"][i][\"Tweet text\"]} Label : ', return_tensors=\"pt\")\n",
"print(dataset[\"test\"][i][\"Tweet text\"])\n",
"print(inputs)\n",
"\n",
"with torch.no_grad():\n",
" outputs = model.generate(input_ids=inputs[\"input_ids\"].to(\"cuda\"), max_new_tokens=10)\n",
" print(outputs)\n",
" print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "b6d6cd5b",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
" 0%| | 0/7 [00:00<?, ?it/s]You're using a T5TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.\n",
"100%|████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:10<00:00, 1.48s/it]\n"
]
}
],
"source": [
"model.eval()\n",
"eval_preds = []\n",
"for _, batch in enumerate(tqdm(eval_dataloader)):\n",
" batch = {k: v.to(\"cuda\") for k, v in batch.items() if k != \"labels\"}\n",
" with torch.no_grad():\n",
" outputs = model.generate(**batch, max_new_tokens=10)\n",
" preds = outputs.detach().cpu().numpy()\n",
" eval_preds.extend(tokenizer.batch_decode(preds, skip_special_tokens=True))"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "61264abe",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"accuracy=100.0\n",
"eval_preds[:10]=['no complaint', 'no complaint', 'complaint', 'complaint', 'no complaint', 'no complaint', 'no complaint', 'complaint', 'complaint', 'no complaint']\n",
"dataset['train'][label_column][:10]=['no complaint', 'no complaint', 'complaint', 'complaint', 'no complaint', 'no complaint', 'no complaint', 'complaint', 'complaint', 'no complaint']\n"
]
}
],
"source": [
"correct = 0\n",
"total = 0\n",
"for pred, true in zip(eval_preds, dataset[\"train\"][label_column]):\n",
" if pred.strip() == true.strip():\n",
" correct += 1\n",
" total += 1\n",
"accuracy = correct / total * 100\n",
"print(f\"{accuracy=}\")\n",
"print(f\"{eval_preds[:10]=}\")\n",
"print(f\"{dataset['train'][label_column][:10]=}\")\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a70802a3",
"metadata": {},
"outputs": [],
"source": [
"model.eval()\n",
"test_preds = []\n",
"\n",
"for _, batch in enumerate(tqdm(test_dataloader)):\n",
" batch = {k: v for k, v in batch.items() if k != \"labels\"}\n",
" with torch.no_grad():\n",
" outputs = model.generate(**batch, max_new_tokens=10)\n",
" preds = outputs.detach().cpu().numpy()\n",
" test_preds.extend(tokenizer.batch_decode(preds, skip_special_tokens=True))\n",
" if len(test_preds)>100:\n",
" break\n",
"test_preds"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.5 (v3.10.5:f377153967, Jun 6 2022, 12:36:10) [Clang 13.0.0 (clang-1300.0.29.30)]"
},
"vscode": {
"interpreter": {
"hash": "aee8b7b246df8f9039afb4144a1f6fd8d2ca17a180786b69acc140d282b71a49"
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@ -11,7 +11,7 @@ from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, get_linear_schedu
import psutil
from datasets import load_dataset
from peft import LoraConfig, TaskType, get_peft_model, get_peft_model_state_dict
from peft import LoraConfig, TaskType, get_peft_model
from tqdm import tqdm
@ -107,15 +107,13 @@ def main():
peft_config = LoraConfig(
task_type=TaskType.SEQ_2_SEQ_LM, inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1
)
checkpoint_name = (
f"{dataset_name}_{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}_v1.pt".replace("/", "_")
)
text_column = "Tweet text"
label_column = "text_label"
lr = 3e-3
num_epochs = 5
batch_size = 8
seed = 42
do_test = False
set_seed(seed)
dataset = load_dataset("ought/raft", dataset_name)
@ -265,33 +263,39 @@ def main():
accelerator.print(f"{eval_preds[:10]=}")
accelerator.print(f"{dataset['train'][label_column][:10]=}")
model.eval()
test_preds = []
for _, batch in enumerate(tqdm(test_dataloader)):
batch = {k: v for k, v in batch.items() if k != "labels"}
with torch.no_grad():
outputs = accelerator.unwrap_model(model).generate(
**batch, synced_gpus=is_ds_zero_3
) # synced_gpus=True for DS-stage 3
test_preds.extend(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))
if do_test:
model.eval()
test_preds = []
for _, batch in enumerate(tqdm(test_dataloader)):
batch = {k: v for k, v in batch.items() if k != "labels"}
with torch.no_grad():
outputs = accelerator.unwrap_model(model).generate(
**batch, synced_gpus=is_ds_zero_3
) # synced_gpus=True for DS-stage 3
test_preds.extend(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))
test_preds_cleaned = []
for _, pred in enumerate(test_preds):
test_preds_cleaned.append(get_closest_label(pred, classes))
test_preds_cleaned = []
for _, pred in enumerate(test_preds):
test_preds_cleaned.append(get_closest_label(pred, classes))
test_df = dataset["test"].to_pandas()
test_df[label_column] = test_preds_cleaned
test_df["text_labels_orig"] = test_preds
accelerator.print(test_df[[text_column, label_column]].sample(20))
test_df = dataset["test"].to_pandas()
test_df[label_column] = test_preds_cleaned
test_df["text_labels_orig"] = test_preds
accelerator.print(test_df[[text_column, label_column]].sample(20))
pred_df = test_df[["ID", label_column]]
pred_df.columns = ["ID", "Label"]
pred_df = test_df[["ID", label_column]]
pred_df.columns = ["ID", "Label"]
os.makedirs(f"data/{dataset_name}", exist_ok=True)
pred_df.to_csv(f"data/{dataset_name}/predictions.csv", index=False)
os.makedirs(f"data/{dataset_name}", exist_ok=True)
pred_df.to_csv(f"data/{dataset_name}/predictions.csv", index=False)
accelerator.wait_for_everyone()
accelerator.save(get_peft_model_state_dict(model, state_dict=accelerator.get_state_dict(model)), checkpoint_name)
model.push_to_hub(
"smangrul/"
+ f"{dataset_name}_{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}".replace("/", "_"),
state_dict=accelerator.get_state_dict(model),
use_auth_token=True,
)
accelerator.wait_for_everyone()

View File

@ -6,7 +6,7 @@ from torch.utils.data import DataLoader
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, default_data_collator, get_linear_schedule_with_warmup
from datasets import load_dataset
from peft import LoraConfig, TaskType, get_peft_model, get_peft_model_state_dict
from peft import LoraConfig, TaskType, get_peft_model
from peft.utils.other import fsdp_auto_wrap_policy
from tqdm import tqdm
@ -25,7 +25,6 @@ def main():
peft_config = LoraConfig(
task_type=TaskType.SEQ_2_SEQ_LM, inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1
)
checkpoint_name = "financial_sentiment_analysis_lora_fsdp_v1.pt"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name_or_path)
model = get_peft_model(model, peft_config)
accelerator.print(model.print_trainable_parameters())
@ -126,8 +125,10 @@ def main():
accelerator.print(f"{eval_preds[:10]=}")
accelerator.print(f"{dataset['validation'][label_column][:10]=}")
accelerator.wait_for_everyone()
accelerator.save(
get_peft_model_state_dict(model, state_dict=accelerator.get_state_dict(model)), checkpoint_name
model.push_to_hub(
"smangrul/" + f"{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}".replace("/", "_"),
state_dict=accelerator.get_state_dict(model),
use_auth_token=True,
)
accelerator.wait_for_everyone()

View File

@ -5,7 +5,23 @@
"execution_count": 1,
"id": "5f93b7d1",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"===================================BUG REPORT===================================\n",
"Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues\n",
"For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link\n",
"================================================================================\n",
"CUDA SETUP: CUDA runtime path found: /home/sourab/miniconda3/envs/ml/lib/libcudart.so\n",
"CUDA SETUP: Highest compute capability among GPUs detected: 7.5\n",
"CUDA SETUP: Detected CUDA version 117\n",
"CUDA SETUP: Loading binary /home/sourab/miniconda3/envs/ml/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...\n"
]
}
],
"source": [
"from transformers import AutoModelForSeq2SeqLM\n",
"from peft import get_peft_config,get_peft_model, get_peft_model_state_dict, PrefixTuningConfig, TaskType\n",
@ -61,15 +77,13 @@
"name": "stderr",
"output_type": "stream",
"text": [
"/home/sourab/miniconda3/envs/ml/lib/python3.10/site-packages/huggingface_hub/utils/_deprecation.py:97: FutureWarning: Deprecated argument(s) used in 'dataset_info': token. Will not be supported from version '0.12'.\n",
" warnings.warn(message, FutureWarning)\n",
"Found cached dataset financial_phrasebank (/home/sourab/.cache/huggingface/datasets/financial_phrasebank/sentences_allagree/1.0.0/550bde12e6c30e2674da973a55f57edde5181d53f5a5a34c1531c53f93b7e141)\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "e3f8b8faca0a4112b2c3499faee9544b",
"model_id": "ec4be98991b84181bfa75f8846422b8b",
"version_major": 2,
"version_minor": 0
},
@ -83,7 +97,7 @@
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "935c8aebde284a5784348588e0bb013a",
"model_id": "82a6bd694c4f4751a23c370ab51f01a4",
"version_major": 2,
"version_minor": 0
},
@ -97,7 +111,7 @@
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "e3487cd55f6847588492bf7fa51348ca",
"model_id": "3844878631534468a1495e435563e4b0",
"version_major": 2,
"version_minor": 0
},
@ -111,9 +125,9 @@
{
"data": {
"text/plain": [
"{'sentence': 'ADPnews - Feb 5 , 2010 - Finnish real estate investor Sponda Oyj HEL : SDA1V said today that it slipped to a net loss of EUR 81.5 million USD 11.8 m in 2009 from a profit of EUR 29.3 million in 2008 .',\n",
" 'label': 0,\n",
" 'text_label': 'negative'}"
"{'sentence': 'Finnish elevators and escalators maker KONE Corporation said on Tuesday ( 18 March ) that it has received a major order from Sir Robert McAlpine to supply all elevators and escalators for the Watermark Place project in the City of London .',\n",
" 'label': 2,\n",
" 'text_label': 'positive'}"
]
},
"execution_count": 3,
@ -145,39 +159,11 @@
"id": "adf9608c",
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "2ce088f4437d4e2c80c267332a5b84e5",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Downloading: 0%| | 0.00/792k [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "4e5f69b61f194220b39336e48edd2f9e",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Downloading: 0%| | 0.00/1.39M [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/sourab/transformers/src/transformers/models/t5/tokenization_t5_fast.py:156: FutureWarning: This tokenizer was incorrectly instantiated with a model max length of 512 which will be corrected in Transformers v5.\n",
"/home/sourab/transformers/src/transformers/models/t5/tokenization_t5_fast.py:155: FutureWarning: This tokenizer was incorrectly instantiated with a model max length of 512 which will be corrected in Transformers v5.\n",
"For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.\n",
"- Be aware that you SHOULD NOT rely on t5-large automatically truncating your input to 512 when padding/encoding.\n",
"- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.\n",
@ -188,7 +174,7 @@
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "230c5631891e4ea8ac7a1b39f315a4f0",
"model_id": "4af8c12efb5643659573347509079f3a",
"version_major": 2,
"version_minor": 0
},
@ -202,7 +188,7 @@
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "b581e5677d2a45459ceb725534ed0891",
"model_id": "86033b6257384584afd034075af808cb",
"version_major": 2,
"version_minor": 0
},
@ -275,82 +261,75 @@
"name": "stderr",
"output_type": "stream",
"text": [
"100%|█████████████████████████████████████████████████████████████| 255/255 [00:20<00:00, 12.27it/s]\n",
"100%|███████████████████████████████████████████████████████████████| 29/29 [00:01<00:00, 17.32it/s]\n"
"100%|████████████████████████████████████████████████████████████████████████████████████████| 255/255 [00:49<00:00, 5.15it/s]\n",
"100%|██████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:03<00:00, 7.56it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch=0: train_ppl=tensor(2697769., device='cuda:0') train_epoch_loss=tensor(14.8079, device='cuda:0') eval_ppl=tensor(1.0089, device='cuda:0') eval_epoch_loss=tensor(0.0089, device='cuda:0')\n"
"epoch=0: train_ppl=tensor(2760654.5000, device='cuda:0') train_epoch_loss=tensor(14.8310, device='cuda:0') eval_ppl=tensor(1.0124, device='cuda:0') eval_epoch_loss=tensor(0.0124, device='cuda:0')\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|█████████████████████████████████████████████████████████████| 255/255 [00:19<00:00, 12.75it/s]\n",
"100%|███████████████████████████████████████████████████████████████| 29/29 [00:01<00:00, 17.33it/s]\n"
"100%|████████████████████████████████████████████████████████████████████████████████████████| 255/255 [00:40<00:00, 6.22it/s]\n",
"100%|██████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:05<00:00, 5.05it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch=1: train_ppl=tensor(2.9475, device='cuda:0') train_epoch_loss=tensor(1.0809, device='cuda:0') eval_ppl=tensor(1.0072, device='cuda:0') eval_epoch_loss=tensor(0.0072, device='cuda:0')\n"
"epoch=1: train_ppl=tensor(2.7329, device='cuda:0') train_epoch_loss=tensor(1.0054, device='cuda:0') eval_ppl=tensor(1.0081, device='cuda:0') eval_epoch_loss=tensor(0.0080, device='cuda:0')\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|█████████████████████████████████████████████████████████████| 255/255 [00:20<00:00, 12.71it/s]\n",
"100%|███████████████████████████████████████████████████████████████| 29/29 [00:01<00:00, 17.31it/s]\n"
"100%|████████████████████████████████████████████████████████████████████████████████████████| 255/255 [00:58<00:00, 4.36it/s]\n",
"100%|██████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:05<00:00, 5.05it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch=2: train_ppl=tensor(2.0588, device='cuda:0') train_epoch_loss=tensor(0.7221, device='cuda:0') eval_ppl=tensor(1.0055, device='cuda:0') eval_epoch_loss=tensor(0.0054, device='cuda:0')\n"
"epoch=2: train_ppl=tensor(2.1698, device='cuda:0') train_epoch_loss=tensor(0.7747, device='cuda:0') eval_ppl=tensor(1.0057, device='cuda:0') eval_epoch_loss=tensor(0.0057, device='cuda:0')\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|█████████████████████████████████████████████████████████████| 255/255 [00:20<00:00, 12.70it/s]\n",
"100%|███████████████████████████████████████████████████████████████| 29/29 [00:01<00:00, 17.32it/s]\n"
"100%|████████████████████████████████████████████████████████████████████████████████████████| 255/255 [00:58<00:00, 4.35it/s]\n",
"100%|██████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:05<00:00, 5.06it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch=3: train_ppl=tensor(1.7939, device='cuda:0') train_epoch_loss=tensor(0.5844, device='cuda:0') eval_ppl=tensor(1.0063, device='cuda:0') eval_epoch_loss=tensor(0.0063, device='cuda:0')\n"
"epoch=3: train_ppl=tensor(2.0724, device='cuda:0') train_epoch_loss=tensor(0.7287, device='cuda:0') eval_ppl=tensor(1.0051, device='cuda:0') eval_epoch_loss=tensor(0.0051, device='cuda:0')\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|█████████████████████████████████████████████████████████████| 255/255 [00:19<00:00, 13.01it/s]\n",
"100%|███████████████████████████████████████████████████████████████| 29/29 [00:01<00:00, 17.33it/s]"
"100%|████████████████████████████████████████████████████████████████████████████████████████| 255/255 [01:02<00:00, 4.10it/s]\n",
"100%|██████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:06<00:00, 4.74it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch=4: train_ppl=tensor(1.7740, device='cuda:0') train_epoch_loss=tensor(0.5732, device='cuda:0') eval_ppl=tensor(1.0062, device='cuda:0') eval_epoch_loss=tensor(0.0061, device='cuda:0')\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n"
"epoch=4: train_ppl=tensor(1.7598, device='cuda:0') train_epoch_loss=tensor(0.5652, device='cuda:0') eval_ppl=tensor(1.0047, device='cuda:0') eval_epoch_loss=tensor(0.0047, device='cuda:0')\n"
]
}
],
@ -399,9 +378,9 @@
"name": "stdout",
"output_type": "stream",
"text": [
"accuracy=96.47577092511013 % on the evaluation dataset\n",
"eval_preds[:10]=['neutral', 'neutral', 'neutral', 'negative', 'neutral', 'neutral', 'neutral', 'neutral', 'positive', 'positive']\n",
"dataset['validation']['text_label'][:10]=['neutral', 'neutral', 'neutral', 'negative', 'neutral', 'neutral', 'neutral', 'neutral', 'positive', 'positive']\n"
"accuracy=96.91629955947137 % on the evaluation dataset\n",
"eval_preds[:10]=['negative', 'positive', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral']\n",
"dataset['validation']['text_label'][:10]=['negative', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral']\n"
]
}
],
@ -424,26 +403,11 @@
"execution_count": 8,
"id": "a8de6005",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'prompt_embeddings': tensor([[-0.3165, -0.8389, 0.3262, ..., -1.5049, -1.6963, 0.3444],\n",
" [-1.8359, 1.1936, 1.0483, ..., 0.6197, -0.4452, 0.5844],\n",
" [-0.6027, 0.3246, -1.5601, ..., -0.3645, 0.2329, 0.3402],\n",
" ...,\n",
" [-1.9525, -0.5035, 0.8474, ..., 0.4793, -0.0789, -0.9305],\n",
" [-1.9741, 0.5242, -2.0594, ..., -0.7970, -0.4889, 2.7323],\n",
" [ 0.9355, -0.2714, 0.4610, ..., 0.2692, -1.5801, -1.6405]])}\n"
]
}
],
"outputs": [],
"source": [
"# saving model\n",
"state_dict = get_peft_model_state_dict(model)\n",
"torch.save(state_dict, checkpoint_name)\n",
"print(state_dict)"
"peft_model_id = f\"{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}\"\n",
"model.save_pretrained(peft_model_id)"
]
},
{
@ -456,18 +420,68 @@
"name": "stdout",
"output_type": "stream",
"text": [
"3,8M\tfinancial_sentiment_analysis_prefix_tuning_v1.pt\r\n"
"3,8M\tt5-large_PREFIX_TUNING_SEQ_2_SEQ_LM/adapter_model.bin\r\n"
]
}
],
"source": [
"!du -h $checkpoint_name"
"ckpt = f\"{peft_model_id}/adapter_model.bin\"\n",
"!du -h $ckpt"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "76c2fc29",
"metadata": {},
"outputs": [],
"source": [
"from peft import PeftModel, PeftConfig\n",
"peft_model_id = f\"{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}\"\n",
"\n",
"config = PeftConfig.from_pretrained(peft_model_id)\n",
"model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path)\n",
"model = PeftModel.from_pretrained(model, peft_model_id)\n"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "d997f1cc",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Acando AB ( ACANB SS ) fell 8.9 percent to 13.35 kronor , the lowest close since Dec. 11 .\n",
"{'input_ids': tensor([[ 4292, 232, 32, 3, 5359, 41, 3, 22029, 14972, 3,\n",
" 4256, 3, 61, 4728, 4848, 1298, 1093, 12, 8808, 2469,\n",
" 3, 22318, 29, 127, 3, 6, 8, 7402, 885, 437,\n",
" 4451, 5, 850, 3, 5, 1]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n",
" 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}\n",
"tensor([[ 0, 2841, 1]])\n",
"['negative']\n"
]
}
],
"source": [
"model.eval()\n",
"i = 107\n",
"inputs = tokenizer(dataset[\"validation\"][text_column][i], return_tensors=\"pt\")\n",
"print(dataset[\"validation\"][text_column][i])\n",
"print(inputs)\n",
"\n",
"with torch.no_grad():\n",
" outputs = model.generate(input_ids=inputs[\"input_ids\"], max_new_tokens=10)\n",
" print(outputs)\n",
" print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "76c2fc29",
"id": "fb746c1e",
"metadata": {},
"outputs": [],
"source": []
@ -475,7 +489,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.10.5 64-bit",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},


@ -0,0 +1,7 @@
# Fine-tuning for image classification using LoRA and 🤗 PEFT
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/peft/blob/main/examples/image_classification/image_classification_peft_lora.ipynb)
We provide a notebook (`image_classification_peft_lora.ipynb`) where we learn how to use [LoRA](https://arxiv.org/abs/2106.09685) from 🤗 PEFT to fine-tune an image classification model by ONLY using **0.7%** of the original trainable parameters of the model.
LoRA adds low-rank "update matrices" to certain blocks in the underlying model (in this case the attention blocks) and ONLY trains those matrices during fine-tuning. During inference, these update matrices are _merged_ with the original model parameters. For more details, check out the [original LoRA paper](https://arxiv.org/abs/2106.09685).
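In case it helps to see the shape of the recipe outside the notebook, here is a minimal sketch of wrapping an image classifier with LoRA via 🤗 PEFT; the checkpoint name, target module names, and hyperparameters are illustrative assumptions, and the notebook above remains the reference:

```python
# Sketch only: wrap a ViT image classifier with LoRA adapters via 🤗 PEFT.
# Checkpoint, target module names, and hyperparameters are assumptions for illustration.
from transformers import AutoModelForImageClassification
from peft import LoraConfig, get_peft_model

model = AutoModelForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k",  # assumed base checkpoint
    num_labels=10,
)

config = LoraConfig(
    r=16,                               # rank of the low-rank update matrices
    lora_alpha=16,                      # scaling applied to the LoRA update
    target_modules=["query", "value"],  # attention projections to adapt (assumed names)
    lora_dropout=0.1,
    bias="none",
    modules_to_save=["classifier"],     # keep the randomly initialized head trainable and save it
)

lora_model = get_peft_model(model, config)
lora_model.print_trainable_parameters()  # reports the small fraction of weights that will be trained
```

Calling `lora_model.save_pretrained(...)` then writes only the adapter weights (`adapter_model.bin`) and `adapter_config.json`, which is what keeps the resulting checkpoint tiny.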

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large


@ -0,0 +1,54 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "kdOhtpergLCQ"
},
"outputs": [],
"source": [
"!git clone https://huggingface.co/spaces/smangrul/peft-lora-sd-dreambooth"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "_LuGk9mihPx7"
},
"outputs": [],
"source": [
"%cd \"peft-lora-sd-dreambooth\"\n",
"!pip install -r requirements.txt"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "BYKO8e5ElJOX"
},
"outputs": [],
"source": [
"!python colab.py"
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"provenance": []
},
"gpuClass": "premium",
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}

File diff suppressed because one or more lines are too long


@ -2,7 +2,6 @@ transformers
accelerate
loralib
evaluate
deepspeed
tqdm
datasets
diffusers


@ -11,6 +11,7 @@ import warnings
from pathlib import Path
from typing import Optional
import numpy as np
import torch
import torch.nn.functional as F
import torch.utils.checkpoint
@ -24,7 +25,13 @@ from transformers import AutoTokenizer, PretrainedConfig
import datasets
import diffusers
import psutil
from diffusers import AutoencoderKL, DDPMScheduler, DiffusionPipeline, UNet2DConditionModel
from diffusers import (
AutoencoderKL,
DDPMScheduler,
DiffusionPipeline,
DPMSolverMultistepScheduler,
UNet2DConditionModel,
)
from diffusers.optimization import get_scheduler
from diffusers.utils import check_min_version
from diffusers.utils.import_utils import is_xformers_available
@ -129,6 +136,27 @@ def parse_args(input_args=None):
" class_data_dir, additional images will be sampled with class_prompt."
),
)
parser.add_argument(
"--validation_prompt",
type=str,
default=None,
help="A prompt that is used during validation to verify that the model is learning.",
)
parser.add_argument(
"--num_validation_images",
type=int,
default=4,
help="Number of images that should be generated during validation with `validation_prompt`.",
)
parser.add_argument(
"--validation_steps",
type=int,
default=100,
help=(
"Run dreambooth validation every X steps. Dreambooth validation consists of running the prompt"
" `args.validation_prompt` multiple times: `args.num_validation_images`."
),
)
parser.add_argument(
"--output_dir",
type=str,
@ -948,6 +976,54 @@ def main(args):
progress_bar.set_postfix(**logs)
accelerator.log(logs, step=global_step)
if (
args.validation_prompt is not None
and (step + num_update_steps_per_epoch * epoch) % args.validation_steps == 0
):
logger.info(
f"Running validation... \n Generating {args.num_validation_images} images with prompt:"
f" {args.validation_prompt}."
)
# create pipeline
pipeline = DiffusionPipeline.from_pretrained(
args.pretrained_model_name_or_path,
safety_checker=None,
revision=args.revision,
)
# set `keep_fp32_wrapper` to True because we do not want to remove
# mixed precision hooks while we are still training
pipeline.unet = accelerator.unwrap_model(unet, keep_fp32_wrapper=True)
pipeline.text_encoder = accelerator.unwrap_model(text_encoder, keep_fp32_wrapper=True)
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)
pipeline = pipeline.to(accelerator.device)
pipeline.set_progress_bar_config(disable=True)
# run inference
generator = torch.Generator(device=accelerator.device).manual_seed(args.seed)
images = []
for _ in range(args.num_validation_images):
image = pipeline(args.validation_prompt, num_inference_steps=25, generator=generator).images[0]
images.append(image)
for tracker in accelerator.trackers:
if tracker.name == "tensorboard":
np_images = np.stack([np.asarray(img) for img in images])
tracker.writer.add_images("validation", np_images, epoch, dataformats="NHWC")
if tracker.name == "wandb":
import wandb
tracker.log(
{
"validation": [
wandb.Image(image, caption=f"{i}: {args.validation_prompt}")
for i, image in enumerate(images)
]
}
)
del pipeline
torch.cuda.empty_cache()
if global_step >= args.max_train_steps:
break
# Printing the GPU memory usage details such as allocated memory, peak memory, and total memory usage


@ -0,0 +1,7 @@
# Fine-tuning for semantic segmentation using LoRA and 🤗 PEFT
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/peft/blob/main/examples/semantic_segmentation/semantic_segmentation_peft_lora.ipynb)
We provide a notebook (`semantic_segmentation_peft_lora.ipynb`) where we learn how to use [LoRA](https://arxiv.org/abs/2106.09685) from 🤗 PEFT to fine-tune a semantic segmentation model by ONLY using **14%** of the original trainable parameters of the model.
LoRA adds low-rank "update matrices" to certain blocks in the underlying model (in this case the attention blocks) and ONLY trains those matrices during fine-tuning. During inference, these update matrices are _merged_ with the original model parameters. For more details, check out the [original LoRA paper](https://arxiv.org/abs/2106.09685).
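For orientation, here is a minimal sketch of the same idea for a SegFormer-style segmentation model; the checkpoint, label count, and module names below are illustrative assumptions, and the notebook above remains the authoritative walkthrough:

```python
# Sketch only: add LoRA adapters to a SegFormer semantic segmentation model with 🤗 PEFT.
# The checkpoint, num_labels, and module names are assumptions for illustration.
from transformers import AutoModelForSemanticSegmentation
from peft import LoraConfig, get_peft_model

model = AutoModelForSemanticSegmentation.from_pretrained(
    "nvidia/mit-b0",  # assumed pretrained backbone
    num_labels=19,    # assumed label set size; the decode head is randomly initialized
)

config = LoraConfig(
    r=32,
    lora_alpha=32,
    target_modules=["query", "value"],  # adapt the attention projections (assumed names)
    lora_dropout=0.1,
    bias="none",
    modules_to_save=["decode_head"],    # keep the segmentation head trainable and save it
)

lora_model = get_peft_model(model, config)
lora_model.print_trainable_parameters()  # the reported fraction should be roughly the ~14% quoted above
```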

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large


@ -2,6 +2,5 @@ transformers
accelerate
loralib
evaluate
deepspeed
tqdm
datasets

File diff suppressed because one or more lines are too long


@ -2,7 +2,6 @@ transformers
accelerate
loralib
evaluate
deepspeed
tqdm
datasets
Pillow


@ -1,4 +1,4 @@
# Copyright 2021 The HuggingFace Team. All rights reserved.
# Copyright 2023 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@ -22,7 +22,7 @@ extras["dev"] = extras["quality"] + extras["docs_specific"]
setup(
name="peft",
version="0.0.2",
version="0.1.0",
description="Parameter-Efficient Fine-Tuning (PEFT)",
long_description=open("README.md", "r", encoding="utf-8").read(),
long_description_content_type="text/markdown",
@ -30,7 +30,7 @@ setup(
license="Apache",
author="The HuggingFace team",
author_email="sourab@huggingface.co",
url="https://github.com/huggingface/pets",
url="https://github.com/huggingface/peft",
package_dir={"": "src"},
packages=find_packages("src"),
entry_points={},
@ -43,7 +43,7 @@ setup(
"torch>=1.13.0",
"transformers",
"accelerate",
"loralib",
"bitsandbytes",
],
extras_require=extras,
classifiers=[
@ -71,9 +71,7 @@ setup(
# twine upload dist/* -r pypitest
# twine upload dist/* -r pypitest --repository-url=https://test.pypi.org/legacy/
# 6. Check that you can install it in a virtualenv by running:
# pip install -i https://testpypi.python.org/pypi accelerate
# accelerate env
# accelerate test
# pip install -i https://testpypi.python.org/pypi peft
# 7. Upload the final version to actual pypi:
# twine upload dist/* -r pypi
# 8. Add release notes to the tag in github once everything is looking hunky-dory.


@ -17,7 +17,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
__version__ = "0.0.2"
__version__ = "0.1.0"
from .mapping import MODEL_TYPE_TO_PEFT_MODEL_MAPPING, PEFT_TYPE_TO_CONFIG_MAPPING, get_peft_config, get_peft_model
from .peft_model import (
@ -40,13 +40,13 @@ from .tuners import (
PromptTuningInit,
)
from .utils import (
TRANSFORMERS_MODELS_TO_PREFIX_TUNING_POSTPROCESS_MAPPING,
PeftConfig,
PeftType,
PromptLearningConfig,
TaskType,
bloom_model_postprocess_past_key_value,
get_peft_model_state_dict,
peft_model_load_and_dispatch,
set_peft_model_state_dict,
shift_tokens_right,
)


@ -14,13 +14,14 @@
# limitations under the License.
from .peft_model import (
PeftModel,
PeftModelForCausalLM,
PeftModelForSeq2SeqLM,
PeftModelForSequenceClassification,
PeftModelForTokenClassification,
)
from .tuners import LoraConfig, PrefixTuningConfig, PromptEncoderConfig, PromptTuningConfig
from .utils import PeftType
from .utils import PromptLearningConfig
MODEL_TYPE_TO_PEFT_MODEL_MAPPING = {
@ -133,9 +134,12 @@ def get_peft_model(model, peft_config):
"""
model_config = model.config.to_dict()
if peft_config.peft_type != PeftType.LORA:
peft_config = _prepare_prompt_learning_config(peft_config, model_config)
else:
peft_config.base_model_name_or_path = model.__dict__.get("name_or_path", None)
if peft_config.task_type not in MODEL_TYPE_TO_PEFT_MODEL_MAPPING.keys():
peft_config = _prepare_lora_config(peft_config, model_config)
return PeftModel(model, peft_config)
if not isinstance(peft_config, PromptLearningConfig):
peft_config = _prepare_lora_config(peft_config, model_config)
else:
peft_config = _prepare_prompt_learning_config(peft_config, model_config)
return MODEL_TYPE_TO_PEFT_MODEL_MAPPING[peft_config.task_type](model, peft_config)


@ -14,18 +14,36 @@
# limitations under the License.
import inspect
import os
import warnings
import torch
from accelerate import dispatch_model, infer_auto_device_map
from accelerate.hooks import AlignDevicesHook, add_hook_to_module, remove_hook_from_submodules
from accelerate.utils import get_balanced_memory
from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
from transformers import PreTrainedModel
from transformers.modeling_outputs import SequenceClassifierOutput, TokenClassifierOutput
from transformers.utils import PushToHubMixin
from huggingface_hub import hf_hub_download
from .tuners import LoraModel, PrefixEncoder, PromptEmbedding, PromptEncoder
from .utils import PeftConfig, PeftType, TaskType, _set_trainable, shift_tokens_right
from .utils import (
TRANSFORMERS_MODELS_TO_PREFIX_TUNING_POSTPROCESS_MAPPING,
WEIGHTS_NAME,
PeftConfig,
PeftType,
PromptLearningConfig,
TaskType,
_set_trainable,
get_peft_model_state_dict,
set_peft_model_state_dict,
shift_tokens_right,
)
class PeftModel(torch.nn.Module):
class PeftModel(PushToHubMixin, torch.nn.Module):
"""
Parameter-Efficient Fine-Tuning Model. Base model encompassing various Peft methods.
@ -39,14 +57,14 @@ class PeftModel(torch.nn.Module):
- **peft_config** ([`PeftConfig`]) -- The configuration of the Peft model.
- **modules_to_save** (`list` of `str`) -- The list of sub-module names to save when
saving the model.
- **prompt_encoder** ([`PromptEncoder`]) -- The prompt encoder used for Peft if `peft_config.peft_type
!= PeftType.LORA`.
- **prompt_encoder** ([`PromptEncoder`]) -- The prompt encoder used for Peft if
`isinstance(self.peft_config, PromptLearningConfig)`.
- **prompt_tokens** (`torch.Tensor`) -- The virtual prompt tokens used for Peft if
`peft_config.peft_type != PeftType.LORA`.
`isinstance(self.peft_config, PromptLearningConfig)`.
- **transformer_backbone_name** (`str`) -- The name of the transformer
backbone in the base model if `peft_config.peft_type != PeftType.LORA`.
backbone in the base model if `isinstance(self.peft_config, PromptLearningConfig)`.
- **word_embeddings** (`torch.nn.Embedding`) -- The word embeddings of the transformer backbone
in the base model if `peft_config.peft_type != PeftType.LORA`.
in the base model if `isinstance(self.peft_config, PromptLearningConfig)`.
"""
def __init__(self, model, peft_config: PeftConfig):
@ -55,12 +73,114 @@ class PeftModel(torch.nn.Module):
self.base_model = model
self.config = self.base_model.config
self.modules_to_save = None
if peft_config.peft_type != PeftType.LORA:
if isinstance(self.peft_config, PromptLearningConfig):
self._setup_prompt_encoder()
else:
self.base_model = LoraModel(peft_config, model)
if getattr(self.peft_config, "modules_to_save", None) is not None:
self.modules_to_save = self.peft_config.modules_to_save
_set_trainable(self)
self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
def save_pretrained(self, save_directory, **kwargs):
r"""
Args:
This function saves the adapter model and the adapter configuration files to a directory, so that it can be
re-loaded using the `LoraModel.from_pretrained` class method, and also used by the `LoraModel.push_to_hub`
method.
save_directory (`str`):
Directory where the adapter model and configuration files will be saved (will be created if it does not
exist).
**kwargs:
Additional keyword arguments passed along to the `push_to_hub` method.
"""
if os.path.isfile(save_directory):
raise ValueError(f"Provided path ({save_directory}) should be a directory, not a file")
os.makedirs(save_directory, exist_ok=True)
# save only the trainable weights
output_state_dict = get_peft_model_state_dict(self, kwargs.get("state_dict", None))
torch.save(output_state_dict, os.path.join(save_directory, WEIGHTS_NAME))
# save the config and change the inference mode to `True`
if self.peft_config.base_model_name_or_path is None:
self.peft_config.base_model_name_or_path = (
self.base_model.__dict__.get("name_or_path", None)
if isinstance(self.peft_config, PromptLearningConfig)
else self.base_model.model.__dict__.get("name_or_path", None)
)
inference_mode = self.peft_config.inference_mode
self.peft_config.inference_mode = True
self.peft_config.save_pretrained(save_directory)
self.peft_config.inference_mode = inference_mode
@classmethod
def from_pretrained(cls, model, model_id, **kwargs):
r"""
Args:
Instantiate a `LoraModel` from a pretrained Lora configuration and weights.
model (`transformers.PreTrainedModel`):
The model to be adapted. The model should be initialized with the `from_pretrained` method. from
`transformers` library.
model_id (`str`):
The name of the Lora configuration to use. Can be either:
- A string, the `model id` of a Lora configuration hosted inside a model repo on
huggingface Hub
- A path to a directory containing a Lora configuration file saved using the
`save_pretrained` method, e.g., ``./my_lora_config_directory/``.
"""
from .mapping import MODEL_TYPE_TO_PEFT_MODEL_MAPPING, PEFT_TYPE_TO_CONFIG_MAPPING
# load the config
config = PEFT_TYPE_TO_CONFIG_MAPPING[PeftConfig.from_pretrained(model_id).peft_type].from_pretrained(model_id)
if getattr(model, "hf_device_map", None) is not None:
remove_hook_from_submodules(model)
if config.task_type not in MODEL_TYPE_TO_PEFT_MODEL_MAPPING.keys():
model = cls(model, config)
else:
model = MODEL_TYPE_TO_PEFT_MODEL_MAPPING[config.task_type](model, config)
# load weights if any
if os.path.exists(os.path.join(model_id, WEIGHTS_NAME)):
filename = os.path.join(model_id, WEIGHTS_NAME)
else:
try:
filename = hf_hub_download(model_id, WEIGHTS_NAME)
except: # noqa
raise ValueError(
f"Can't find weights for {model_id} in {model_id} or in the Hugging Face Hub. "
f"Please check that the file {WEIGHTS_NAME} is present at {model_id}."
)
adapters_weights = torch.load(filename)
# load the weights into the model
model = set_peft_model_state_dict(model, adapters_weights)
if getattr(model, "hf_device_map", None) is not None:
device_map = kwargs.get("device_map", "auto")
max_memory = kwargs.get("max_memory", None)
no_split_module_classes = model._no_split_modules
if device_map != "sequential":
max_memory = get_balanced_memory(
model,
max_memory=max_memory,
no_split_module_classes=no_split_module_classes,
low_zero=(device_map == "balanced_low_0"),
)
if isinstance(device_map, str):
device_map = infer_auto_device_map(
model, max_memory=max_memory, no_split_module_classes=no_split_module_classes
)
model = dispatch_model(model, device_map=device_map)
hook = AlignDevicesHook(io_same_device=True)
if model.peft_config.peft_type == PeftType.LORA:
add_hook_to_module(model.base_model.model, hook)
else:
remove_hook_from_submodules(model.prompt_encoder)
add_hook_to_module(model.base_model, hook)
return model
def _setup_prompt_encoder(self):
num_transformer_submodules = 0
transformer_backbone = None
@ -127,8 +247,8 @@ class PeftModel(torch.nn.Module):
past_key_values = past_key_values.permute([2, 0, 3, 1, 4]).split(
self.peft_config.num_transformer_submodules * 2
)
if self.peft_config.postprocess_past_key_value_function is not None:
post_process_fn = self.peft_config.postprocess_past_key_value_function
if TRANSFORMERS_MODELS_TO_PREFIX_TUNING_POSTPROCESS_MAPPING.get(self.config.model_type, None) is not None:
post_process_fn = TRANSFORMERS_MODELS_TO_PREFIX_TUNING_POSTPROCESS_MAPPING[self.config.model_type]
past_key_values = post_process_fn(past_key_values)
return past_key_values
else:
@ -159,6 +279,15 @@ class PeftModel(torch.nn.Module):
except AttributeError:
return getattr(self.base_model, name)
def forward(self, *args, **kwargs):
"""
Forward pass of the model.
"""
if isinstance(self.peft_config, PromptLearningConfig):
return self.base_model(*args, **kwargs)
else:
return self.base_model.model(*args, **kwargs)
class PeftModelForSequenceClassification(PeftModel):
"""
@ -211,7 +340,7 @@ class PeftModelForSequenceClassification(PeftModel):
):
return_dict = return_dict if return_dict is not None else self.config.use_return_dict
if self.peft_config.peft_type == PeftType.LORA:
if not isinstance(self.peft_config, PromptLearningConfig):
return self.base_model(
input_ids=input_ids,
attention_mask=attention_mask,
@ -368,7 +497,7 @@ class PeftModelForCausalLM(PeftModel):
return_dict=None,
**kwargs,
):
if self.peft_config.peft_type == PeftType.LORA:
if not isinstance(self.peft_config, PromptLearningConfig):
return self.base_model(
input_ids=input_ids,
attention_mask=attention_mask,
@ -417,7 +546,7 @@ class PeftModelForCausalLM(PeftModel):
return self.base_model(inputs_embeds=inputs_embeds, **kwargs)
def generate(self, **kwargs):
if self.peft_config.peft_type == PeftType.LORA:
if not isinstance(self.peft_config, PromptLearningConfig):
return self.base_model.generate(**kwargs)
else:
if "input_ids" not in kwargs:
@ -438,17 +567,22 @@ class PeftModelForCausalLM(PeftModel):
)
kwargs["token_type_ids"] = None
if self.peft_config.peft_type == PeftType.PREFIX_TUNING:
batch_size = kwargs["input_ids"].shape[0]
past_key_values = self.get_prompt(batch_size)
kwargs["past_key_values"] = past_key_values
return self.base_model.generate(**kwargs)
else:
raise NotImplementedError
return self.base_model.generate(**kwargs)
def prepare_inputs_for_generation(self, *args, **kwargs):
model_kwargs = self.base_model_prepare_inputs_for_generation(*args, **kwargs)
model_kwargs["past_key_values"] = kwargs.get("past", None) or kwargs.get("past_key_values", None)
if isinstance(self.peft_config, PromptLearningConfig):
if model_kwargs["past_key_values"] is None and self.peft_config.peft_type == PeftType.PREFIX_TUNING:
past_key_values = self.get_prompt(batch_size=model_kwargs["input_ids"].shape[0])
model_kwargs["past_key_values"] = past_key_values
else:
if model_kwargs["past_key_values"] is None:
prompts = self.get_prompt(batch_size=model_kwargs["input_ids"].shape[0])
model_kwargs["inputs_embeds"] = torch.cat(
(prompts, self.word_embeddings(model_kwargs["input_ids"])), dim=1
)
model_kwargs["input_ids"] = None
return model_kwargs
@ -499,7 +633,7 @@ class PeftModelForSeq2SeqLM(PeftModel):
return_dict=None,
**kwargs,
):
if self.peft_config.peft_type == PeftType.LORA:
if not isinstance(self.peft_config, PromptLearningConfig):
return self.base_model(
input_ids=input_ids,
attention_mask=attention_mask,
@ -567,7 +701,7 @@ class PeftModelForSeq2SeqLM(PeftModel):
return self.base_model(inputs_embeds=inputs_embeds, decoder_inputs_embeds=decoder_inputs_embeds, **kwargs)
def generate(self, **kwargs):
if self.peft_config.peft_type == PeftType.LORA:
if not isinstance(self.peft_config, PromptLearningConfig):
return self.base_model.generate(**kwargs)
else:
if "input_ids" not in kwargs:
@ -582,25 +716,16 @@ class PeftModelForSeq2SeqLM(PeftModel):
kwargs["token_type_ids"] = None
if self.peft_config.peft_type == PeftType.PREFIX_TUNING:
batch_size = kwargs["input_ids"].shape[0]
past_key_values = self.get_prompt(batch_size)
kwargs["past_key_values"] = past_key_values
return self.base_model.generate(**kwargs)
else:
raise NotImplementedError
def prepare_inputs_for_generation(self, *args, **kwargs):
model_kwargs = self.base_model_prepare_inputs_for_generation(*args, **kwargs)
model_kwargs["past_key_values"] = kwargs.get("past", None) or kwargs.get("past_key_values", None)
return model_kwargs
def _prepare_encoder_decoder_kwargs_for_generation(self, inputs_tensor, model_kwargs, model_input_name=None):
past_key_values = model_kwargs.get("past_key_values", None)
model_kwargs["past_key_values"] = None
model_kwargs = self.base_model_prepare_encoder_decoder_kwargs_for_generation(
inputs_tensor, model_kwargs, model_input_name
)
model_kwargs["past_key_values"] = past_key_values
if model_kwargs["past_key_values"] is None and self.peft_config.peft_type == PeftType.PREFIX_TUNING:
batch_size = model_kwargs["decoder_input_ids"].shape[0]
past_key_values = self.get_prompt(batch_size)
model_kwargs["past_key_values"] = past_key_values
return model_kwargs
@ -655,7 +780,7 @@ class PeftModelForTokenClassification(PeftModel):
):
return_dict = return_dict if return_dict is not None else self.config.use_return_dict
if self.peft_config.peft_type == PeftType.LORA:
if not isinstance(self.peft_config, PromptLearningConfig):
return self.base_model(
input_ids=input_ids,
attention_mask=attention_mask,


@ -12,7 +12,6 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import math
import warnings
from dataclasses import asdict, dataclass, field
@ -24,8 +23,7 @@ import torch.nn as nn
import torch.nn.functional as F
from transformers.pytorch_utils import Conv1D
import loralib as lora # noqa: F401
from loralib import mark_only_lora_as_trainable
import bitsandbytes as bnb
from ..utils import PeftConfig, PeftType, transpose
@ -45,6 +43,8 @@ class LoraConfig(PeftConfig):
fan_in_fan_out (`bool`): Set this to True if the layer to replace stores weight like (fan_in, fan_out)
enable_lora ( `List[bool]`): Used with `lora.MergedLinear`.
bias (`str`): Bias type for Lora. Can be 'none', 'all' or 'lora_only'
modules_to_save (`List[str]`):List of modules apart from LoRA layers to be set as trainable
and saved in the final checkpoint.
"""
r: int = field(default=8, metadata={"help": "Lora attention dimension"})
@ -60,6 +60,14 @@ class LoraConfig(PeftConfig):
)
enable_lora: Optional[List[bool]] = field(default=None, metadata={"help": "Used with `lora.MergedLinear`."})
bias: str = field(default="none", metadata={"help": "Bias type for Lora. Can be 'none', 'all' or 'lora_only'"})
modules_to_save: Optional[List[str]] = field(
default=None,
metadata={
"help": "List of modules apart from LoRA layers to be set as trainable and saved in the final checkpoint. "
"For example, in Sequence Classification or Token Classification tasks, "
"the final layer `classifier/score` are randomly initialized and as such need to be trainable and saved."
},
)
def __post_init__(self):
self.peft_type = PeftType.LORA
@ -95,8 +103,10 @@ class LoraModel(torch.nn.Module):
self.model = model
self._find_and_replace()
mark_only_lora_as_trainable(self.model, self.peft_config.bias)
self.forward = self.model.forward
def _find_and_replace(self):
is_target_modules_in_base_model = False
kwargs = {
"r": self.peft_config.r,
"lora_alpha": self.peft_config.lora_alpha,
@ -107,9 +117,21 @@ class LoraModel(torch.nn.Module):
key_list = [key for key, _ in self.model.named_modules()]
for key in key_list:
if any(key.endswith(target_key) for target_key in self.peft_config.target_modules):
if not is_target_modules_in_base_model:
is_target_modules_in_base_model = True
parent, target, target_name = self._get_submodules(key)
bias = target.bias is not None
if isinstance(target, torch.nn.Linear) and self.peft_config.enable_lora is None:
if isinstance(target, bnb.nn.Linear8bitLt) and self.peft_config.enable_lora is None:
kwargs.update(
{
"has_fp16_weights": target.state.has_fp16_weights,
"memory_efficient_backward": target.state.memory_efficient_backward,
"threshold": target.state.threshold,
"index": target.index,
}
)
new_module = Linear8bitLt(target.in_features, target.out_features, bias=bias, **kwargs)
elif isinstance(target, torch.nn.Linear) and self.peft_config.enable_lora is None:
new_module = Linear(target.in_features, target.out_features, bias=bias, **kwargs)
elif self.peft_config.enable_lora is not None:
kwargs.update({"enable_lora": self.peft_config.enable_lora})
@ -125,6 +147,11 @@ class LoraModel(torch.nn.Module):
kwargs["fan_in_fan_out"] = False
new_module = MergedLinear(in_features, out_features, bias=bias, **kwargs)
self._replace_module(parent, target_name, new_module, target)
if not is_target_modules_in_base_model:
raise ValueError(
f"Target modules {self.peft_config.target_modules} not found in the base model. "
f"Please check the target modules and try again."
)
def _get_submodules(self, key):
parent = self.model.get_submodule(".".join(key.split(".")[:-1]))
@ -137,9 +164,9 @@ class LoraModel(torch.nn.Module):
new_module.weight = old_module.weight
if old_module.bias is not None:
new_module.bias = old_module.bias
def forward(self, *args, **kwargs):
return self.model(*args, **kwargs)
if getattr(old_module, "state", None) is not None:
new_module.state = old_module.state
new_module.to(old_module.weight.device)
def __getattr__(self, name: str):
"""Forward missing attributes to the wrapped module."""
@ -349,3 +376,66 @@ class MergedLinear(nn.Linear, LoraLayer):
after_B = self.lora_B(after_A.transpose(-2, -1)).transpose(-2, -1)
result += self.zero_pad(after_B) * self.scaling
return result
class Linear8bitLt(bnb.nn.Linear8bitLt, LoraLayer):
# Lora implemented in a dense layer
def __init__(
self,
in_features,
out_features,
r: int = 0,
lora_alpha: int = 1,
lora_dropout: float = 0.0,
**kwargs,
):
bnb.nn.Linear8bitLt.__init__(
self,
in_features,
out_features,
bias=kwargs.get("bias", True),
has_fp16_weights=kwargs.get("has_fp16_weights", True),
memory_efficient_backward=kwargs.get("memory_efficient_backward", False),
threshold=kwargs.get("threshold", 0.0),
index=kwargs.get("index", None),
)
LoraLayer.__init__(self, r=r, lora_alpha=lora_alpha, lora_dropout=lora_dropout, merge_weights=False)
# Actual trainable parameters
if r > 0:
self.lora_A = nn.Linear(in_features, r, bias=False)
self.lora_B = nn.Linear(r, out_features, bias=False)
self.scaling = self.lora_alpha / self.r
# Freezing the pre-trained weight matrix
self.weight.requires_grad = False
self.reset_parameters()
def reset_parameters(self):
if hasattr(self, "lora_A"):
# initialize A the same way as the default for nn.Linear and B to zero
nn.init.kaiming_uniform_(self.lora_A.weight, a=math.sqrt(5))
nn.init.zeros_(self.lora_B.weight)
def forward(self, x: torch.Tensor):
result = super().forward(x)
if self.r > 0:
result += self.lora_B(self.lora_A(self.lora_dropout(x))) * self.scaling
return result
# had to adapt it for `lora_only` to work
def mark_only_lora_as_trainable(model: nn.Module, bias: str = "none") -> None:
for n, p in model.named_parameters():
if "lora_" not in n:
p.requires_grad = False
if bias == "none":
return
elif bias == "all":
for n, p in model.named_parameters():
if "bias" in n:
p.requires_grad = True
elif bias == "lora_only":
for m in model.modules():
if isinstance(m, LoraLayer) and hasattr(m, "bias") and m.bias is not None:
m.bias.requires_grad = True
else:
raise NotImplementedError


@ -15,7 +15,6 @@
from dataclasses import dataclass, field
from typing import Callable, Optional
import torch
@ -30,7 +29,6 @@ class PrefixTuningConfig(PromptLearningConfig):
Args:
encoder_hidden_size (`int`): The hidden size of the prompt encoder.
prefix_projection (`bool`): Whether to project the prefix embeddings.
postprocess_past_key_value_function (`Callable`, *optional*): The function to postprocess the past key value.
"""
encoder_hidden_size: int = field(
@ -41,10 +39,6 @@ class PrefixTuningConfig(PromptLearningConfig):
default=False,
metadata={"help": "Whether to project the prefix tokens"},
)
postprocess_past_key_value_function: Optional[Callable] = field(
default=None,
metadata={"help": "The function to postprocess the past key value"},
)
def __post_init__(self):
self.peft_type = PeftType.PREFIX_TUNING


@ -17,6 +17,13 @@
# See the License for the specific language governing permissions and
# limitations under the License.
from .adapters_utils import CONFIG_NAME, WEIGHTS_NAME
from .config import PeftConfig, PeftType, PromptLearningConfig, TaskType
from .other import _set_trainable, bloom_model_postprocess_past_key_value, shift_tokens_right, transpose
from .save_and_load import get_peft_model_state_dict, peft_model_load_and_dispatch, set_peft_model_state_dict
from .other import (
TRANSFORMERS_MODELS_TO_PREFIX_TUNING_POSTPROCESS_MAPPING,
_set_trainable,
bloom_model_postprocess_past_key_value,
shift_tokens_right,
transpose,
)
from .save_and_load import get_peft_model_state_dict, set_peft_model_state_dict


@ -0,0 +1,18 @@
# coding=utf-8
# Copyright 2023-present the HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
WEIGHTS_NAME = "adapter_model.bin"
CONFIG_NAME = "adapter_config.json"
# TODO: add automapping and superclass here?


@ -12,11 +12,18 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import enum
from dataclasses import dataclass, field
import json
import os
from dataclasses import asdict, dataclass, field
from typing import Optional, Union
from transformers.utils import PushToHubMixin
from huggingface_hub import hf_hub_download
from .adapters_utils import CONFIG_NAME
class PeftType(str, enum.Enum):
PROMPT_TUNING = "PROMPT_TUNING"
@ -33,7 +40,94 @@ class TaskType(str, enum.Enum):
@dataclass
class PeftConfig:
class PeftConfigMixin(PushToHubMixin):
r"""
This is the base configuration class for PEFT adapter models. It contains all the methods that are common to all
PEFT adapter models. This class inherits from `transformers.utils.PushToHubMixin` which contains the methods to
push your model to the Hub. The method `save_pretrained` will save the configuration of your adapter model in a
directory. The method `from_pretrained` will load the configuration of your adapter model from a directory.
Args:
peft_type (Union[[`~peft.utils.config.PeftType`], `str`]): The type of Peft method to use.
"""
peft_type: Optional[PeftType] = field(default=None, metadata={"help": "The type of PEFT model."})
@property
def __dict__(self):
return asdict(self)
def to_dict(self):
return self.__dict__
def save_pretrained(self, save_directory, **kwargs):
r"""
This method saves the configuration of your adapter model in a directory.
Args:
save_directory (`str`):
The directory where the configuration will be saved.
**kwargs:
Additional keyword arguments passed along to the `transformers.utils.PushToHubMixin.push_to_hub`
method.
"""
if os.path.isfile(save_directory):
raise AssertionError(f"Provided path ({save_directory}) should be a directory, not a file")
os.makedirs(save_directory, exist_ok=True)
output_dict = self.__dict__
output_path = os.path.join(save_directory, CONFIG_NAME)
# save it
with open(output_path, "w") as writer:
writer.write(json.dumps(output_dict, indent=2, sort_keys=True))
@classmethod
def from_pretrained(cls, pretrained_model_name_or_path, **kwargs):
r"""
This method loads the configuration of your adapter model from a directory.
Args:
pretrained_model_name_or_path (`str`):
The directory or the hub-id where the configuration is saved.
**kwargs:
Additional keyword arguments passed along to the child class initialization.
"""
if os.path.isfile(os.path.join(pretrained_model_name_or_path, CONFIG_NAME)):
config_file = os.path.join(pretrained_model_name_or_path, CONFIG_NAME)
else:
try:
config_file = hf_hub_download(pretrained_model_name_or_path, CONFIG_NAME)
except:
raise ValueError(f"Can't find config.json at '{pretrained_model_name_or_path}'")
loaded_attributes = cls.from_json_file(config_file)
config = cls(**kwargs)
for key, value in loaded_attributes.items():
if hasattr(config, key):
setattr(config, key, value)
return config
@classmethod
def from_json_file(cls, path_json_file, **kwargs):
r"""
Loads a configuration file from a json file.
Args:
path_json_file (`str`):
The path to the json file.
"""
with open(path_json_file, "r") as file:
json_object = json.load(file)
return json_object
@dataclass
class PeftConfig(PeftConfigMixin):
"""
This is the base configuration class to store the configuration of a :class:`~peft.PeftModel`.
@ -43,6 +137,7 @@ class PeftConfig:
inference_mode (`bool`, defaults to `False`): Whether to use the Peft model in inference mode.
"""
base_model_name_or_path: str = field(default=None, metadata={"help": "The name of the base model to use."})
peft_type: Union[str, PeftType] = field(default=None, metadata={"help": "Peft type"})
task_type: Union[str, TaskType] = field(default=None, metadata={"help": "Task type"})
inference_mode: bool = field(default=False, metadata={"help": "Whether to use inference mode"})


@ -30,6 +30,11 @@ def bloom_model_postprocess_past_key_value(past_key_values):
return tuple(zip(keys, values))
TRANSFORMERS_MODELS_TO_PREFIX_TUNING_POSTPROCESS_MAPPING = {
"bloom": bloom_model_postprocess_past_key_value,
}
# copied from transformers.models.bart.modeling_bart
def shift_tokens_right(input_ids: torch.Tensor, pad_token_id: int, decoder_start_token_id: int):
"""


@ -50,7 +50,10 @@ def get_peft_model_state_dict(model, state_dict=None):
raise NotImplementedError
else:
to_return = {}
prompt_embeddings = model.get_prompt_embedding_to_save()
if model.peft_config.inference_mode:
prompt_embeddings = model.prompt_encoder.embedding.weight
else:
prompt_embeddings = model.get_prompt_embedding_to_save()
to_return["prompt_embeddings"] = prompt_embeddings
if model.modules_to_save is not None:
for key, value in state_dict.items():
@ -74,35 +77,3 @@ def set_peft_model_state_dict(model, peft_model_state_dict):
{"weight": peft_model_state_dict["prompt_embeddings"]}, strict=True
)
return model
def peft_model_load_and_dispatch(model, peft_model_state_dict, peft_config, max_memory=None):
"""
Load the Peft model state dict and dispatch the model to the correct device.
Args:
model ([`PeftModel`]): The Pre-trained base model which has already been sharded and dispatched
using `accelerate` functionalities.
peft_model_state_dict (`dict`): The state dict of the Peft model.
max_memory (`Dict`, *optional*):
A dictionary device identifier to maximum memory. Will default to the maximum memory available for each GPU
and the available CPU RAM if unset.
"""
from accelerate import dispatch_model, infer_auto_device_map
from accelerate.hooks import AlignDevicesHook, add_hook_to_module, remove_hook_from_submodules
from ..mapping import get_peft_model
remove_hook_from_submodules(model)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
set_peft_model_state_dict(model, peft_model_state_dict)
device_map = infer_auto_device_map(model, max_memory=max_memory, no_split_module_classes=model._no_split_modules)
model = dispatch_model(model, device_map=device_map)
hook = AlignDevicesHook(io_same_device=True)
if model.peft_config.peft_type == PeftType.LORA:
add_hook_to_module(model.base_model.model, hook)
else:
remove_hook_from_submodules(model.prompt_encoder)
add_hook_to_module(model.base_model, hook)
return model

tests/test_config.py (new file, 96 lines)

@ -0,0 +1,96 @@
# coding=utf-8
# Copyright 2023-present the HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import tempfile
import unittest
from peft import LoraConfig, PrefixTuningConfig, PromptEncoderConfig, PromptTuningConfig
class PeftConfigTestMixin:
all_config_classes = (
LoraConfig,
PromptEncoderConfig,
PrefixTuningConfig,
PromptTuningConfig,
)
class PeftConfigTester(unittest.TestCase, PeftConfigTestMixin):
def test_methods(self):
r"""
Test if all configs have the expected methods. Here we test
- to_dict
- save_pretrained
- from_pretrained
- from_json_file
"""
# test if all configs have the expected methods
for config_class in self.all_config_classes:
config = config_class()
self.assertTrue(hasattr(config, "to_dict"))
self.assertTrue(hasattr(config, "save_pretrained"))
self.assertTrue(hasattr(config, "from_pretrained"))
self.assertTrue(hasattr(config, "from_json_file"))
def test_task_type(self):
for config_class in self.all_config_classes:
# assert this will not fail
_ = config_class(task_type="test")
def test_save_pretrained(self):
r"""
Test if the config is correctly saved and loaded using
- save_pretrained
"""
for config_class in self.all_config_classes:
config = config_class()
with tempfile.TemporaryDirectory() as tmp_dirname:
config.save_pretrained(tmp_dirname)
config_from_pretrained = config_class.from_pretrained(tmp_dirname)
self.assertEqual(config.to_dict(), config_from_pretrained.to_dict())
def test_from_json_file(self):
for config_class in self.all_config_classes:
config = config_class()
with tempfile.TemporaryDirectory() as tmp_dirname:
config.save_pretrained(tmp_dirname)
config_from_json = config_class.from_json_file(os.path.join(tmp_dirname, "adapter_config.json"))
self.assertEqual(config.to_dict(), config_from_json)
def test_to_dict(self):
r"""
Test if the config can be correctly converted to a dict using:
- to_dict
- __dict__
"""
for config_class in self.all_config_classes:
config = config_class()
self.assertEqual(config.to_dict(), config.__dict__)
self.assertTrue(isinstance(config.to_dict(), dict))
def test_set_attributes(self):
# manually set attributes and check if they are correctly written
for config_class in self.all_config_classes:
config = config_class(peft_type="test")
# save pretrained
with tempfile.TemporaryDirectory() as tmp_dirname:
config.save_pretrained(tmp_dirname)
config_from_pretrained = config_class.from_pretrained(tmp_dirname)
self.assertEqual(config.to_dict(), config_from_pretrained.to_dict())

tests/test_save_and_load.py (new file, 136 lines)

@ -0,0 +1,136 @@
# coding=utf-8
# Copyright 2023-present the HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import tempfile
import unittest
import torch
from transformers import AutoModelForCausalLM
from peft import (
LoraConfig,
PeftModel,
PrefixTuningConfig,
PromptEncoderConfig,
PromptTuningConfig,
get_peft_model,
get_peft_model_state_dict,
)
class PeftTestMixin:
checkpoints_to_test = [
"hf-internal-testing/tiny-random-OPTForCausalLM",
]
config_classes = (
LoraConfig,
PrefixTuningConfig,
PromptEncoderConfig,
PromptTuningConfig,
)
config_kwargs = (
dict(
r=8,
lora_alpha=32,
target_modules=["q_proj", "v_proj"],
lora_dropout=0.05,
bias="none",
task_type="CAUSAL_LM",
),
dict(
num_virtual_tokens=10,
task_type="CAUSAL_LM",
),
dict(
num_virtual_tokens=10,
encoder_hidden_size=32,
task_type="CAUSAL_LM",
),
dict(
num_virtual_tokens=10,
task_type="CAUSAL_LM",
),
)
class PeftModelTester(unittest.TestCase, PeftTestMixin):
r"""
Test if the PeftModel behaves as expected. This includes:
- test if the model has the expected methods
"""
def test_attributes_model(self):
for model_id in self.checkpoints_to_test:
for i, config_cls in enumerate(self.config_classes):
model = AutoModelForCausalLM.from_pretrained(model_id)
config = config_cls(
base_model_name_or_path=model_id,
**self.config_kwargs[i],
)
model = get_peft_model(model, config)
self.assertTrue(hasattr(model, "save_pretrained"))
self.assertTrue(hasattr(model, "from_pretrained"))
self.assertTrue(hasattr(model, "push_to_hub"))
def test_save_pretrained(self):
r"""
A test to check if `save_pretrained` behaves as expected. This function should only save the state dict of the
adapter model and not the state dict of the base model. Hence inside each saved directory you should have:
- README.md (that contains an entry `base_model`)
- adapter_config.json
- adapter_model.bin
"""
for model_id in self.checkpoints_to_test:
for i, config_cls in enumerate(self.config_classes):
model = AutoModelForCausalLM.from_pretrained(model_id)
config = config_cls(
base_model_name_or_path=model_id,
**self.config_kwargs[i],
)
model = get_peft_model(model, config)
model.to(model.device)
with tempfile.TemporaryDirectory() as tmp_dirname:
model.save_pretrained(tmp_dirname)
model_from_pretrained = AutoModelForCausalLM.from_pretrained(model_id)
model_from_pretrained = PeftModel.from_pretrained(model_from_pretrained, tmp_dirname)
model_from_pretrained.to(model.device)
# check if the state dicts are equal
state_dict = get_peft_model_state_dict(model)
state_dict_from_pretrained = get_peft_model_state_dict(model_from_pretrained)
# check if same keys
self.assertEqual(state_dict.keys(), state_dict_from_pretrained.keys())
# check if tensors equal
for key in state_dict.keys():
self.assertTrue(torch.allclose(state_dict[key], state_dict_from_pretrained[key]))
# check if `adapter_model.bin` is present
self.assertTrue(os.path.exists(os.path.join(tmp_dirname, "adapter_model.bin")))
# check if `adapter_config.json` is present
self.assertTrue(os.path.exists(os.path.join(tmp_dirname, "adapter_config.json")))
# check if `pytorch_model.bin` is not present
self.assertFalse(os.path.exists(os.path.join(tmp_dirname, "pytorch_model.bin")))
# check if `config.json` is not present
self.assertFalse(os.path.exists(os.path.join(tmp_dirname, "config.json")))