mirror of
https://github.com/huggingface/peft.git
synced 2025-10-20 23:43:47 +08:00
Compare commits
83 Commits
Author | SHA1 | Date | |
---|---|---|---|
29357d41eb | |||
f8e737648a | |||
b1af297707 | |||
85c7b98307 | |||
e41152e5f1 | |||
9f19ce6729 | |||
ae85e185ad | |||
93762cc658 | |||
ed608025eb | |||
14a293a6b3 | |||
c7b744db79 | |||
250edccdda | |||
1daf087682 | |||
d3d601d5c3 | |||
8083c9515f | |||
73cd16b7b5 | |||
65112b75bb | |||
3cf0b7a2d4 | |||
afb171eefb | |||
b07ea17f49 | |||
83ded43ee7 | |||
537c971a47 | |||
ed0c962ff5 | |||
eec0b9329d | |||
1929a84e1e | |||
522a6b6c17 | |||
462b65fe45 | |||
2b89fbf963 | |||
b5c97f2039 | |||
64d2d19598 | |||
a7dd034710 | |||
ed0bcdac4f | |||
bdeb3778d0 | |||
185c852088 | |||
a1b7e42783 | |||
3c4b64785f | |||
ab43d6aa5c | |||
3cf7034e9c | |||
ddb37c353c | |||
dbe3b9b99e | |||
5bc815e2e2 | |||
5a43a3a321 | |||
7ae63299a8 | |||
57de1d2677 | |||
383b5abb33 | |||
d8ccd7d84c | |||
df5b201c6b | |||
44d8e72ca8 | |||
c37ee25be7 | |||
c884daf96a | |||
fcd213708d | |||
915a5db0c6 | |||
d53a631608 | |||
b4d0885203 | |||
d04f6661ee | |||
80e1b262e5 | |||
dd518985ff | |||
a17cea104e | |||
3f9b310c6a | |||
06e49c0a87 | |||
6cf2cf5dae | |||
3faaf0916a | |||
6c9534e660 | |||
22295c4278 | |||
16182ea972 | |||
ad69958e52 | |||
f8a2829318 | |||
634f3692d8 | |||
2cc7f2cbac | |||
2896cf05fb | |||
776a28f053 | |||
d75746be70 | |||
1dbe7fc0db | |||
ff8a5b9a69 | |||
36267af51b | |||
fef162cff8 | |||
a8587916c8 | |||
77670ead76 | |||
360fb2f816 | |||
a40f20ad6c | |||
407482eb37 | |||
d9e7d6cd22 | |||
dbf438f99d |
Makefile (6)
@ -1,6 +1,6 @@
.PHONY: quality style test docs

check_dirs := src examples
check_dirs := src tests examples

# Check that source code meets quality standards

@ -9,11 +9,11 @@ quality:
black --check $(check_dirs)
isort --check-only $(check_dirs)
flake8 $(check_dirs)
doc-builder style src --max_len 119 --check_only
doc-builder style src tests --max_len 119 --check_only

# Format source code automatically and check is there are any problems left that need manual fixing
style:
black $(check_dirs)
isort $(check_dirs)
doc-builder style src --max_len 119
doc-builder style src tests --max_len 119
README.md (60)
@ -21,7 +21,7 @@ limitations under the License.
|
||||
|
||||
Parameter-Efficient Fine-Tuning (PEFT) methods enable efficient adaptation of pre-trained language models (PLMs) to various downstream applications without fine-tuning all the model's parameters. Fine-tuning large-scale PLMs is often prohibitively costly. In this regard, PEFT methods only fine-tune a small number of (extra) model parameters, thereby greatly decreasing the computational and storage costs. Recent State-of-the-Art PEFT techniques achieve performance comparable to that of full fine-tuning.
|
||||
|
||||
Seamlessly integrated with 🤗 Accelerate for large scale models leveraging PyTorch FSDP.
|
||||
Seamlessly integrated with 🤗 Accelerate for large scale models leveraging DeepSpeed and Big Model Inference.
|
||||
|
||||
Supported methods:
|
||||
|
||||
@ -34,11 +34,11 @@ Supported methods:
|
||||
|
||||
```python
|
||||
from transformers import AutoModelForSeq2SeqLM
|
||||
from peft import get_peft_config, get_peft_model, LoRAConfig, TaskType
|
||||
from peft import get_peft_config, get_peft_model, LoraConfig, TaskType
|
||||
model_name_or_path = "bigscience/mt0-large"
|
||||
tokenizer_name_or_path = "bigscience/mt0-large"
|
||||
|
||||
peft_config = LoRAConfig(
|
||||
peft_config = LoraConfig(
|
||||
task_type=TaskType.SEQ_2_SEQ_LM, inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1
|
||||
)
|
||||
|
||||
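For readers skimming this compare view, the hunk above shows only the configuration half of the README quickstart; a minimal, self-contained sketch of how it continues (the wrapped model and printed counts are illustrative, not part of this diff):

```python
# Minimal sketch of the rest of the quickstart shown above (illustrative)
from transformers import AutoModelForSeq2SeqLM
from peft import get_peft_model, LoraConfig, TaskType

model_name_or_path = "bigscience/mt0-large"
peft_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM, inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1
)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name_or_path)
model = get_peft_model(model, peft_config)  # wrap the base model with trainable LoRA adapters
model.print_trainable_parameters()          # reports trainable vs. total parameter counts
```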
@ -65,7 +65,7 @@ Hardware: Single A100 80GB GPU with CPU RAM above 64GB
|
||||
|
||||
Performance of PEFT-LoRA tuned `bigscience/T0_3B` on `ought/raft/twitter_complaints` leaderboard.
|
||||
A point to note is that we didn't try to squeeze performance by playing around with input instruction templates, LoRA hyperparams and other training-related hyperparams. Also, we didn't use the larger 13B mt0-xxl model.
|
||||
So, we are already seeing comparable performance to SoTA with parameter effcient tuning. Also, the final checkpoint size is just `19MB` in comparison to `11GB` size of the backbone `bigscience/T0_3B` model.
|
||||
So, we are already seeing comparable performance to SoTA with parameter efficient tuning. Also, the final checkpoint size is just `19MB` in comparison to `11GB` size of the backbone `bigscience/T0_3B` model.
|
||||
|
||||
| Submission Name | Accuracy |
|
||||
| --------- | ---- |
|
||||
@ -77,7 +77,7 @@ So, we are already seeing comparable performance to SoTA with parameter effcient
|
||||
|
||||
### Parameter Efficient Tuning of Diffusion Models
|
||||
|
||||
GPU memory required by different settings during training are given below. The final checkpoint size being `8.8 MB`.
|
||||
GPU memory required by different settings during training is given below. The final checkpoint size is `8.8 MB`.
|
||||
|
||||
Hardware: Single A100 80GB GPU with CPU RAM above 64G
|
||||
|
||||
@ -127,6 +127,12 @@ Try out the 🤗 Gradio Space which should run seamlessly on a T4 instance:
|
||||
|
||||
### Parameter Efficient Tuning of LLMs for RLHF components such as Ranker and Policy [ToDo]
|
||||
|
||||
### INT8 training of large models in Colab using PEFT LoRA and bitsandbytes
|
||||
|
||||
Here is a demo of how to fine-tune OPT-6.7b (14GB in fp16) in a Google Colab: [Open in Colab](https://colab.research.google.com/drive/1jCkpikz0J2o20FBQmYmAGdiKmJGOMo-o?usp=sharing)
|
||||
|
||||
Here is a demo of how to fine-tune whisper-large (1.5B params) (14GB in fp16) in a Google Colab: [ToDo]
|
||||
|
||||
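To make the INT8 section above concrete, a hedged sketch of the usual loading pattern follows; `prepare_model_for_int8_training`, the `load_in_8bit` flag, and the `target_modules` choice are assumptions based on the linked Colab-style workflow rather than code taken from this diff:

```python
# Hedged sketch of INT8 + LoRA setup (API names are assumptions, see note above)
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_int8_training

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-6.7b", load_in_8bit=True, device_map="auto"  # requires bitsandbytes
)
model = prepare_model_for_int8_training(model)  # prepare the quantized model for training
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM, r=8, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumption: typical projections targeted for OPT
)
model = get_peft_model(model, lora_config)
```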
### Save compute and storage even for medium and small models
|
||||
|
||||
Save storage by avoiding full finetuning of models on each of the downstream tasks/datasets,
|
||||
@ -143,10 +149,10 @@ Another example is fine-tuning `roberta-large` on `MRPC` GLUE dataset suing diff
|
||||
PEFT models work with 🤗 Accelerate out of the box. Use 🤗 Accelerate for distributed training on various hardware such as GPUs, Apple Silicon devices, etc. during training.
|
||||
Use 🤗 Accelerate for inference on consumer hardware with limited resources.
|
||||
|
||||
### Example of PEFT model training using 🤗 Accelerate's DeepSpeed integation
|
||||
### Example of PEFT model training using 🤗 Accelerate's DeepSpeed integration
|
||||
|
||||
Currently DeepSpeed requires PR [ZeRO3 handling frozen weights](https://github.com/microsoft/DeepSpeed/pull/2653) to fix [[REQUEST] efficiently deal with frozen weights during training](https://github.com/microsoft/DeepSpeed/issues/2615) issue. Example is provided in `~examples/conditional_generation/peft_lora_seq2seq_accelerate_ds_zero3_offload.py`.
|
||||
a. First run `accelerate config --config_file ds_zero3_cpu.yaml` and answer the questionaire.
|
||||
Currently DeepSpeed requires PR [ZeRO3 handling frozen weights](https://github.com/microsoft/DeepSpeed/pull/2653) to fix [[REQUEST] efficiently deal with frozen weights during training](https://github.com/microsoft/DeepSpeed/issues/2615) issue. An example is provided in `~examples/conditional_generation/peft_lora_seq2seq_accelerate_ds_zero3_offload.py`.
|
||||
a. First, run `accelerate config --config_file ds_zero3_cpu.yaml` and answer the questionnaire.
|
||||
Below are the contents of the config file.
|
||||
```
|
||||
compute_environment: LOCAL_MACHINE
|
||||
@ -172,7 +178,7 @@ Use 🤗 Accelerate for inferencing on consumer hardware with small resources.
|
||||
same_network: true
|
||||
use_cpu: false
|
||||
```
|
||||
b. run the below command to launch example script
|
||||
b. run the below command to launch the example script
|
||||
```
|
||||
accelerate launch --config_file ds_zero3_cpu.yaml examples/peft_lora_seq2seq_accelerate_ds_zero3_offload.py
|
||||
```
|
||||
@ -203,8 +209,7 @@ Use 🤗 Accelerate for inferencing on consumer hardware with small resources.
|
||||
```
|
||||
|
||||
### Example of PEFT model inference using 🤗 Accelerate's Big Model Inferencing capabilities
|
||||
|
||||
Example is provided in `~examples/causal_language_modeling/peft_lora_clm_accelerate_big_model_inference.ipynb`.
|
||||
An example is provided in `~examples/causal_language_modeling/peft_lora_clm_accelerate_big_model_inference.ipynb`.
|
||||
|
||||
|
||||
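That notebook boils down to dispatching the base model and the adapter across the available devices; a condensed sketch, mirroring a cell that appears further down in this compare view (the `max_memory` budget is illustrative):

```python
# Condensed from the big-model-inference notebook shown later in this compare view
from transformers import AutoModelForSeq2SeqLM
from peft import PeftConfig, PeftModel

peft_model_id = "smangrul/twitter_complaints_bigscience_T0_3B_LORA_SEQ_2_SEQ_LM"
max_memory = {0: "6GIB", "cpu": "30GB"}  # illustrative per-device memory budget
config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(
    config.base_model_name_or_path, device_map="auto", max_memory=max_memory
)
model = PeftModel.from_pretrained(model, peft_model_id, device_map="auto", max_memory=max_memory)
```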
## Models support matrix
|
||||
@ -250,7 +255,30 @@ Example is provided in `~examples/causal_language_modeling/peft_lora_clm_acceler
|
||||
| Deberta | ✅ | | | |
|
||||
| Deberta-v2 | ✅ | | | |
|
||||
|
||||
### Text-to-Image Generation
|
||||
|
||||
| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning |
|
||||
| --------- | ---- | ---- | ---- | ---- |
|
||||
| Stable Diffusion | ✅ | | | |
|
||||
|
||||
|
||||
### Image Classification
|
||||
|
||||
| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning |
|
||||
| --------- | ---- | ---- | ---- | ---- |
|
||||
| ViT | ✅ | | | |
|
||||
| Swin | ✅ | | | |
|
||||
|
||||
___Note that we have tested LoRA for https://huggingface.co/docs/transformers/model_doc/vit and https://huggingface.co/docs/transformers/model_doc/swin for fine-tuning on image classification. However, it should be possible to use LoRA for any compatible model [provided](https://huggingface.co/models?pipeline_tag=image-classification&sort=downloads&search=vit) by 🤗 Transformers. Check out the respective
|
||||
examples to learn more. If you run into problems, please open an issue.___
|
||||
|
||||
The same principle applies to our [segmentation models](https://huggingface.co/models?pipeline_tag=image-segmentation&sort=downloads) as well.
|
||||
|
||||
### Semantic Segmentation
|
||||
|
||||
| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning |
|
||||
| --------- | ---- | ---- | ---- | ---- |
|
||||
| SegFormer | ✅ | | | |
|
||||
## Caveats:
|
||||
|
||||
1. Below is an example of using PyTorch FSDP for training. However, it doesn't lead to
|
||||
@ -268,7 +296,7 @@ any GPU memory savings. Please refer issue [[FSDP] FSDP with CPU offload consume
|
||||
```
|
||||
|
||||
Example of parameter efficient tuning with `mt0-xxl` base model using 🤗 Accelerate is provided in `~examples/conditional_generation/peft_lora_seq2seq_accelerate_fsdp.py`.
|
||||
a. First run `accelerate config --config_file fsdp_config.yaml` and answer the questionaire.
|
||||
a. First, run `accelerate config --config_file fsdp_config.yaml` and answer the questionnaire.
|
||||
Below are the contents of the config file.
|
||||
```
|
||||
command_file: null
|
||||
@ -300,19 +328,19 @@ any GPU memory savings. Please refer issue [[FSDP] FSDP with CPU offload consume
|
||||
tpu_zone: null
|
||||
use_cpu: false
|
||||
```
|
||||
b. run the below command to launch example script
|
||||
b. run the below command to launch the example script
|
||||
```
|
||||
accelerate launch --config_file fsdp_config.yaml examples/peft_lora_seq2seq_accelerate_fsdp.py
|
||||
```
|
||||
|
||||
2. When using `P_TUNING` or `PROMPT_TUNING` with `SEQ_2_SEQ` task, remember to remove the `num_virtual_token` virtual prompt predictions from the left side of the model outputs during evaluations.
|
||||
|
||||
3. `P_TUNING` or `PROMPT_TUNING` doesn't support `generate` functionality of transformers bcause `generate` strictly requires `input_ids`/`decoder_input_ids` but
|
||||
3. For encoder-decoder models, `P_TUNING` or `PROMPT_TUNING` doesn't support `generate` functionality of transformers because `generate` strictly requires `decoder_input_ids` but
|
||||
`P_TUNING`/`PROMPT_TUNING` appends soft prompt embeddings to `input_embeds` to create
|
||||
new `input_embeds` to be given to the model. Therefore, `generate` doesn't support this yet.
|
||||
|
||||
## Backlog:
|
||||
1. Explore and possibly integrate `(IA)^3` and `UniPELT`
|
||||
1. Explore and possibly integrate `(IA)^3`
|
||||
2. Add tests
|
||||
3. Add more use cases and examples
|
||||
|
||||
@ -323,7 +351,7 @@ If you use 🤗 PEFT in your publication, please cite it by using the following
|
||||
```bibtex
|
||||
@Misc{peft,
|
||||
title = {PEFT: State-of-the-art Parameter-Efficient Fine-Tuning methods},
|
||||
author = {Sourab Mangrulkar, Sylvain Gugger},
|
||||
author = {Sourab Mangrulkar, Sylvain Gugger, Lysandre Debut, Younes Belkada, Sayak Paul},
|
||||
howpublished = {\url{https://github.com/huggingface/peft}},
|
||||
year = {2022}
|
||||
}
|
||||
|
@ -0,0 +1,22 @@
compute_environment: LOCAL_MACHINE
deepspeed_config:
  gradient_accumulation_steps: 1
  gradient_clipping: 1.0
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
machine_rank: 0
main_training_function: main
megatron_lm_config: {}
mixed_precision: 'no'
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
use_cpu: false
File diff suppressed because it is too large
@ -17,7 +17,7 @@ from transformers import (
|
||||
|
||||
import psutil
|
||||
from datasets import load_dataset
|
||||
from peft import LoraConfig, TaskType, get_peft_model, get_peft_model_state_dict
|
||||
from peft import LoraConfig, TaskType, get_peft_model
|
||||
from tqdm import tqdm
|
||||
|
||||
|
||||
@ -111,9 +111,6 @@ def main():
|
||||
model_name_or_path = "bigscience/bloomz-7b1"
|
||||
dataset_name = "twitter_complaints"
|
||||
peft_config = LoraConfig(task_type=TaskType.CAUSAL_LM, inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1)
|
||||
checkpoint_name = (
|
||||
f"{dataset_name}_{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}_v1.pt".replace("/", "_")
|
||||
)
|
||||
text_column = "Tweet text"
|
||||
label_column = "text_label"
|
||||
lr = 3e-3
|
||||
@ -121,6 +118,7 @@ def main():
|
||||
batch_size = 8
|
||||
seed = 42
|
||||
max_length = 64
|
||||
do_test = False
|
||||
set_seed(seed)
|
||||
|
||||
dataset = load_dataset("ought/raft", dataset_name)
|
||||
@ -315,35 +313,41 @@ def main():
|
||||
accelerator.print(f"{eval_preds[:10]=}")
|
||||
accelerator.print(f"{dataset['train'][label_column][:10]=}")
|
||||
|
||||
model.eval()
|
||||
test_preds = []
|
||||
for _, batch in enumerate(tqdm(test_dataloader)):
|
||||
batch = {k: v for k, v in batch.items() if k != "labels"}
|
||||
with torch.no_grad():
|
||||
outputs = accelerator.unwrap_model(model).generate(
|
||||
**batch, synced_gpus=is_ds_zero_3, max_new_tokens=10
|
||||
) # synced_gpus=True for DS-stage 3
|
||||
test_preds.extend(
|
||||
tokenizer.batch_decode(outputs[:, max_length:].detach().cpu().numpy(), skip_special_tokens=True)
|
||||
)
|
||||
if do_test:
|
||||
model.eval()
|
||||
test_preds = []
|
||||
for _, batch in enumerate(tqdm(test_dataloader)):
|
||||
batch = {k: v for k, v in batch.items() if k != "labels"}
|
||||
with torch.no_grad():
|
||||
outputs = accelerator.unwrap_model(model).generate(
|
||||
**batch, synced_gpus=is_ds_zero_3, max_new_tokens=10
|
||||
) # synced_gpus=True for DS-stage 3
|
||||
test_preds.extend(
|
||||
tokenizer.batch_decode(outputs[:, max_length:].detach().cpu().numpy(), skip_special_tokens=True)
|
||||
)
|
||||
|
||||
test_preds_cleaned = []
|
||||
for _, pred in enumerate(test_preds):
|
||||
test_preds_cleaned.append(get_closest_label(pred, classes))
|
||||
test_preds_cleaned = []
|
||||
for _, pred in enumerate(test_preds):
|
||||
test_preds_cleaned.append(get_closest_label(pred, classes))
|
||||
|
||||
test_df = dataset["test"].to_pandas()
|
||||
test_df[label_column] = test_preds_cleaned
|
||||
test_df["text_labels_orig"] = test_preds
|
||||
accelerator.print(test_df[[text_column, label_column]].sample(20))
|
||||
test_df = dataset["test"].to_pandas()
|
||||
test_df[label_column] = test_preds_cleaned
|
||||
test_df["text_labels_orig"] = test_preds
|
||||
accelerator.print(test_df[[text_column, label_column]].sample(20))
|
||||
|
||||
pred_df = test_df[["ID", label_column]]
|
||||
pred_df.columns = ["ID", "Label"]
|
||||
pred_df = test_df[["ID", label_column]]
|
||||
pred_df.columns = ["ID", "Label"]
|
||||
|
||||
os.makedirs(f"data/{dataset_name}", exist_ok=True)
|
||||
pred_df.to_csv(f"data/{dataset_name}/predictions.csv", index=False)
|
||||
os.makedirs(f"data/{dataset_name}", exist_ok=True)
|
||||
pred_df.to_csv(f"data/{dataset_name}/predictions.csv", index=False)
|
||||
|
||||
accelerator.wait_for_everyone()
|
||||
accelerator.save(get_peft_model_state_dict(model, state_dict=accelerator.get_state_dict(model)), checkpoint_name)
|
||||
model.push_to_hub(
|
||||
"smangrul/"
|
||||
+ f"{dataset_name}_{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}".replace("/", "_"),
|
||||
state_dict=accelerator.get_state_dict(model),
|
||||
use_auth_token=True,
|
||||
)
|
||||
accelerator.wait_for_everyone()
|
||||
|
||||
|
||||
|
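As a usage note (not part of the diff): an adapter pushed to the Hub this way can later be loaded back on top of the base model. A minimal sketch, where the repo id is an assumption mirroring the name constructed in the script above:

```python
# Sketch: load the pushed LoRA adapter back onto its base model
# (the repo id below is an assumption mirroring the name built in the script)
from transformers import AutoModelForCausalLM
from peft import PeftConfig, PeftModel

peft_model_id = "smangrul/twitter_complaints_bigscience_bloomz-7b1_LORA_CAUSAL_LM"
config = PeftConfig.from_pretrained(peft_model_id)
base_model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(base_model, peft_model_id)
model.eval()
```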
File diff suppressed because it is too large
examples/causal_language_modeling/peft_prompt_tuning_clm.ipynb (1190, Normal file)
File diff suppressed because it is too large
@ -0,0 +1,22 @@
compute_environment: LOCAL_MACHINE
deepspeed_config:
  gradient_accumulation_steps: 1
  gradient_clipping: 1.0
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
machine_rank: 0
main_training_function: main
megatron_lm_config: {}
mixed_precision: 'no'
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
use_cpu: false
@ -2,10 +2,26 @@
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"execution_count": 1,
|
||||
"id": "5f93b7d1",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"===================================BUG REPORT===================================\n",
|
||||
"Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues\n",
|
||||
"For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link\n",
|
||||
"================================================================================\n",
|
||||
"CUDA SETUP: CUDA runtime path found: /home/sourab/miniconda3/envs/ml/lib/libcudart.so\n",
|
||||
"CUDA SETUP: Highest compute capability among GPUs detected: 7.5\n",
|
||||
"CUDA SETUP: Detected CUDA version 117\n",
|
||||
"CUDA SETUP: Loading binary /home/sourab/miniconda3/envs/ml/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from transformers import AutoModelForSeq2SeqLM\n",
|
||||
"from peft import get_peft_config,get_peft_model, get_peft_model_state_dict, LoraConfig, TaskType\n",
|
||||
@ -60,15 +76,13 @@
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"/home/sourab/miniconda3/envs/ml/lib/python3.10/site-packages/huggingface_hub/utils/_deprecation.py:97: FutureWarning: Deprecated argument(s) used in 'dataset_info': token. Will not be supported from version '0.12'.\n",
|
||||
" warnings.warn(message, FutureWarning)\n",
|
||||
"Found cached dataset financial_phrasebank (/home/sourab/.cache/huggingface/datasets/financial_phrasebank/sentences_allagree/1.0.0/550bde12e6c30e2674da973a55f57edde5181d53f5a5a34c1531c53f93b7e141)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "6de075f8208349108291ac5ab7f5c980",
|
||||
"model_id": "3403bf3d718042018b0531848cc30209",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
@ -82,7 +96,7 @@
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "4b0e67b6d93f43e4b0f6a2f8978e4b0c",
|
||||
"model_id": "d3d5c45e3776469f9560b6eaa9346f8f",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
@ -96,7 +110,7 @@
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "a9551029c9884529bda7421a99170b51",
|
||||
"model_id": "e9736f26e9aa450b8d65f95c0b9c81cc",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
@ -110,7 +124,7 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'sentence': 'The order was valued at USD12 .2 m.',\n",
|
||||
"{'sentence': \"The 10,000-odd square metre plot that Stockmann has bought for the Nevsky Center shopping center is located on Nevsky Prospect , St Petersburg 's high street , next to the Vosstaniya Square underground station , in the immediate vicinity of Moscow Station .\",\n",
|
||||
" 'label': 1,\n",
|
||||
" 'text_label': 'neutral'}"
|
||||
]
|
||||
@ -147,7 +161,7 @@
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "4421971232434db1b6141e91fda2f6d7",
|
||||
"model_id": "c460989d4ab24e3f97d81ef040b1d1b4",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
@ -161,7 +175,7 @@
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "9b2ef793d93443949f4a5d5874d4bc05",
|
||||
"model_id": "1acc389b08b94f8a87900b9fbdbccce4",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
@ -234,45 +248,52 @@
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"100%|█████████████████████████████████████████████████████████████| 255/255 [00:53<00:00, 4.80it/s]\n",
|
||||
"100%|███████████████████████████████████████████████████████████████| 29/29 [00:02<00:00, 14.16it/s]\n"
|
||||
"100%|████████████████████████████████████████████████████████████████████████████████████████| 255/255 [02:21<00:00, 1.81it/s]\n",
|
||||
"100%|██████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:07<00:00, 4.13it/s]\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"epoch=0: train_ppl=tensor(13.6966, device='cuda:0') train_epoch_loss=tensor(2.6171, device='cuda:0') eval_ppl=tensor(1.0046, device='cuda:0') eval_epoch_loss=tensor(0.0046, device='cuda:0')\n"
|
||||
"epoch=0: train_ppl=tensor(14.6341, device='cuda:0') train_epoch_loss=tensor(2.6834, device='cuda:0') eval_ppl=tensor(1.0057, device='cuda:0') eval_epoch_loss=tensor(0.0057, device='cuda:0')\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"100%|█████████████████████████████████████████████████████████████| 255/255 [00:52<00:00, 4.88it/s]\n",
|
||||
"100%|███████████████████████████████████████████████████████████████| 29/29 [00:02<00:00, 14.20it/s]\n"
|
||||
"100%|████████████████████████████████████████████████████████████████████████████████████████| 255/255 [02:00<00:00, 2.11it/s]\n",
|
||||
"100%|██████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:05<00:00, 5.66it/s]\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"epoch=1: train_ppl=tensor(1.5893, device='cuda:0') train_epoch_loss=tensor(0.4633, device='cuda:0') eval_ppl=tensor(1.0020, device='cuda:0') eval_epoch_loss=tensor(0.0020, device='cuda:0')\n"
|
||||
"epoch=1: train_ppl=tensor(1.7576, device='cuda:0') train_epoch_loss=tensor(0.5640, device='cuda:0') eval_ppl=tensor(1.0052, device='cuda:0') eval_epoch_loss=tensor(0.0052, device='cuda:0')\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"100%|█████████████████████████████████████████████████████████████| 255/255 [00:52<00:00, 4.87it/s]\n",
|
||||
"100%|███████████████████████████████████████████████████████████████| 29/29 [00:02<00:00, 14.18it/s]\n"
|
||||
"100%|████████████████████████████████████████████████████████████████████████████████████████| 255/255 [01:33<00:00, 2.74it/s]\n",
|
||||
"100%|██████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:04<00:00, 6.23it/s]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"epoch=2: train_ppl=tensor(1.3210, device='cuda:0') train_epoch_loss=tensor(0.2784, device='cuda:0') eval_ppl=tensor(1.0026, device='cuda:0') eval_epoch_loss=tensor(0.0026, device='cuda:0')\n"
|
||||
"epoch=2: train_ppl=tensor(1.3830, device='cuda:0') train_epoch_loss=tensor(0.3243, device='cuda:0') eval_ppl=tensor(1.0035, device='cuda:0') eval_epoch_loss=tensor(0.0035, device='cuda:0')\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
@ -313,7 +334,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"execution_count": 7,
|
||||
"id": "6cafa67b",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@ -321,9 +342,9 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"accuracy=98.23788546255507 % on the evaluation dataset\n",
|
||||
"eval_preds[:10]=['neutral', 'neutral', 'positive', 'positive', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral']\n",
|
||||
"dataset['validation']['text_label'][:10]=['neutral', 'neutral', 'positive', 'positive', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral']\n"
|
||||
"accuracy=97.3568281938326 % on the evaluation dataset\n",
|
||||
"eval_preds[:10]=['neutral', 'neutral', 'neutral', 'positive', 'neutral', 'positive', 'positive', 'neutral', 'neutral', 'neutral']\n",
|
||||
"dataset['validation']['text_label'][:10]=['neutral', 'neutral', 'neutral', 'positive', 'neutral', 'positive', 'positive', 'neutral', 'neutral', 'neutral']\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
@ -343,20 +364,19 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 8,
|
||||
"id": "a8de6005",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# saving model\n",
|
||||
"state_dict = get_peft_model_state_dict(model)\n",
|
||||
"torch.save(state_dict, checkpoint_name)\n",
|
||||
"print(state_dict)"
|
||||
"peft_model_id = f\"{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}\"\n",
|
||||
"model.save_pretrained(peft_model_id)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"execution_count": 9,
|
||||
"id": "bd20cd4c",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@ -364,18 +384,74 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"19M\tfinancial_sentiment_analysis_lora_v1.pt\r\n"
|
||||
"9,2M\tbigscience/mt0-large_LORA_SEQ_2_SEQ_LM/adapter_model.bin\r\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"!du -h $checkpoint_name"
|
||||
"ckpt = f\"{peft_model_id}/adapter_model.bin\"\n",
|
||||
"!du -h $ckpt"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "76c2fc29",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from peft import PeftModel, PeftConfig\n",
|
||||
"peft_model_id = f\"{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}\"\n",
|
||||
"\n",
|
||||
"config = PeftConfig.from_pretrained(peft_model_id)\n",
|
||||
"model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path)\n",
|
||||
"model = PeftModel.from_pretrained(model, peft_model_id)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"id": "37d712ce",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"- Demand for fireplace products was lower than expected , especially in Germany .\n",
|
||||
"{'input_ids': tensor([[ 259, 264, 259, 82903, 332, 1090, 10040, 10371, 639, 259,\n",
|
||||
" 19540, 2421, 259, 25505, 259, 261, 259, 21230, 281, 17052,\n",
|
||||
" 259, 260, 1]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}\n",
|
||||
"tensor([[ 0, 259, 32588, 1]])\n",
|
||||
"['negative']\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"model.eval()\n",
|
||||
"i = 13\n",
|
||||
"inputs = tokenizer(dataset[\"validation\"][text_column][i], return_tensors=\"pt\")\n",
|
||||
"print(dataset[\"validation\"][text_column][i])\n",
|
||||
"print(inputs)\n",
|
||||
"\n",
|
||||
"with torch.no_grad():\n",
|
||||
" outputs = model.generate(input_ids=inputs[\"input_ids\"], max_new_tokens=10)\n",
|
||||
" print(outputs)\n",
|
||||
" print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "76c2fc29",
|
||||
"id": "66c65ea4",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "65e71f78",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
@ -383,7 +459,7 @@
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.10.5 64-bit",
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
@ -397,7 +473,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.5 (v3.10.5:f377153967, Jun 6 2022, 12:36:10) [Clang 13.0.0 (clang-1300.0.29.30)]"
|
||||
"version": "3.10.4"
|
||||
},
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
|
@ -0,0 +1,255 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "71fbfca2",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from transformers import AutoModelForSeq2SeqLM\n",
|
||||
"from peft import PeftModel, PeftConfig\n",
|
||||
"import torch\n",
|
||||
"from datasets import load_dataset\n",
|
||||
"import os\n",
|
||||
"from transformers import AutoTokenizer\n",
|
||||
"from torch.utils.data import DataLoader\n",
|
||||
"from transformers import default_data_collator,get_linear_schedule_with_warmup\n",
|
||||
"from tqdm import tqdm\n",
|
||||
"from datasets import load_dataset\n",
|
||||
"\n",
|
||||
"dataset_name = \"twitter_complaints\"\n",
|
||||
"text_column = \"Tweet text\"\n",
|
||||
"label_column = \"text_label\"\n",
|
||||
"batch_size=8\n",
|
||||
"\n",
|
||||
"peft_model_id = \"smangrul/twitter_complaints_bigscience_T0_3B_LORA_SEQ_2_SEQ_LM\"\n",
|
||||
"config = PeftConfig.from_pretrained(peft_model_id)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "cc55820a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"peft_model_id = \"smangrul/twitter_complaints_bigscience_T0_3B_LORA_SEQ_2_SEQ_LM\"\n",
|
||||
"max_memory={0: \"6GIB\", 1: \"0GIB\", 2: \"0GIB\", 3: \"0GIB\", 4: \"0GIB\", \"cpu\":\"30GB\"}\n",
|
||||
"config = PeftConfig.from_pretrained(peft_model_id)\n",
|
||||
"model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path, device_map=\"auto\", max_memory=max_memory)\n",
|
||||
"model = PeftModel.from_pretrained(model, peft_model_id, device_map=\"auto\", max_memory=max_memory)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "e1a3648b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from datasets import load_dataset\n",
|
||||
"\n",
|
||||
"dataset = load_dataset(\"ought/raft\", dataset_name)\n",
|
||||
"\n",
|
||||
"classes = [k.replace(\"_\", \" \") for k in dataset[\"train\"].features[\"Label\"].names]\n",
|
||||
"print(classes)\n",
|
||||
"dataset = dataset.map(\n",
|
||||
" lambda x: {\"text_label\": [classes[label] for label in x[\"Label\"]]},\n",
|
||||
" batched=True,\n",
|
||||
" num_proc=1,\n",
|
||||
" \n",
|
||||
")\n",
|
||||
"print(dataset)\n",
|
||||
"dataset[\"train\"][0]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "fe12d4d3",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)\n",
|
||||
"target_max_length = max([len(tokenizer(class_label)[\"input_ids\"]) for class_label in classes])\n",
|
||||
"def preprocess_function(examples):\n",
|
||||
" inputs = examples[text_column]\n",
|
||||
" targets = examples[label_column]\n",
|
||||
" model_inputs = tokenizer(inputs, truncation=True)\n",
|
||||
" labels = tokenizer(\n",
|
||||
" targets, max_length=target_max_length, padding=\"max_length\", truncation=True, return_tensors=\"pt\"\n",
|
||||
" )\n",
|
||||
" labels = labels[\"input_ids\"]\n",
|
||||
" labels[labels == tokenizer.pad_token_id] = -100\n",
|
||||
" model_inputs[\"labels\"] = labels\n",
|
||||
" return model_inputs\n",
|
||||
"\n",
|
||||
"processed_datasets = dataset.map(\n",
|
||||
" preprocess_function,\n",
|
||||
" batched=True,\n",
|
||||
" num_proc=1,\n",
|
||||
" remove_columns=dataset[\"train\"].column_names,\n",
|
||||
" load_from_cache_file=True,\n",
|
||||
" desc=\"Running tokenizer on dataset\",\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"train_dataset = processed_datasets[\"train\"]\n",
|
||||
"eval_dataset = processed_datasets[\"train\"]\n",
|
||||
"test_dataset = processed_datasets[\"test\"]\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def collate_fn(examples):\n",
|
||||
" return tokenizer.pad(examples, padding=\"longest\", return_tensors=\"pt\")\n",
|
||||
"\n",
|
||||
"train_dataloader = DataLoader(\n",
|
||||
" train_dataset, shuffle=True, collate_fn=collate_fn, batch_size=batch_size, pin_memory=True\n",
|
||||
")\n",
|
||||
"eval_dataloader = DataLoader(eval_dataset, collate_fn=collate_fn, batch_size=batch_size, pin_memory=True)\n",
|
||||
"test_dataloader = DataLoader(test_dataset, collate_fn=collate_fn, batch_size=batch_size, pin_memory=True)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
" "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "b33be5e6",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"@NYTsupport i have complained a dozen times & yet my papers are still thrown FAR from my door. Why is this so hard to resolve?\n",
|
||||
"{'input_ids': tensor([[25335, 1499, 3, 10, 3320, 12056, 382, 20390, 3, 23,\n",
|
||||
" 43, 25932, 3, 9, 9611, 648, 3, 184, 4624, 117,\n",
|
||||
" 780, 82, 5778, 33, 341, 3, 12618, 377, 4280, 45,\n",
|
||||
" 82, 1365, 5, 1615, 19, 48, 78, 614, 12, 7785,\n",
|
||||
" 58, 16229, 3, 10, 3, 1]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n",
|
||||
" 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}\n",
|
||||
"tensor([[ 0, 10394, 1]], device='cuda:0')\n",
|
||||
"['complaint']\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"model.eval()\n",
|
||||
"i = 15\n",
|
||||
"inputs = tokenizer(f'{text_column} : {dataset[\"test\"][i][\"Tweet text\"]} Label : ', return_tensors=\"pt\")\n",
|
||||
"print(dataset[\"test\"][i][\"Tweet text\"])\n",
|
||||
"print(inputs)\n",
|
||||
"\n",
|
||||
"with torch.no_grad():\n",
|
||||
" outputs = model.generate(input_ids=inputs[\"input_ids\"].to(\"cuda\"), max_new_tokens=10)\n",
|
||||
" print(outputs)\n",
|
||||
" print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "b6d6cd5b",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
" 0%| | 0/7 [00:00<?, ?it/s]You're using a T5TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.\n",
|
||||
"100%|████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:10<00:00, 1.48s/it]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"model.eval()\n",
|
||||
"eval_preds = []\n",
|
||||
"for _, batch in enumerate(tqdm(eval_dataloader)):\n",
|
||||
" batch = {k: v.to(\"cuda\") for k, v in batch.items() if k != \"labels\"}\n",
|
||||
" with torch.no_grad():\n",
|
||||
" outputs = model.generate(**batch, max_new_tokens=10)\n",
|
||||
" preds = outputs.detach().cpu().numpy()\n",
|
||||
" eval_preds.extend(tokenizer.batch_decode(preds, skip_special_tokens=True))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "61264abe",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"accuracy=100.0\n",
|
||||
"eval_preds[:10]=['no complaint', 'no complaint', 'complaint', 'complaint', 'no complaint', 'no complaint', 'no complaint', 'complaint', 'complaint', 'no complaint']\n",
|
||||
"dataset['train'][label_column][:10]=['no complaint', 'no complaint', 'complaint', 'complaint', 'no complaint', 'no complaint', 'no complaint', 'complaint', 'complaint', 'no complaint']\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"correct = 0\n",
|
||||
"total = 0\n",
|
||||
"for pred, true in zip(eval_preds, dataset[\"train\"][label_column]):\n",
|
||||
" if pred.strip() == true.strip():\n",
|
||||
" correct += 1\n",
|
||||
" total += 1\n",
|
||||
"accuracy = correct / total * 100\n",
|
||||
"print(f\"{accuracy=}\")\n",
|
||||
"print(f\"{eval_preds[:10]=}\")\n",
|
||||
"print(f\"{dataset['train'][label_column][:10]=}\")\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "a70802a3",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"model.eval()\n",
|
||||
"test_preds = []\n",
|
||||
"\n",
|
||||
"for _, batch in enumerate(tqdm(test_dataloader)):\n",
|
||||
" batch = {k: v for k, v in batch.items() if k != \"labels\"}\n",
|
||||
" with torch.no_grad():\n",
|
||||
" outputs = model.generate(**batch, max_new_tokens=10)\n",
|
||||
" preds = outputs.detach().cpu().numpy()\n",
|
||||
" test_preds.extend(tokenizer.batch_decode(preds, skip_special_tokens=True))\n",
|
||||
" if len(test_preds)>100:\n",
|
||||
" break\n",
|
||||
"test_preds"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.5 (v3.10.5:f377153967, Jun 6 2022, 12:36:10) [Clang 13.0.0 (clang-1300.0.29.30)]"
|
||||
},
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
"hash": "aee8b7b246df8f9039afb4144a1f6fd8d2ca17a180786b69acc140d282b71a49"
|
||||
}
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
@ -11,7 +11,7 @@ from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, get_linear_schedu
|
||||
|
||||
import psutil
|
||||
from datasets import load_dataset
|
||||
from peft import LoraConfig, TaskType, get_peft_model, get_peft_model_state_dict
|
||||
from peft import LoraConfig, TaskType, get_peft_model
|
||||
from tqdm import tqdm
|
||||
|
||||
|
||||
@ -107,15 +107,13 @@ def main():
|
||||
peft_config = LoraConfig(
|
||||
task_type=TaskType.SEQ_2_SEQ_LM, inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1
|
||||
)
|
||||
checkpoint_name = (
|
||||
f"{dataset_name}_{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}_v1.pt".replace("/", "_")
|
||||
)
|
||||
text_column = "Tweet text"
|
||||
label_column = "text_label"
|
||||
lr = 3e-3
|
||||
num_epochs = 5
|
||||
batch_size = 8
|
||||
seed = 42
|
||||
do_test = False
|
||||
set_seed(seed)
|
||||
|
||||
dataset = load_dataset("ought/raft", dataset_name)
|
||||
@ -265,33 +263,39 @@ def main():
|
||||
accelerator.print(f"{eval_preds[:10]=}")
|
||||
accelerator.print(f"{dataset['train'][label_column][:10]=}")
|
||||
|
||||
model.eval()
|
||||
test_preds = []
|
||||
for _, batch in enumerate(tqdm(test_dataloader)):
|
||||
batch = {k: v for k, v in batch.items() if k != "labels"}
|
||||
with torch.no_grad():
|
||||
outputs = accelerator.unwrap_model(model).generate(
|
||||
**batch, synced_gpus=is_ds_zero_3
|
||||
) # synced_gpus=True for DS-stage 3
|
||||
test_preds.extend(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))
|
||||
if do_test:
|
||||
model.eval()
|
||||
test_preds = []
|
||||
for _, batch in enumerate(tqdm(test_dataloader)):
|
||||
batch = {k: v for k, v in batch.items() if k != "labels"}
|
||||
with torch.no_grad():
|
||||
outputs = accelerator.unwrap_model(model).generate(
|
||||
**batch, synced_gpus=is_ds_zero_3
|
||||
) # synced_gpus=True for DS-stage 3
|
||||
test_preds.extend(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))
|
||||
|
||||
test_preds_cleaned = []
|
||||
for _, pred in enumerate(test_preds):
|
||||
test_preds_cleaned.append(get_closest_label(pred, classes))
|
||||
test_preds_cleaned = []
|
||||
for _, pred in enumerate(test_preds):
|
||||
test_preds_cleaned.append(get_closest_label(pred, classes))
|
||||
|
||||
test_df = dataset["test"].to_pandas()
|
||||
test_df[label_column] = test_preds_cleaned
|
||||
test_df["text_labels_orig"] = test_preds
|
||||
accelerator.print(test_df[[text_column, label_column]].sample(20))
|
||||
test_df = dataset["test"].to_pandas()
|
||||
test_df[label_column] = test_preds_cleaned
|
||||
test_df["text_labels_orig"] = test_preds
|
||||
accelerator.print(test_df[[text_column, label_column]].sample(20))
|
||||
|
||||
pred_df = test_df[["ID", label_column]]
|
||||
pred_df.columns = ["ID", "Label"]
|
||||
pred_df = test_df[["ID", label_column]]
|
||||
pred_df.columns = ["ID", "Label"]
|
||||
|
||||
os.makedirs(f"data/{dataset_name}", exist_ok=True)
|
||||
pred_df.to_csv(f"data/{dataset_name}/predictions.csv", index=False)
|
||||
os.makedirs(f"data/{dataset_name}", exist_ok=True)
|
||||
pred_df.to_csv(f"data/{dataset_name}/predictions.csv", index=False)
|
||||
|
||||
accelerator.wait_for_everyone()
|
||||
accelerator.save(get_peft_model_state_dict(model, state_dict=accelerator.get_state_dict(model)), checkpoint_name)
|
||||
model.push_to_hub(
|
||||
"smangrul/"
|
||||
+ f"{dataset_name}_{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}".replace("/", "_"),
|
||||
state_dict=accelerator.get_state_dict(model),
|
||||
use_auth_token=True,
|
||||
)
|
||||
accelerator.wait_for_everyone()
|
||||
|
||||
|
||||
|
@ -6,7 +6,7 @@ from torch.utils.data import DataLoader
|
||||
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, default_data_collator, get_linear_schedule_with_warmup
|
||||
|
||||
from datasets import load_dataset
|
||||
from peft import LoraConfig, TaskType, get_peft_model, get_peft_model_state_dict
|
||||
from peft import LoraConfig, TaskType, get_peft_model
|
||||
from peft.utils.other import fsdp_auto_wrap_policy
|
||||
from tqdm import tqdm
|
||||
|
||||
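For context on the newly imported `fsdp_auto_wrap_policy`: it tells FSDP how to wrap the mix of frozen base weights and trainable LoRA modules. Below is a minimal sketch of how it is typically attached to 🤗 Accelerate's FSDP plugin before `prepare`; the base model name and the exact attribute path are assumptions based on the PEFT FSDP example, not code shown in this hunk:

```python
# Sketch (assumptions: an FSDP-enabled Accelerator; "t5-base" is a placeholder base model)
from accelerate import Accelerator
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, TaskType, get_peft_model
from peft.utils.other import fsdp_auto_wrap_policy

accelerator = Accelerator()
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
model = get_peft_model(
    model, LoraConfig(task_type=TaskType.SEQ_2_SEQ_LM, r=8, lora_alpha=32, lora_dropout=0.1)
)
if getattr(accelerator.state, "fsdp_plugin", None) is not None:
    # let FSDP wrap modules according to the PEFT-aware policy
    accelerator.state.fsdp_plugin.auto_wrap_policy = fsdp_auto_wrap_policy(model)
model = accelerator.prepare(model)
```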
@ -25,7 +25,6 @@ def main():
|
||||
peft_config = LoraConfig(
|
||||
task_type=TaskType.SEQ_2_SEQ_LM, inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1
|
||||
)
|
||||
checkpoint_name = "financial_sentiment_analysis_lora_fsdp_v1.pt"
|
||||
model = AutoModelForSeq2SeqLM.from_pretrained(model_name_or_path)
|
||||
model = get_peft_model(model, peft_config)
|
||||
accelerator.print(model.print_trainable_parameters())
|
||||
@ -126,8 +125,10 @@ def main():
|
||||
accelerator.print(f"{eval_preds[:10]=}")
|
||||
accelerator.print(f"{dataset['validation'][label_column][:10]=}")
|
||||
accelerator.wait_for_everyone()
|
||||
accelerator.save(
|
||||
get_peft_model_state_dict(model, state_dict=accelerator.get_state_dict(model)), checkpoint_name
|
||||
model.push_to_hub(
|
||||
"smangrul/" + f"{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}".replace("/", "_"),
|
||||
state_dict=accelerator.get_state_dict(model),
|
||||
use_auth_token=True,
|
||||
)
|
||||
accelerator.wait_for_everyone()
|
||||
|
||||
|
@ -5,7 +5,23 @@
|
||||
"execution_count": 1,
|
||||
"id": "5f93b7d1",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"===================================BUG REPORT===================================\n",
|
||||
"Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues\n",
|
||||
"For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link\n",
|
||||
"================================================================================\n",
|
||||
"CUDA SETUP: CUDA runtime path found: /home/sourab/miniconda3/envs/ml/lib/libcudart.so\n",
|
||||
"CUDA SETUP: Highest compute capability among GPUs detected: 7.5\n",
|
||||
"CUDA SETUP: Detected CUDA version 117\n",
|
||||
"CUDA SETUP: Loading binary /home/sourab/miniconda3/envs/ml/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from transformers import AutoModelForSeq2SeqLM\n",
|
||||
"from peft import get_peft_config,get_peft_model, get_peft_model_state_dict, PrefixTuningConfig, TaskType\n",
|
||||
@ -61,15 +77,13 @@
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"/home/sourab/miniconda3/envs/ml/lib/python3.10/site-packages/huggingface_hub/utils/_deprecation.py:97: FutureWarning: Deprecated argument(s) used in 'dataset_info': token. Will not be supported from version '0.12'.\n",
|
||||
" warnings.warn(message, FutureWarning)\n",
|
||||
"Found cached dataset financial_phrasebank (/home/sourab/.cache/huggingface/datasets/financial_phrasebank/sentences_allagree/1.0.0/550bde12e6c30e2674da973a55f57edde5181d53f5a5a34c1531c53f93b7e141)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "e3f8b8faca0a4112b2c3499faee9544b",
|
||||
"model_id": "ec4be98991b84181bfa75f8846422b8b",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
@ -83,7 +97,7 @@
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "935c8aebde284a5784348588e0bb013a",
|
||||
"model_id": "82a6bd694c4f4751a23c370ab51f01a4",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
@ -97,7 +111,7 @@
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "e3487cd55f6847588492bf7fa51348ca",
|
||||
"model_id": "3844878631534468a1495e435563e4b0",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
@ -111,9 +125,9 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'sentence': 'ADPnews - Feb 5 , 2010 - Finnish real estate investor Sponda Oyj HEL : SDA1V said today that it slipped to a net loss of EUR 81.5 million USD 11.8 m in 2009 from a profit of EUR 29.3 million in 2008 .',\n",
|
||||
" 'label': 0,\n",
|
||||
" 'text_label': 'negative'}"
|
||||
"{'sentence': 'Finnish elevators and escalators maker KONE Corporation said on Tuesday ( 18 March ) that it has received a major order from Sir Robert McAlpine to supply all elevators and escalators for the Watermark Place project in the City of London .',\n",
|
||||
" 'label': 2,\n",
|
||||
" 'text_label': 'positive'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
@ -145,39 +159,11 @@
|
||||
"id": "adf9608c",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "2ce088f4437d4e2c80c267332a5b84e5",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
"Downloading: 0%| | 0.00/792k [00:00<?, ?B/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "4e5f69b61f194220b39336e48edd2f9e",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
"Downloading: 0%| | 0.00/1.39M [00:00<?, ?B/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"/home/sourab/transformers/src/transformers/models/t5/tokenization_t5_fast.py:156: FutureWarning: This tokenizer was incorrectly instantiated with a model max length of 512 which will be corrected in Transformers v5.\n",
|
||||
"/home/sourab/transformers/src/transformers/models/t5/tokenization_t5_fast.py:155: FutureWarning: This tokenizer was incorrectly instantiated with a model max length of 512 which will be corrected in Transformers v5.\n",
|
||||
"For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.\n",
|
||||
"- Be aware that you SHOULD NOT rely on t5-large automatically truncating your input to 512 when padding/encoding.\n",
|
||||
"- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.\n",
|
||||
@ -188,7 +174,7 @@
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "230c5631891e4ea8ac7a1b39f315a4f0",
|
||||
"model_id": "4af8c12efb5643659573347509079f3a",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
@ -202,7 +188,7 @@
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "b581e5677d2a45459ceb725534ed0891",
|
||||
"model_id": "86033b6257384584afd034075af808cb",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
@ -275,82 +261,75 @@
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"100%|█████████████████████████████████████████████████████████████| 255/255 [00:20<00:00, 12.27it/s]\n",
|
||||
"100%|███████████████████████████████████████████████████████████████| 29/29 [00:01<00:00, 17.32it/s]\n"
|
||||
"100%|████████████████████████████████████████████████████████████████████████████████████████| 255/255 [00:49<00:00, 5.15it/s]\n",
|
||||
"100%|██████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:03<00:00, 7.56it/s]\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"epoch=0: train_ppl=tensor(2697769., device='cuda:0') train_epoch_loss=tensor(14.8079, device='cuda:0') eval_ppl=tensor(1.0089, device='cuda:0') eval_epoch_loss=tensor(0.0089, device='cuda:0')\n"
|
||||
"epoch=0: train_ppl=tensor(2760654.5000, device='cuda:0') train_epoch_loss=tensor(14.8310, device='cuda:0') eval_ppl=tensor(1.0124, device='cuda:0') eval_epoch_loss=tensor(0.0124, device='cuda:0')\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"100%|█████████████████████████████████████████████████████████████| 255/255 [00:19<00:00, 12.75it/s]\n",
|
||||
"100%|███████████████████████████████████████████████████████████████| 29/29 [00:01<00:00, 17.33it/s]\n"
|
||||
"100%|████████████████████████████████████████████████████████████████████████████████████████| 255/255 [00:40<00:00, 6.22it/s]\n",
|
||||
"100%|██████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:05<00:00, 5.05it/s]\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"epoch=1: train_ppl=tensor(2.9475, device='cuda:0') train_epoch_loss=tensor(1.0809, device='cuda:0') eval_ppl=tensor(1.0072, device='cuda:0') eval_epoch_loss=tensor(0.0072, device='cuda:0')\n"
|
||||
"epoch=1: train_ppl=tensor(2.7329, device='cuda:0') train_epoch_loss=tensor(1.0054, device='cuda:0') eval_ppl=tensor(1.0081, device='cuda:0') eval_epoch_loss=tensor(0.0080, device='cuda:0')\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"100%|█████████████████████████████████████████████████████████████| 255/255 [00:20<00:00, 12.71it/s]\n",
|
||||
"100%|███████████████████████████████████████████████████████████████| 29/29 [00:01<00:00, 17.31it/s]\n"
|
||||
"100%|████████████████████████████████████████████████████████████████████████████████████████| 255/255 [00:58<00:00, 4.36it/s]\n",
|
||||
"100%|██████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:05<00:00, 5.05it/s]\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"epoch=2: train_ppl=tensor(2.0588, device='cuda:0') train_epoch_loss=tensor(0.7221, device='cuda:0') eval_ppl=tensor(1.0055, device='cuda:0') eval_epoch_loss=tensor(0.0054, device='cuda:0')\n"
|
||||
"epoch=2: train_ppl=tensor(2.1698, device='cuda:0') train_epoch_loss=tensor(0.7747, device='cuda:0') eval_ppl=tensor(1.0057, device='cuda:0') eval_epoch_loss=tensor(0.0057, device='cuda:0')\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"100%|█████████████████████████████████████████████████████████████| 255/255 [00:20<00:00, 12.70it/s]\n",
|
||||
"100%|███████████████████████████████████████████████████████████████| 29/29 [00:01<00:00, 17.32it/s]\n"
|
||||
"100%|████████████████████████████████████████████████████████████████████████████████████████| 255/255 [00:58<00:00, 4.35it/s]\n",
|
||||
"100%|██████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:05<00:00, 5.06it/s]\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"epoch=3: train_ppl=tensor(1.7939, device='cuda:0') train_epoch_loss=tensor(0.5844, device='cuda:0') eval_ppl=tensor(1.0063, device='cuda:0') eval_epoch_loss=tensor(0.0063, device='cuda:0')\n"
|
||||
"epoch=3: train_ppl=tensor(2.0724, device='cuda:0') train_epoch_loss=tensor(0.7287, device='cuda:0') eval_ppl=tensor(1.0051, device='cuda:0') eval_epoch_loss=tensor(0.0051, device='cuda:0')\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"100%|█████████████████████████████████████████████████████████████| 255/255 [00:19<00:00, 13.01it/s]\n",
|
||||
"100%|███████████████████████████████████████████████████████████████| 29/29 [00:01<00:00, 17.33it/s]"
|
||||
"100%|████████████████████████████████████████████████████████████████████████████████████████| 255/255 [01:02<00:00, 4.10it/s]\n",
|
||||
"100%|██████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:06<00:00, 4.74it/s]\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"epoch=4: train_ppl=tensor(1.7740, device='cuda:0') train_epoch_loss=tensor(0.5732, device='cuda:0') eval_ppl=tensor(1.0062, device='cuda:0') eval_epoch_loss=tensor(0.0061, device='cuda:0')\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n"
|
||||
"epoch=4: train_ppl=tensor(1.7598, device='cuda:0') train_epoch_loss=tensor(0.5652, device='cuda:0') eval_ppl=tensor(1.0047, device='cuda:0') eval_epoch_loss=tensor(0.0047, device='cuda:0')\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
@ -399,9 +378,9 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"accuracy=96.47577092511013 % on the evaluation dataset\n",
|
||||
"eval_preds[:10]=['neutral', 'neutral', 'neutral', 'negative', 'neutral', 'neutral', 'neutral', 'neutral', 'positive', 'positive']\n",
|
||||
"dataset['validation']['text_label'][:10]=['neutral', 'neutral', 'neutral', 'negative', 'neutral', 'neutral', 'neutral', 'neutral', 'positive', 'positive']\n"
|
||||
"accuracy=96.91629955947137 % on the evaluation dataset\n",
|
||||
"eval_preds[:10]=['negative', 'positive', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral']\n",
|
||||
"dataset['validation']['text_label'][:10]=['negative', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral']\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
@ -424,26 +403,11 @@
|
||||
"execution_count": 8,
|
||||
"id": "a8de6005",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"{'prompt_embeddings': tensor([[-0.3165, -0.8389, 0.3262, ..., -1.5049, -1.6963, 0.3444],\n",
|
||||
" [-1.8359, 1.1936, 1.0483, ..., 0.6197, -0.4452, 0.5844],\n",
|
||||
" [-0.6027, 0.3246, -1.5601, ..., -0.3645, 0.2329, 0.3402],\n",
|
||||
" ...,\n",
|
||||
" [-1.9525, -0.5035, 0.8474, ..., 0.4793, -0.0789, -0.9305],\n",
|
||||
" [-1.9741, 0.5242, -2.0594, ..., -0.7970, -0.4889, 2.7323],\n",
|
||||
" [ 0.9355, -0.2714, 0.4610, ..., 0.2692, -1.5801, -1.6405]])}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# saving model\n",
|
||||
"state_dict = get_peft_model_state_dict(model)\n",
|
||||
"torch.save(state_dict, checkpoint_name)\n",
|
||||
"print(state_dict)"
|
||||
"peft_model_id = f\"{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}\"\n",
|
||||
"model.save_pretrained(peft_model_id)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -456,18 +420,68 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"3,8M\tfinancial_sentiment_analysis_prefix_tuning_v1.pt\r\n"
|
||||
"3,8M\tt5-large_PREFIX_TUNING_SEQ_2_SEQ_LM/adapter_model.bin\r\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"!du -h $checkpoint_name"
|
||||
"ckpt = f\"{peft_model_id}/adapter_model.bin\"\n",
|
||||
"!du -h $ckpt"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "76c2fc29",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from peft import PeftModel, PeftConfig\n",
|
||||
"peft_model_id = f\"{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}\"\n",
|
||||
"\n",
|
||||
"config = PeftConfig.from_pretrained(peft_model_id)\n",
|
||||
"model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path)\n",
|
||||
"model = PeftModel.from_pretrained(model, peft_model_id)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 27,
|
||||
"id": "d997f1cc",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Acando AB ( ACANB SS ) fell 8.9 percent to 13.35 kronor , the lowest close since Dec. 11 .\n",
|
||||
"{'input_ids': tensor([[ 4292, 232, 32, 3, 5359, 41, 3, 22029, 14972, 3,\n",
|
||||
" 4256, 3, 61, 4728, 4848, 1298, 1093, 12, 8808, 2469,\n",
|
||||
" 3, 22318, 29, 127, 3, 6, 8, 7402, 885, 437,\n",
|
||||
" 4451, 5, 850, 3, 5, 1]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n",
|
||||
" 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}\n",
|
||||
"tensor([[ 0, 2841, 1]])\n",
|
||||
"['negative']\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"model.eval()\n",
|
||||
"i = 107\n",
|
||||
"inputs = tokenizer(dataset[\"validation\"][text_column][i], return_tensors=\"pt\")\n",
|
||||
"print(dataset[\"validation\"][text_column][i])\n",
|
||||
"print(inputs)\n",
|
||||
"\n",
|
||||
"with torch.no_grad():\n",
|
||||
" outputs = model.generate(input_ids=inputs[\"input_ids\"], max_new_tokens=10)\n",
|
||||
" print(outputs)\n",
|
||||
" print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "76c2fc29",
|
||||
"id": "fb746c1e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
@ -475,7 +489,7 @@
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.10.5 64-bit",
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
|
7
examples/image_classification/README.md
Normal file
@ -0,0 +1,7 @@
|
||||
# Fine-tuning for image classification using LoRA and 🤗 PEFT
|
||||
|
||||
[](https://colab.research.google.com/github/huggingface/peft/blob/main/examples/image_classification/image_classification_peft_lora.ipynb)
|
||||
|
||||
We provide a notebook (`image_classification_peft_lora.ipynb`) where we learn how to use [LoRA](https://arxiv.org/abs/2106.09685) from 🤗 PEFT to fine-tune an image classification model by ONLY using **0.7%** of the original trainable parameters of the model.
|
||||
|
||||
LoRA adds low-rank "update matrices" to certain blocks in the underlying model (in this case the attention blocks) and ONLY trains those matrices during fine-tuning. During inference, these update matrices are _merged_ with the original model parameters. For more details, check out the [original LoRA paper](https://arxiv.org/abs/2106.09685).
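A minimal sketch of the recipe, assuming a ViT checkpoint whose attention projections are named `query`/`value` (the notebook has the exact checkpoint and hyperparameters):

```python
from transformers import AutoModelForImageClassification
from peft import LoraConfig, get_peft_model

# assumption: a ViT backbone; its attention projections are named "query" and "value"
model = AutoModelForImageClassification.from_pretrained("google/vit-base-patch16-224-in21k")

config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["query", "value"],  # inject LoRA only into the attention blocks
    lora_dropout=0.1,
    bias="none",
    modules_to_save=["classifier"],  # the freshly initialized head must also be trained and saved
)
lora_model = get_peft_model(model, config)
lora_model.print_trainable_parameters()  # reports the small trainable fraction (~0.7% in the notebook)
```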
14955
examples/image_classification/image_classification_peft_lora.ipynb
Normal file
File diff suppressed because one or more lines are too long
9440
examples/int8_training/Finetune_opt_bnb_peft.ipynb
Normal file
File diff suppressed because it is too large
54
examples/lora_dreambooth/colab_notebook.ipynb
Normal file
@ -0,0 +1,54 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "kdOhtpergLCQ"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!git clone https://huggingface.co/spaces/smangrul/peft-lora-sd-dreambooth"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "_LuGk9mihPx7"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%cd \"peft-lora-sd-dreambooth\"\n",
|
||||
"!pip install -r requirements.txt"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "BYKO8e5ElJOX"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!python colab.py"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"accelerator": "GPU",
|
||||
"colab": {
|
||||
"provenance": []
|
||||
},
|
||||
"gpuClass": "premium",
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"name": "python"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0
|
||||
}
|
File diff suppressed because one or more lines are too long
@ -2,7 +2,6 @@ transformers
|
||||
accelerate
|
||||
loralib
|
||||
evaluate
|
||||
deepspeed
|
||||
tqdm
|
||||
datasets
|
||||
diffusers
|
||||
|
@ -11,6 +11,7 @@ import warnings
|
||||
from pathlib import Path
|
||||
from typing import Optional
|
||||
|
||||
import numpy as np
|
||||
import torch
|
||||
import torch.nn.functional as F
|
||||
import torch.utils.checkpoint
|
||||
@ -24,7 +25,13 @@ from transformers import AutoTokenizer, PretrainedConfig
|
||||
import datasets
|
||||
import diffusers
|
||||
import psutil
|
||||
from diffusers import AutoencoderKL, DDPMScheduler, DiffusionPipeline, UNet2DConditionModel
|
||||
from diffusers import (
|
||||
AutoencoderKL,
|
||||
DDPMScheduler,
|
||||
DiffusionPipeline,
|
||||
DPMSolverMultistepScheduler,
|
||||
UNet2DConditionModel,
|
||||
)
|
||||
from diffusers.optimization import get_scheduler
|
||||
from diffusers.utils import check_min_version
|
||||
from diffusers.utils.import_utils import is_xformers_available
|
||||
@ -129,6 +136,27 @@ def parse_args(input_args=None):
|
||||
" class_data_dir, additional images will be sampled with class_prompt."
|
||||
),
|
||||
)
|
||||
parser.add_argument(
|
||||
"--validation_prompt",
|
||||
type=str,
|
||||
default=None,
|
||||
help="A prompt that is used during validation to verify that the model is learning.",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--num_validation_images",
|
||||
type=int,
|
||||
default=4,
|
||||
help="Number of images that should be generated during validation with `validation_prompt`.",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--validation_steps",
|
||||
type=int,
|
||||
default=100,
|
||||
help=(
|
||||
"Run dreambooth validation every X steps. Dreambooth validation consists of running the prompt"
|
||||
" `args.validation_prompt` multiple times: `args.num_validation_images`."
|
||||
),
|
||||
)
|
||||
parser.add_argument(
|
||||
"--output_dir",
|
||||
type=str,
|
||||
@ -948,6 +976,54 @@ def main(args):
|
||||
progress_bar.set_postfix(**logs)
|
||||
accelerator.log(logs, step=global_step)
|
||||
|
||||
if (
|
||||
args.validation_prompt is not None
|
||||
and (step + num_update_steps_per_epoch * epoch) % args.validation_steps == 0
|
||||
):
|
||||
logger.info(
|
||||
f"Running validation... \n Generating {args.num_validation_images} images with prompt:"
|
||||
f" {args.validation_prompt}."
|
||||
)
|
||||
# create pipeline
|
||||
pipeline = DiffusionPipeline.from_pretrained(
|
||||
args.pretrained_model_name_or_path,
|
||||
safety_checker=None,
|
||||
revision=args.revision,
|
||||
)
|
||||
# set `keep_fp32_wrapper` to True because we do not want to remove
|
||||
# mixed precision hooks while we are still training
|
||||
pipeline.unet = accelerator.unwrap_model(unet, keep_fp32_wrapper=True)
|
||||
pipeline.text_encoder = accelerator.unwrap_model(text_encoder, keep_fp32_wrapper=True)
|
||||
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)
|
||||
pipeline = pipeline.to(accelerator.device)
|
||||
pipeline.set_progress_bar_config(disable=True)
|
||||
|
||||
# run inference
|
||||
generator = torch.Generator(device=accelerator.device).manual_seed(args.seed)
|
||||
images = []
|
||||
for _ in range(args.num_validation_images):
|
||||
image = pipeline(args.validation_prompt, num_inference_steps=25, generator=generator).images[0]
|
||||
images.append(image)
|
||||
|
||||
for tracker in accelerator.trackers:
|
||||
if tracker.name == "tensorboard":
|
||||
np_images = np.stack([np.asarray(img) for img in images])
|
||||
tracker.writer.add_images("validation", np_images, epoch, dataformats="NHWC")
|
||||
if tracker.name == "wandb":
|
||||
import wandb
|
||||
|
||||
tracker.log(
|
||||
{
|
||||
"validation": [
|
||||
wandb.Image(image, caption=f"{i}: {args.validation_prompt}")
|
||||
for i, image in enumerate(images)
|
||||
]
|
||||
}
|
||||
)
|
||||
|
||||
del pipeline
|
||||
torch.cuda.empty_cache()
|
||||
|
||||
if global_step >= args.max_train_steps:
|
||||
break
|
||||
# Printing the GPU memory usage details such as allocated memory, peak memory, and total memory usage
|
||||
|
7
examples/semantic_segmentation/README.md
Normal file
@ -0,0 +1,7 @@
|
||||
# Fine-tuning for semantic segmentation using LoRA and 🤗 PEFT
|
||||
|
||||
[](https://colab.research.google.com/github/huggingface/peft/blob/main/examples/semantic_segmentation/semantic_segmentation_peft_lora.ipynb)
|
||||
|
||||
We provide a notebook (`semantic_segmentation_peft_lora.ipynb`) where we learn how to use [LoRA](https://arxiv.org/abs/2106.09685) from 🤗 PEFT to fine-tune a semantic segmentation model by ONLY using **14%** of the original trainable parameters of the model.
|
||||
|
||||
LoRA adds low-rank "update matrices" to certain blocks in the underlying model (in this case the attention blocks) and ONLY trains those matrices during fine-tuning. During inference, these update matrices are _merged_ with the original model parameters. For more details, check out the [original LoRA paper](https://arxiv.org/abs/2106.09685).
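A minimal sketch, assuming a SegFormer checkpoint (module names and hyperparameters are illustrative; the notebook has the exact values):

```python
from transformers import AutoModelForSemanticSegmentation
from peft import LoraConfig, get_peft_model

# assumption: a SegFormer backbone with attention projections named "query" and "value"
model = AutoModelForSemanticSegmentation.from_pretrained("nvidia/mit-b0")

config = LoraConfig(
    r=32,
    lora_alpha=32,
    target_modules=["query", "value"],
    lora_dropout=0.1,
    bias="lora_only",
    modules_to_save=["decode_head"],  # keep the segmentation head fully trainable
)
lora_model = get_peft_model(model, config)
# the trainable share (~14% here) is larger than for pure LoRA because the decode head is included
lora_model.print_trainable_parameters()
```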
1606
examples/semantic_segmentation/semantic_segmentation_peft_lora.ipynb
Normal file
File diff suppressed because one or more lines are too long
File diff suppressed because it is too large
File diff suppressed because it is too large
File diff suppressed because it is too large
File diff suppressed because it is too large
@ -2,6 +2,5 @@ transformers
|
||||
accelerate
|
||||
loralib
|
||||
evaluate
|
||||
deepspeed
|
||||
tqdm
|
||||
datasets
|
File diff suppressed because one or more lines are too long
@ -2,7 +2,6 @@ transformers
|
||||
accelerate
|
||||
loralib
|
||||
evaluate
|
||||
deepspeed
|
||||
tqdm
|
||||
datasets
|
||||
Pillow
|
||||
|
12
setup.py
@ -1,4 +1,4 @@
|
||||
# Copyright 2021 The HuggingFace Team. All rights reserved.
|
||||
# Copyright 2023 The HuggingFace Team. All rights reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
@ -22,7 +22,7 @@ extras["dev"] = extras["quality"] + extras["docs_specific"]
|
||||
|
||||
setup(
|
||||
name="peft",
|
||||
version="0.0.2",
|
||||
version="0.1.0",
|
||||
description="Parameter-Efficient Fine-Tuning (PEFT)",
|
||||
long_description=open("README.md", "r", encoding="utf-8").read(),
|
||||
long_description_content_type="text/markdown",
|
||||
@ -30,7 +30,7 @@ setup(
|
||||
license="Apache",
|
||||
author="The HuggingFace team",
|
||||
author_email="sourab@huggingface.co",
|
||||
url="https://github.com/huggingface/pets",
|
||||
url="https://github.com/huggingface/peft",
|
||||
package_dir={"": "src"},
|
||||
packages=find_packages("src"),
|
||||
entry_points={},
|
||||
@ -43,7 +43,7 @@ setup(
|
||||
"torch>=1.13.0",
|
||||
"transformers",
|
||||
"accelerate",
|
||||
"loralib",
|
||||
"bitsandbytes",
|
||||
],
|
||||
extras_require=extras,
|
||||
classifiers=[
|
||||
@ -71,9 +71,7 @@ setup(
|
||||
# twine upload dist/* -r pypitest
|
||||
# twine upload dist/* -r pypitest --repository-url=https://test.pypi.org/legacy/
|
||||
# 6. Check that you can install it in a virtualenv by running:
|
||||
# pip install -i https://testpypi.python.org/pypi accelerate
|
||||
# accelerate env
|
||||
# accelerate test
|
||||
# pip install -i https://testpypi.python.org/pypi peft
|
||||
# 7. Upload the final version to actual pypi:
|
||||
# twine upload dist/* -r pypi
|
||||
# 8. Add release notes to the tag in github once everything is looking hunky-dory.
|
||||
|
@ -17,7 +17,7 @@
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
__version__ = "0.0.2"
|
||||
__version__ = "0.1.0"
|
||||
|
||||
from .mapping import MODEL_TYPE_TO_PEFT_MODEL_MAPPING, PEFT_TYPE_TO_CONFIG_MAPPING, get_peft_config, get_peft_model
|
||||
from .peft_model import (
|
||||
@ -40,13 +40,13 @@ from .tuners import (
|
||||
PromptTuningInit,
|
||||
)
|
||||
from .utils import (
|
||||
TRANSFORMERS_MODELS_TO_PREFIX_TUNING_POSTPROCESS_MAPPING,
|
||||
PeftConfig,
|
||||
PeftType,
|
||||
PromptLearningConfig,
|
||||
TaskType,
|
||||
bloom_model_postprocess_past_key_value,
|
||||
get_peft_model_state_dict,
|
||||
peft_model_load_and_dispatch,
|
||||
set_peft_model_state_dict,
|
||||
shift_tokens_right,
|
||||
)
|
||||
|
@ -14,13 +14,14 @@
|
||||
# limitations under the License.
|
||||
|
||||
from .peft_model import (
|
||||
PeftModel,
|
||||
PeftModelForCausalLM,
|
||||
PeftModelForSeq2SeqLM,
|
||||
PeftModelForSequenceClassification,
|
||||
PeftModelForTokenClassification,
|
||||
)
|
||||
from .tuners import LoraConfig, PrefixTuningConfig, PromptEncoderConfig, PromptTuningConfig
|
||||
from .utils import PeftType
|
||||
from .utils import PromptLearningConfig
|
||||
|
||||
|
||||
MODEL_TYPE_TO_PEFT_MODEL_MAPPING = {
|
||||
@ -133,9 +134,12 @@ def get_peft_model(model, peft_config):
|
||||
"""
|
||||
|
||||
model_config = model.config.to_dict()
|
||||
if peft_config.peft_type != PeftType.LORA:
|
||||
peft_config = _prepare_prompt_learning_config(peft_config, model_config)
|
||||
else:
|
||||
peft_config.base_model_name_or_path = model.__dict__.get("name_or_path", None)
|
||||
if peft_config.task_type not in MODEL_TYPE_TO_PEFT_MODEL_MAPPING.keys():
|
||||
peft_config = _prepare_lora_config(peft_config, model_config)
|
||||
|
||||
return PeftModel(model, peft_config)
|
||||
if not isinstance(peft_config, PromptLearningConfig):
|
||||
peft_config = _prepare_lora_config(peft_config, model_config)
|
||||
else:
|
||||
peft_config = _prepare_prompt_learning_config(peft_config, model_config)
|
||||
return MODEL_TYPE_TO_PEFT_MODEL_MAPPING[peft_config.task_type](model, peft_config)
|
||||
|
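Sketched usage of the dispatch above (checkpoint and hyperparameters are illustrative, not part of the diff): a `LoraConfig` takes the `_prepare_lora_config` branch, while prompt-learning configs such as `PrefixTuningConfig` take the `_prepare_prompt_learning_config` branch; both end up wrapped in the task-specific class from `MODEL_TYPE_TO_PEFT_MODEL_MAPPING`.

```python
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, PrefixTuningConfig, get_peft_model

model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")  # illustrative checkpoint

# not a PromptLearningConfig -> LoRA branch
lora_config = LoraConfig(task_type="SEQ_2_SEQ_LM", r=8, lora_alpha=32, target_modules=["q", "v"])

# a PromptLearningConfig subclass -> prompt-learning branch
prefix_config = PrefixTuningConfig(task_type="SEQ_2_SEQ_LM", num_virtual_tokens=20)

# either way the task type selects the wrapper, here PeftModelForSeq2SeqLM
peft_model = get_peft_model(model, prefix_config)
```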
@ -14,18 +14,36 @@
|
||||
# limitations under the License.
|
||||
|
||||
import inspect
|
||||
import os
|
||||
import warnings
|
||||
|
||||
import torch
|
||||
from accelerate import dispatch_model, infer_auto_device_map
|
||||
from accelerate.hooks import AlignDevicesHook, add_hook_to_module, remove_hook_from_submodules
|
||||
from accelerate.utils import get_balanced_memory
|
||||
from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
|
||||
from transformers import PreTrainedModel
|
||||
from transformers.modeling_outputs import SequenceClassifierOutput, TokenClassifierOutput
|
||||
from transformers.utils import PushToHubMixin
|
||||
|
||||
from huggingface_hub import hf_hub_download
|
||||
|
||||
from .tuners import LoraModel, PrefixEncoder, PromptEmbedding, PromptEncoder
|
||||
from .utils import PeftConfig, PeftType, TaskType, _set_trainable, shift_tokens_right
|
||||
from .utils import (
|
||||
TRANSFORMERS_MODELS_TO_PREFIX_TUNING_POSTPROCESS_MAPPING,
|
||||
WEIGHTS_NAME,
|
||||
PeftConfig,
|
||||
PeftType,
|
||||
PromptLearningConfig,
|
||||
TaskType,
|
||||
_set_trainable,
|
||||
get_peft_model_state_dict,
|
||||
set_peft_model_state_dict,
|
||||
shift_tokens_right,
|
||||
)
|
||||
|
||||
|
||||
class PeftModel(torch.nn.Module):
|
||||
class PeftModel(PushToHubMixin, torch.nn.Module):
|
||||
"""
|
||||
Parameter-Efficient Fine-Tuning Model. Base model encompassing various Peft methods.
|
||||
|
||||
@ -39,14 +57,14 @@ class PeftModel(torch.nn.Module):
|
||||
- **peft_config** ([`PeftConfig`]) -- The configuration of the Peft model.
|
||||
- **modules_to_save** (`list` of `str`) -- The list of sub-module names to save when
|
||||
saving the model.
|
||||
- **prompt_encoder** ([`PromptEncoder`]) -- The prompt encoder used for Peft if `peft_config.peft_type
|
||||
!= PeftType.LORA`.
|
||||
- **prompt_encoder** ([`PromptEncoder`]) -- The prompt encoder used for Peft if
|
||||
`isinstance(self.peft_config, PromptLearningConfig)`.
|
||||
- **prompt_tokens** (`torch.Tensor`) -- The virtual prompt tokens used for Peft if
|
||||
`peft_config.peft_type != PeftType.LORA`.
|
||||
`isinstance(self.peft_config, PromptLearningConfig)`.
|
||||
- **transformer_backbone_name** (`str`) -- The name of the transformer
|
||||
backbone in the base model if `peft_config.peft_type != PeftType.LORA`.
|
||||
backbone in the base model if `isinstance(self.peft_config, PromptLearningConfig)`.
|
||||
- **word_embeddings** (`torch.nn.Embedding`) -- The word embeddings of the transformer backbone
|
||||
in the base model if `peft_config.peft_type != PeftType.LORA`.
|
||||
in the base model if `isinstance(self.peft_config, PromptLearningConfig)`.
|
||||
"""
|
||||
|
||||
def __init__(self, model, peft_config: PeftConfig):
|
||||
@ -55,12 +73,114 @@ class PeftModel(torch.nn.Module):
|
||||
self.base_model = model
|
||||
self.config = self.base_model.config
|
||||
self.modules_to_save = None
|
||||
if peft_config.peft_type != PeftType.LORA:
|
||||
if isinstance(self.peft_config, PromptLearningConfig):
|
||||
self._setup_prompt_encoder()
|
||||
else:
|
||||
self.base_model = LoraModel(peft_config, model)
|
||||
if getattr(self.peft_config, "modules_to_save", None) is not None:
|
||||
self.modules_to_save = self.peft_config.modules_to_save
|
||||
_set_trainable(self)
|
||||
self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
|
||||
|
||||
def save_pretrained(self, save_directory, **kwargs):
|
||||
r"""
|
||||
Args:
|
||||
This function saves the adapter model and the adapter configuration files to a directory, so that it can be
|
||||
re-loaded using the `LoraModel.from_pretrained` class method, and also used by the `LoraModel.push_to_hub`
|
||||
method.
|
||||
save_directory (`str`):
|
||||
Directory where the adapter model and configuration files will be saved (will be created if it does not
|
||||
exist).
|
||||
**kwargs:
|
||||
Additional keyword arguments passed along to the `push_to_hub` method.
|
||||
"""
|
||||
if os.path.isfile(save_directory):
|
||||
raise ValueError(f"Provided path ({save_directory}) should be a directory, not a file")
|
||||
os.makedirs(save_directory, exist_ok=True)
|
||||
|
||||
# save only the trainable weights
|
||||
output_state_dict = get_peft_model_state_dict(self, kwargs.get("state_dict", None))
|
||||
torch.save(output_state_dict, os.path.join(save_directory, WEIGHTS_NAME))
|
||||
|
||||
# save the config and change the inference mode to `True`
|
||||
if self.peft_config.base_model_name_or_path is None:
|
||||
self.peft_config.base_model_name_or_path = (
|
||||
self.base_model.__dict__.get("name_or_path", None)
|
||||
if isinstance(self.peft_config, PromptLearningConfig)
|
||||
else self.base_model.model.__dict__.get("name_or_path", None)
|
||||
)
|
||||
inference_mode = self.peft_config.inference_mode
|
||||
self.peft_config.inference_mode = True
|
||||
self.peft_config.save_pretrained(save_directory)
|
||||
self.peft_config.inference_mode = inference_mode
|
||||
|
||||
@classmethod
|
||||
def from_pretrained(cls, model, model_id, **kwargs):
|
||||
r"""
|
||||
Args:
|
||||
Instantiate a `LoraModel` from a pretrained Lora configuration and weights.
|
||||
model (`transformers.PreTrainedModel`):
|
||||
The model to be adapted. The model should be initialized with the `from_pretrained` method from
|
||||
the `transformers` library.
|
||||
model_id (`str`):
|
||||
The name of the Lora configuration to use. Can be either:
|
||||
- A string, the `model id` of a Lora configuration hosted inside a model repo on
|
||||
huggingface Hub
|
||||
- A path to a directory containing a Lora configuration file saved using the
|
||||
`save_pretrained` method, e.g., ``./my_lora_config_directory/``.
|
||||
"""
|
||||
from .mapping import MODEL_TYPE_TO_PEFT_MODEL_MAPPING, PEFT_TYPE_TO_CONFIG_MAPPING
|
||||
|
||||
# load the config
|
||||
config = PEFT_TYPE_TO_CONFIG_MAPPING[PeftConfig.from_pretrained(model_id).peft_type].from_pretrained(model_id)
|
||||
|
||||
if getattr(model, "hf_device_map", None) is not None:
|
||||
remove_hook_from_submodules(model)
|
||||
|
||||
if config.task_type not in MODEL_TYPE_TO_PEFT_MODEL_MAPPING.keys():
|
||||
model = cls(model, config)
|
||||
else:
|
||||
model = MODEL_TYPE_TO_PEFT_MODEL_MAPPING[config.task_type](model, config)
|
||||
|
||||
# load weights if any
|
||||
if os.path.exists(os.path.join(model_id, WEIGHTS_NAME)):
|
||||
filename = os.path.join(model_id, WEIGHTS_NAME)
|
||||
else:
|
||||
try:
|
||||
filename = hf_hub_download(model_id, WEIGHTS_NAME)
|
||||
except: # noqa
|
||||
raise ValueError(
|
||||
f"Can't find weights for {model_id} in {model_id} or in the Hugging Face Hub. "
|
||||
f"Please check that the file {WEIGHTS_NAME} is present at {model_id}."
|
||||
)
|
||||
|
||||
adapters_weights = torch.load(filename)
|
||||
# load the weights into the model
|
||||
model = set_peft_model_state_dict(model, adapters_weights)
|
||||
if getattr(model, "hf_device_map", None) is not None:
|
||||
device_map = kwargs.get("device_map", "auto")
|
||||
max_memory = kwargs.get("max_memory", None)
|
||||
no_split_module_classes = model._no_split_modules
|
||||
if device_map != "sequential":
|
||||
max_memory = get_balanced_memory(
|
||||
model,
|
||||
max_memory=max_memory,
|
||||
no_split_module_classes=no_split_module_classes,
|
||||
low_zero=(device_map == "balanced_low_0"),
|
||||
)
|
||||
if isinstance(device_map, str):
|
||||
device_map = infer_auto_device_map(
|
||||
model, max_memory=max_memory, no_split_module_classes=no_split_module_classes
|
||||
)
|
||||
model = dispatch_model(model, device_map=device_map)
|
||||
hook = AlignDevicesHook(io_same_device=True)
|
||||
if model.peft_config.peft_type == PeftType.LORA:
|
||||
add_hook_to_module(model.base_model.model, hook)
|
||||
else:
|
||||
remove_hook_from_submodules(model.prompt_encoder)
|
||||
add_hook_to_module(model.base_model, hook)
|
||||
return model
|
||||
|
||||
def _setup_prompt_encoder(self):
|
||||
num_transformer_submodules = 0
|
||||
transformer_backbone = None
|
||||
@ -127,8 +247,8 @@ class PeftModel(torch.nn.Module):
|
||||
past_key_values = past_key_values.permute([2, 0, 3, 1, 4]).split(
|
||||
self.peft_config.num_transformer_submodules * 2
|
||||
)
|
||||
if self.peft_config.postprocess_past_key_value_function is not None:
|
||||
post_process_fn = self.peft_config.postprocess_past_key_value_function
|
||||
if TRANSFORMERS_MODELS_TO_PREFIX_TUNING_POSTPROCESS_MAPPING.get(self.config.model_type, None) is not None:
|
||||
post_process_fn = TRANSFORMERS_MODELS_TO_PREFIX_TUNING_POSTPROCESS_MAPPING[self.config.model_type]
|
||||
past_key_values = post_process_fn(past_key_values)
|
||||
return past_key_values
|
||||
else:
|
||||
@ -159,6 +279,15 @@ class PeftModel(torch.nn.Module):
|
||||
except AttributeError:
|
||||
return getattr(self.base_model, name)
|
||||
|
||||
def forward(self, *args, **kwargs):
|
||||
"""
|
||||
Forward pass of the model.
|
||||
"""
|
||||
if isinstance(self.peft_config, PromptLearningConfig):
|
||||
return self.base_model(*args, **kwargs)
|
||||
else:
|
||||
return self.base_model.model(*args, **kwargs)
|
||||
|
||||
|
||||
class PeftModelForSequenceClassification(PeftModel):
|
||||
"""
|
||||
@ -211,7 +340,7 @@ class PeftModelForSequenceClassification(PeftModel):
|
||||
):
|
||||
return_dict = return_dict if return_dict is not None else self.config.use_return_dict
|
||||
|
||||
if self.peft_config.peft_type == PeftType.LORA:
|
||||
if not isinstance(self.peft_config, PromptLearningConfig):
|
||||
return self.base_model(
|
||||
input_ids=input_ids,
|
||||
attention_mask=attention_mask,
|
||||
@ -368,7 +497,7 @@ class PeftModelForCausalLM(PeftModel):
|
||||
return_dict=None,
|
||||
**kwargs,
|
||||
):
|
||||
if self.peft_config.peft_type == PeftType.LORA:
|
||||
if not isinstance(self.peft_config, PromptLearningConfig):
|
||||
return self.base_model(
|
||||
input_ids=input_ids,
|
||||
attention_mask=attention_mask,
|
||||
@ -417,7 +546,7 @@ class PeftModelForCausalLM(PeftModel):
|
||||
return self.base_model(inputs_embeds=inputs_embeds, **kwargs)
|
||||
|
||||
def generate(self, **kwargs):
|
||||
if self.peft_config.peft_type == PeftType.LORA:
|
||||
if not isinstance(self.peft_config, PromptLearningConfig):
|
||||
return self.base_model.generate(**kwargs)
|
||||
else:
|
||||
if "input_ids" not in kwargs:
|
||||
@ -438,17 +567,22 @@ class PeftModelForCausalLM(PeftModel):
|
||||
)
|
||||
kwargs["token_type_ids"] = None
|
||||
|
||||
if self.peft_config.peft_type == PeftType.PREFIX_TUNING:
|
||||
batch_size = kwargs["input_ids"].shape[0]
|
||||
past_key_values = self.get_prompt(batch_size)
|
||||
kwargs["past_key_values"] = past_key_values
|
||||
return self.base_model.generate(**kwargs)
|
||||
else:
|
||||
raise NotImplementedError
|
||||
return self.base_model.generate(**kwargs)
|
||||
|
||||
def prepare_inputs_for_generation(self, *args, **kwargs):
|
||||
model_kwargs = self.base_model_prepare_inputs_for_generation(*args, **kwargs)
|
||||
model_kwargs["past_key_values"] = kwargs.get("past", None) or kwargs.get("past_key_values", None)
|
||||
if isinstance(self.peft_config, PromptLearningConfig):
|
||||
if model_kwargs["past_key_values"] is None and self.peft_config.peft_type == PeftType.PREFIX_TUNING:
|
||||
past_key_values = self.get_prompt(batch_size=model_kwargs["input_ids"].shape[0])
|
||||
model_kwargs["past_key_values"] = past_key_values
|
||||
else:
|
||||
if model_kwargs["past_key_values"] is None:
|
||||
prompts = self.get_prompt(batch_size=model_kwargs["input_ids"].shape[0])
|
||||
model_kwargs["inputs_embeds"] = torch.cat(
|
||||
(prompts, self.word_embeddings(model_kwargs["input_ids"])), dim=1
|
||||
)
|
||||
model_kwargs["input_ids"] = None
|
||||
|
||||
return model_kwargs
|
||||
|
||||
|
||||
@ -499,7 +633,7 @@ class PeftModelForSeq2SeqLM(PeftModel):
|
||||
return_dict=None,
|
||||
**kwargs,
|
||||
):
|
||||
if self.peft_config.peft_type == PeftType.LORA:
|
||||
if not isinstance(self.peft_config, PromptLearningConfig):
|
||||
return self.base_model(
|
||||
input_ids=input_ids,
|
||||
attention_mask=attention_mask,
|
||||
@ -567,7 +701,7 @@ class PeftModelForSeq2SeqLM(PeftModel):
|
||||
return self.base_model(inputs_embeds=inputs_embeds, decoder_inputs_embeds=decoder_inputs_embeds, **kwargs)
|
||||
|
||||
def generate(self, **kwargs):
|
||||
if self.peft_config.peft_type == PeftType.LORA:
|
||||
if not isinstance(self.peft_config, PromptLearningConfig):
|
||||
return self.base_model.generate(**kwargs)
|
||||
else:
|
||||
if "input_ids" not in kwargs:
|
||||
@ -582,25 +716,16 @@ class PeftModelForSeq2SeqLM(PeftModel):
|
||||
kwargs["token_type_ids"] = None
|
||||
|
||||
if self.peft_config.peft_type == PeftType.PREFIX_TUNING:
|
||||
batch_size = kwargs["input_ids"].shape[0]
|
||||
past_key_values = self.get_prompt(batch_size)
|
||||
kwargs["past_key_values"] = past_key_values
|
||||
return self.base_model.generate(**kwargs)
|
||||
else:
|
||||
raise NotImplementedError
|
||||
|
||||
def prepare_inputs_for_generation(self, *args, **kwargs):
|
||||
model_kwargs = self.base_model_prepare_inputs_for_generation(*args, **kwargs)
|
||||
model_kwargs["past_key_values"] = kwargs.get("past", None) or kwargs.get("past_key_values", None)
|
||||
return model_kwargs
|
||||
|
||||
def _prepare_encoder_decoder_kwargs_for_generation(self, inputs_tensor, model_kwargs, model_input_name=None):
|
||||
past_key_values = model_kwargs.get("past_key_values", None)
|
||||
model_kwargs["past_key_values"] = None
|
||||
model_kwargs = self.base_model_prepare_encoder_decoder_kwargs_for_generation(
|
||||
inputs_tensor, model_kwargs, model_input_name
|
||||
)
|
||||
model_kwargs["past_key_values"] = past_key_values
|
||||
if model_kwargs["past_key_values"] is None and self.peft_config.peft_type == PeftType.PREFIX_TUNING:
|
||||
batch_size = model_kwargs["decoder_input_ids"].shape[0]
|
||||
past_key_values = self.get_prompt(batch_size)
|
||||
model_kwargs["past_key_values"] = past_key_values
|
||||
return model_kwargs
|
||||
|
||||
|
||||
@ -655,7 +780,7 @@ class PeftModelForTokenClassification(PeftModel):
|
||||
):
|
||||
return_dict = return_dict if return_dict is not None else self.config.use_return_dict
|
||||
|
||||
if self.peft_config.peft_type == PeftType.LORA:
|
||||
if not isinstance(self.peft_config, PromptLearningConfig):
|
||||
return self.base_model(
|
||||
input_ids=input_ids,
|
||||
attention_mask=attention_mask,
|
||||
|
@ -12,7 +12,6 @@
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import math
|
||||
import warnings
|
||||
from dataclasses import asdict, dataclass, field
|
||||
@ -24,8 +23,7 @@ import torch.nn as nn
|
||||
import torch.nn.functional as F
|
||||
from transformers.pytorch_utils import Conv1D
|
||||
|
||||
import loralib as lora # noqa: F401
|
||||
from loralib import mark_only_lora_as_trainable
|
||||
import bitsandbytes as bnb
|
||||
|
||||
from ..utils import PeftConfig, PeftType, transpose
|
||||
|
||||
@ -45,6 +43,8 @@ class LoraConfig(PeftConfig):
|
||||
fan_in_fan_out (`bool`): Set this to True if the layer to replace stores weight like (fan_in, fan_out)
|
||||
enable_lora ( `List[bool]`): Used with `lora.MergedLinear`.
|
||||
bias (`str`): Bias type for Lora. Can be 'none', 'all' or 'lora_only'
|
||||
modules_to_save (`List[str]`):List of modules apart from LoRA layers to be set as trainable
|
||||
and saved in the final checkpoint.
|
||||
"""
|
||||
|
||||
r: int = field(default=8, metadata={"help": "Lora attention dimension"})
|
||||
@ -60,6 +60,14 @@ class LoraConfig(PeftConfig):
|
||||
)
|
||||
enable_lora: Optional[List[bool]] = field(default=None, metadata={"help": "Used with `lora.MergedLinear`."})
|
||||
bias: str = field(default="none", metadata={"help": "Bias type for Lora. Can be 'none', 'all' or 'lora_only'"})
|
||||
modules_to_save: Optional[List[str]] = field(
|
||||
default=None,
|
||||
metadata={
|
||||
"help": "List of modules apart from LoRA layers to be set as trainable and saved in the final checkpoint. "
|
||||
"For example, in Sequence Classification or Token Classification tasks, "
|
||||
"the final layer `classifier/score` are randomly initialized and as such need to be trainable and saved."
|
||||
},
|
||||
)
|
||||
|
||||
def __post_init__(self):
|
||||
self.peft_type = PeftType.LORA
|
||||
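A hypothetical sequence-classification setup illustrating the new `modules_to_save` field (checkpoint and module names are assumptions, not part of the diff):

```python
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model

model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

config = LoraConfig(
    task_type="SEQ_CLS",
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["query", "value"],
    # the randomly initialized classification head must be trained and stored
    # alongside the LoRA weights, otherwise it is lost on reload
    modules_to_save=["classifier"],
)
model = get_peft_model(model, config)  # returns PeftModelForSequenceClassification
```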
@ -95,8 +103,10 @@ class LoraModel(torch.nn.Module):
|
||||
self.model = model
|
||||
self._find_and_replace()
|
||||
mark_only_lora_as_trainable(self.model, self.peft_config.bias)
|
||||
self.forward = self.model.forward
|
||||
|
||||
def _find_and_replace(self):
|
||||
is_target_modules_in_base_model = False
|
||||
kwargs = {
|
||||
"r": self.peft_config.r,
|
||||
"lora_alpha": self.peft_config.lora_alpha,
|
||||
@ -107,9 +117,21 @@ class LoraModel(torch.nn.Module):
|
||||
key_list = [key for key, _ in self.model.named_modules()]
|
||||
for key in key_list:
|
||||
if any(key.endswith(target_key) for target_key in self.peft_config.target_modules):
|
||||
if not is_target_modules_in_base_model:
|
||||
is_target_modules_in_base_model = True
|
||||
parent, target, target_name = self._get_submodules(key)
|
||||
bias = target.bias is not None
|
||||
if isinstance(target, torch.nn.Linear) and self.peft_config.enable_lora is None:
|
||||
if isinstance(target, bnb.nn.Linear8bitLt) and self.peft_config.enable_lora is None:
|
||||
kwargs.update(
|
||||
{
|
||||
"has_fp16_weights": target.state.has_fp16_weights,
|
||||
"memory_efficient_backward": target.state.memory_efficient_backward,
|
||||
"threshold": target.state.threshold,
|
||||
"index": target.index,
|
||||
}
|
||||
)
|
||||
new_module = Linear8bitLt(target.in_features, target.out_features, bias=bias, **kwargs)
|
||||
elif isinstance(target, torch.nn.Linear) and self.peft_config.enable_lora is None:
|
||||
new_module = Linear(target.in_features, target.out_features, bias=bias, **kwargs)
|
||||
elif self.peft_config.enable_lora is not None:
|
||||
kwargs.update({"enable_lora": self.peft_config.enable_lora})
|
||||
@ -125,6 +147,11 @@ class LoraModel(torch.nn.Module):
|
||||
kwargs["fan_in_fan_out"] = False
|
||||
new_module = MergedLinear(in_features, out_features, bias=bias, **kwargs)
|
||||
self._replace_module(parent, target_name, new_module, target)
|
||||
if not is_target_modules_in_base_model:
|
||||
raise ValueError(
|
||||
f"Target modules {self.peft_config.target_modules} not found in the base model. "
|
||||
f"Please check the target modules and try again."
|
||||
)
|
||||
|
||||
def _get_submodules(self, key):
|
||||
parent = self.model.get_submodule(".".join(key.split(".")[:-1]))
|
||||
@ -137,9 +164,9 @@ class LoraModel(torch.nn.Module):
|
||||
new_module.weight = old_module.weight
|
||||
if old_module.bias is not None:
|
||||
new_module.bias = old_module.bias
|
||||
|
||||
def forward(self, *args, **kwargs):
|
||||
return self.model(*args, **kwargs)
|
||||
if getattr(old_module, "state", None) is not None:
|
||||
new_module.state = old_module.state
|
||||
new_module.to(old_module.weight.device)
|
||||
|
||||
def __getattr__(self, name: str):
|
||||
"""Forward missing attributes to the wrapped module."""
|
||||
@ -349,3 +376,66 @@ class MergedLinear(nn.Linear, LoraLayer):
|
||||
after_B = self.lora_B(after_A.transpose(-2, -1)).transpose(-2, -1)
|
||||
result += self.zero_pad(after_B) * self.scaling
|
||||
return result
|
||||
|
||||
|
||||
class Linear8bitLt(bnb.nn.Linear8bitLt, LoraLayer):
|
||||
# Lora implemented in a dense layer
|
||||
def __init__(
|
||||
self,
|
||||
in_features,
|
||||
out_features,
|
||||
r: int = 0,
|
||||
lora_alpha: int = 1,
|
||||
lora_dropout: float = 0.0,
|
||||
**kwargs,
|
||||
):
|
||||
bnb.nn.Linear8bitLt.__init__(
|
||||
self,
|
||||
in_features,
|
||||
out_features,
|
||||
bias=kwargs.get("bias", True),
|
||||
has_fp16_weights=kwargs.get("has_fp16_weights", True),
|
||||
memory_efficient_backward=kwargs.get("memory_efficient_backward", False),
|
||||
threshold=kwargs.get("threshold", 0.0),
|
||||
index=kwargs.get("index", None),
|
||||
)
|
||||
LoraLayer.__init__(self, r=r, lora_alpha=lora_alpha, lora_dropout=lora_dropout, merge_weights=False)
|
||||
# Actual trainable parameters
|
||||
if r > 0:
|
||||
self.lora_A = nn.Linear(in_features, r, bias=False)
|
||||
self.lora_B = nn.Linear(r, out_features, bias=False)
|
||||
self.scaling = self.lora_alpha / self.r
|
||||
# Freezing the pre-trained weight matrix
|
||||
self.weight.requires_grad = False
|
||||
self.reset_parameters()
|
||||
|
||||
def reset_parameters(self):
|
||||
if hasattr(self, "lora_A"):
|
||||
# initialize A the same way as the default for nn.Linear and B to zero
|
||||
nn.init.kaiming_uniform_(self.lora_A.weight, a=math.sqrt(5))
|
||||
nn.init.zeros_(self.lora_B.weight)
|
||||
|
||||
def forward(self, x: torch.Tensor):
|
||||
result = super().forward(x)
|
||||
if self.r > 0:
|
||||
result += self.lora_B(self.lora_A(self.lora_dropout(x))) * self.scaling
|
||||
return result
|
||||
|
||||
|
||||
# had to adapt it for `lora_only` to work
|
||||
def mark_only_lora_as_trainable(model: nn.Module, bias: str = "none") -> None:
|
||||
for n, p in model.named_parameters():
|
||||
if "lora_" not in n:
|
||||
p.requires_grad = False
|
||||
if bias == "none":
|
||||
return
|
||||
elif bias == "all":
|
||||
for n, p in model.named_parameters():
|
||||
if "bias" in n:
|
||||
p.requires_grad = True
|
||||
elif bias == "lora_only":
|
||||
for m in model.modules():
|
||||
if isinstance(m, LoraLayer) and hasattr(m, "bias") and m.bias is not None:
|
||||
m.bias.requires_grad = True
|
||||
else:
|
||||
raise NotImplementedError
|
||||
|
@ -15,7 +15,6 @@
|
||||
|
||||
|
||||
from dataclasses import dataclass, field
|
||||
from typing import Callable, Optional
|
||||
|
||||
import torch
|
||||
|
||||
@ -30,7 +29,6 @@ class PrefixTuningConfig(PromptLearningConfig):
|
||||
Args:
|
||||
encoder_hidden_size (`int`): The hidden size of the prompt encoder.
|
||||
prefix_projection (`bool`): Whether to project the prefix embeddings.
|
||||
postprocess_past_key_value_function (`Callable`, *optional*): The function to postprocess the past key value.
|
||||
"""
|
||||
|
||||
encoder_hidden_size: int = field(
|
||||
@ -41,10 +39,6 @@ class PrefixTuningConfig(PromptLearningConfig):
|
||||
default=False,
|
||||
metadata={"help": "Whether to project the prefix tokens"},
|
||||
)
|
||||
postprocess_past_key_value_function: Optional[Callable] = field(
|
||||
default=None,
|
||||
metadata={"help": "The function to postprocess the past key value"},
|
||||
)
|
||||
|
||||
def __post_init__(self):
|
||||
self.peft_type = PeftType.PREFIX_TUNING
|
||||
|
@ -17,6 +17,13 @@
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from .adapters_utils import CONFIG_NAME, WEIGHTS_NAME
|
||||
from .config import PeftConfig, PeftType, PromptLearningConfig, TaskType
|
||||
from .other import _set_trainable, bloom_model_postprocess_past_key_value, shift_tokens_right, transpose
|
||||
from .save_and_load import get_peft_model_state_dict, peft_model_load_and_dispatch, set_peft_model_state_dict
|
||||
from .other import (
|
||||
TRANSFORMERS_MODELS_TO_PREFIX_TUNING_POSTPROCESS_MAPPING,
|
||||
_set_trainable,
|
||||
bloom_model_postprocess_past_key_value,
|
||||
shift_tokens_right,
|
||||
transpose,
|
||||
)
|
||||
from .save_and_load import get_peft_model_state_dict, set_peft_model_state_dict
|
||||
|
18
src/peft/utils/adapters_utils.py
Normal file
@ -0,0 +1,18 @@
|
||||
# coding=utf-8
|
||||
# Copyright 2023-present the HuggingFace Inc. team.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
WEIGHTS_NAME = "adapter_model.bin"
|
||||
CONFIG_NAME = "adapter_config.json"
|
||||
|
||||
# TODO: add automapping and superclass here?
|
@ -12,11 +12,18 @@
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import enum
|
||||
from dataclasses import dataclass, field
|
||||
import json
|
||||
import os
|
||||
from dataclasses import asdict, dataclass, field
|
||||
from typing import Optional, Union
|
||||
|
||||
from transformers.utils import PushToHubMixin
|
||||
|
||||
from huggingface_hub import hf_hub_download
|
||||
|
||||
from .adapters_utils import CONFIG_NAME
|
||||
|
||||
|
||||
class PeftType(str, enum.Enum):
|
||||
PROMPT_TUNING = "PROMPT_TUNING"
|
||||
@ -33,7 +40,94 @@ class TaskType(str, enum.Enum):
|
||||
|
||||
|
||||
@dataclass
|
||||
class PeftConfig:
|
||||
class PeftConfigMixin(PushToHubMixin):
|
||||
r"""
|
||||
This is the base configuration class for PEFT adapter models. It contains all the methods that are common to all
|
||||
PEFT adapter models. This class inherits from `transformers.utils.PushToHubMixin` which contains the methods to
|
||||
push your model to the Hub. The method `save_pretrained` will save the configuration of your adapter model in a
|
||||
directory. The method `from_pretrained` will load the configuration of your adapter model from a directory.
|
||||
|
||||
Args:
|
||||
peft_type (Union[[`~peft.utils.config.PeftType`], `str`]): The type of Peft method to use.
|
||||
"""
|
||||
peft_type: Optional[PeftType] = field(default=None, metadata={"help": "The type of PEFT model."})
|
||||
|
||||
@property
|
||||
def __dict__(self):
|
||||
return asdict(self)
|
||||
|
||||
def to_dict(self):
|
||||
return self.__dict__
|
||||
|
||||
def save_pretrained(self, save_directory, **kwargs):
|
||||
r"""
|
||||
This method saves the configuration of your adapter model in a directory.
|
||||
|
||||
Args:
|
||||
save_directory (`str`):
|
||||
The directory where the configuration will be saved.
|
||||
**kwargs:
|
||||
Additional keyword arguments passed along to the `transformers.utils.PushToHubMixin.push_to_hub`
|
||||
method.
|
||||
"""
|
||||
if os.path.isfile(save_directory):
|
||||
raise AssertionError(f"Provided path ({save_directory}) should be a directory, not a file")
|
||||
|
||||
os.makedirs(save_directory, exist_ok=True)
|
||||
|
||||
output_dict = self.__dict__
|
||||
output_path = os.path.join(save_directory, CONFIG_NAME)
|
||||
|
||||
# save it
|
||||
with open(output_path, "w") as writer:
|
||||
writer.write(json.dumps(output_dict, indent=2, sort_keys=True))
|
||||
|
||||
@classmethod
|
||||
def from_pretrained(cls, pretrained_model_name_or_path, **kwargs):
|
||||
r"""
|
||||
This method loads the configuration of your adapter model from a directory.
|
||||
|
||||
Args:
|
||||
pretrained_model_name_or_path (`str`):
|
||||
The directory or the hub-id where the configuration is saved.
|
||||
**kwargs:
|
||||
Additional keyword arguments passed along to the child class initialization.
|
||||
"""
|
||||
if os.path.isfile(os.path.join(pretrained_model_name_or_path, CONFIG_NAME)):
|
||||
config_file = os.path.join(pretrained_model_name_or_path, CONFIG_NAME)
|
||||
else:
|
||||
try:
|
||||
config_file = hf_hub_download(pretrained_model_name_or_path, CONFIG_NAME)
|
||||
except:
|
||||
raise ValueError(f"Can't find config.json at '{pretrained_model_name_or_path}'")
|
||||
|
||||
loaded_attributes = cls.from_json_file(config_file)
|
||||
|
||||
config = cls(**kwargs)
|
||||
|
||||
for key, value in loaded_attributes.items():
|
||||
if hasattr(config, key):
|
||||
setattr(config, key, value)
|
||||
|
||||
return config
|
||||
|
||||
@classmethod
|
||||
def from_json_file(cls, path_json_file, **kwargs):
|
||||
r"""
|
||||
Loads a configuration file from a json file.
|
||||
|
||||
Args:
|
||||
path_json_file (`str`):
|
||||
The path to the json file.
|
||||
"""
|
||||
with open(path_json_file, "r") as file:
|
||||
json_object = json.load(file)
|
||||
|
||||
return json_object
|
||||
|
||||
|
||||
@dataclass
|
||||
class PeftConfig(PeftConfigMixin):
|
||||
"""
|
||||
This is the base configuration class to store the configuration of a :class:`~peft.PeftModel`.
|
||||
|
||||
@ -43,6 +137,7 @@ class PeftConfig:
|
||||
inference_mode (`bool`, defaults to `False`): Whether to use the Peft model in inference mode.
|
||||
"""
|
||||
|
||||
base_model_name_or_path: str = field(default=None, metadata={"help": "The name of the base model to use."})
|
||||
peft_type: Union[str, PeftType] = field(default=None, metadata={"help": "Peft type"})
|
||||
task_type: Union[str, TaskType] = field(default=None, metadata={"help": "Task type"})
|
||||
inference_mode: bool = field(default=False, metadata={"help": "Whether to use inference mode"})
|
||||
|
@ -30,6 +30,11 @@ def bloom_model_postprocess_past_key_value(past_key_values):
|
||||
return tuple(zip(keys, values))
|
||||
|
||||
|
||||
TRANSFORMERS_MODELS_TO_PREFIX_TUNING_POSTPROCESS_MAPPING = {
|
||||
"bloom": bloom_model_postprocess_past_key_value,
|
||||
}
|
||||
|
||||
|
||||
# copied from transformers.models.bart.modeling_bart
|
||||
def shift_tokens_right(input_ids: torch.Tensor, pad_token_id: int, decoder_start_token_id: int):
|
||||
"""
|
||||
|
@ -50,7 +50,10 @@ def get_peft_model_state_dict(model, state_dict=None):
|
||||
raise NotImplementedError
|
||||
else:
|
||||
to_return = {}
|
||||
prompt_embeddings = model.get_prompt_embedding_to_save()
|
||||
if model.peft_config.inference_mode:
|
||||
prompt_embeddings = model.prompt_encoder.embedding.weight
|
||||
else:
|
||||
prompt_embeddings = model.get_prompt_embedding_to_save()
|
||||
to_return["prompt_embeddings"] = prompt_embeddings
|
||||
if model.modules_to_save is not None:
|
||||
for key, value in state_dict.items():
|
||||
@ -74,35 +77,3 @@ def set_peft_model_state_dict(model, peft_model_state_dict):
|
||||
{"weight": peft_model_state_dict["prompt_embeddings"]}, strict=True
|
||||
)
|
||||
return model
|
||||
|
||||
|
||||
def peft_model_load_and_dispatch(model, peft_model_state_dict, peft_config, max_memory=None):
|
||||
"""
|
||||
Load the Peft model state dict and dispatch the model to the correct device.
|
||||
|
||||
Args:
|
||||
model ([`PeftModel`]): The Pre-trained base model which has already been sharded and dispatched
|
||||
using `accelerate` functionalities.
|
||||
peft_model_state_dict (`dict`): The state dict of the Peft model.
|
||||
max_memory (`Dict`, *optional*):
|
||||
A dictionary device identifier to maximum memory. Will default to the maximum memory available for each GPU
|
||||
and the available CPU RAM if unset.
|
||||
"""
|
||||
from accelerate import dispatch_model, infer_auto_device_map
|
||||
from accelerate.hooks import AlignDevicesHook, add_hook_to_module, remove_hook_from_submodules
|
||||
|
||||
from ..mapping import get_peft_model
|
||||
|
||||
remove_hook_from_submodules(model)
|
||||
model = get_peft_model(model, peft_config)
|
||||
model.print_trainable_parameters()
|
||||
set_peft_model_state_dict(model, peft_model_state_dict)
|
||||
device_map = infer_auto_device_map(model, max_memory=max_memory, no_split_module_classes=model._no_split_modules)
|
||||
model = dispatch_model(model, device_map=device_map)
|
||||
hook = AlignDevicesHook(io_same_device=True)
|
||||
if model.peft_config.peft_type == PeftType.LORA:
|
||||
add_hook_to_module(model.base_model.model, hook)
|
||||
else:
|
||||
remove_hook_from_submodules(model.prompt_encoder)
|
||||
add_hook_to_module(model.base_model, hook)
|
||||
return model
|
||||
|
96
tests/test_config.py
Normal file
@ -0,0 +1,96 @@
|
||||
# coding=utf-8
|
||||
# Copyright 2023-present the HuggingFace Inc. team.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
import os
|
||||
import tempfile
|
||||
import unittest
|
||||
|
||||
from peft import LoraConfig, PrefixTuningConfig, PromptEncoderConfig, PromptTuningConfig
|
||||
|
||||
|
||||
class PeftConfigTestMixin:
|
||||
all_config_classes = (
|
||||
LoraConfig,
|
||||
PromptEncoderConfig,
|
||||
PrefixTuningConfig,
|
||||
PromptTuningConfig,
|
||||
)
|
||||
|
||||
|
||||
class PeftConfigTester(unittest.TestCase, PeftConfigTestMixin):
|
||||
def test_methods(self):
|
||||
r"""
|
||||
Test if all configs have the expected methods. Here we test
|
||||
- to_dict
|
||||
- save_pretrained
|
||||
- from_pretrained
|
||||
- from_json_file
|
||||
"""
|
||||
# test if all configs have the expected methods
|
||||
for config_class in self.all_config_classes:
|
||||
config = config_class()
|
||||
self.assertTrue(hasattr(config, "to_dict"))
|
||||
self.assertTrue(hasattr(config, "save_pretrained"))
|
||||
self.assertTrue(hasattr(config, "from_pretrained"))
|
||||
self.assertTrue(hasattr(config, "from_json_file"))
|
||||
|
||||
def test_task_type(self):
|
||||
for config_class in self.all_config_classes:
|
||||
# assert this will not fail
|
||||
_ = config_class(task_type="test")
|
||||
|
||||
def test_save_pretrained(self):
|
||||
r"""
|
||||
Test if the config is correctly saved and loaded using
|
||||
- save_pretrained
|
||||
"""
|
||||
for config_class in self.all_config_classes:
|
||||
config = config_class()
|
||||
with tempfile.TemporaryDirectory() as tmp_dirname:
|
||||
config.save_pretrained(tmp_dirname)
|
||||
|
||||
config_from_pretrained = config_class.from_pretrained(tmp_dirname)
|
||||
self.assertEqual(config.to_dict(), config_from_pretrained.to_dict())
|
||||
|
||||
def test_from_json_file(self):
|
||||
for config_class in self.all_config_classes:
|
||||
config = config_class()
|
||||
with tempfile.TemporaryDirectory() as tmp_dirname:
|
||||
config.save_pretrained(tmp_dirname)
|
||||
|
||||
config_from_json = config_class.from_json_file(os.path.join(tmp_dirname, "adapter_config.json"))
|
||||
self.assertEqual(config.to_dict(), config_from_json)
|
||||
|
||||
def test_to_dict(self):
|
||||
r"""
|
||||
Test if the config can be correctly converted to a dict using:
|
||||
- to_dict
|
||||
- __dict__
|
||||
"""
|
||||
for config_class in self.all_config_classes:
|
||||
config = config_class()
|
||||
self.assertEqual(config.to_dict(), config.__dict__)
|
||||
self.assertTrue(isinstance(config.to_dict(), dict))
|
||||
|
||||
def test_set_attributes(self):
|
||||
# manually set attributes and check if they are correctly written
|
||||
for config_class in self.all_config_classes:
|
||||
config = config_class(peft_type="test")
|
||||
|
||||
# save pretrained
|
||||
with tempfile.TemporaryDirectory() as tmp_dirname:
|
||||
config.save_pretrained(tmp_dirname)
|
||||
|
||||
config_from_pretrained = config_class.from_pretrained(tmp_dirname)
|
||||
self.assertEqual(config.to_dict(), config_from_pretrained.to_dict())
|
136
tests/test_save_and_load.py
Normal file
@ -0,0 +1,136 @@
|
||||
# coding=utf-8
|
||||
# Copyright 2023-present the HuggingFace Inc. team.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
import os
|
||||
import tempfile
|
||||
import unittest
|
||||
|
||||
import torch
|
||||
from transformers import AutoModelForCausalLM
|
||||
|
||||
from peft import (
|
||||
LoraConfig,
|
||||
PeftModel,
|
||||
PrefixTuningConfig,
|
||||
PromptEncoderConfig,
|
||||
PromptTuningConfig,
|
||||
get_peft_model,
|
||||
get_peft_model_state_dict,
|
||||
)
|
||||
|
||||
|
||||
class PeftTestMixin:
|
||||
checkpoints_to_test = [
|
||||
"hf-internal-testing/tiny-random-OPTForCausalLM",
|
||||
]
|
||||
config_classes = (
|
||||
LoraConfig,
|
||||
PrefixTuningConfig,
|
||||
PromptEncoderConfig,
|
||||
PromptTuningConfig,
|
||||
)
|
||||
config_kwargs = (
|
||||
dict(
|
||||
r=8,
|
||||
lora_alpha=32,
|
||||
target_modules=["q_proj", "v_proj"],
|
||||
lora_dropout=0.05,
|
||||
bias="none",
|
||||
task_type="CAUSAL_LM",
|
||||
),
|
||||
dict(
|
||||
num_virtual_tokens=10,
|
||||
task_type="CAUSAL_LM",
|
||||
),
|
||||
dict(
|
||||
num_virtual_tokens=10,
|
||||
encoder_hidden_size=32,
|
||||
task_type="CAUSAL_LM",
|
||||
),
|
||||
dict(
|
||||
num_virtual_tokens=10,
|
||||
task_type="CAUSAL_LM",
|
||||
),
|
||||
)
|
||||
|
||||
|
||||
class PeftModelTester(unittest.TestCase, PeftTestMixin):
|
||||
r"""
|
||||
Test if the PeftModel behaves as expected. This includes:
|
||||
- test if the model has the expected methods
|
||||
"""
|
||||
|
||||
def test_attributes_model(self):
|
||||
for model_id in self.checkpoints_to_test:
|
||||
for i, config_cls in enumerate(self.config_classes):
|
||||
model = AutoModelForCausalLM.from_pretrained(model_id)
|
||||
config = config_cls(
|
||||
base_model_name_or_path=model_id,
|
||||
**self.config_kwargs[i],
|
||||
)
|
||||
model = get_peft_model(model, config)
|
||||
|
||||
self.assertTrue(hasattr(model, "save_pretrained"))
|
||||
self.assertTrue(hasattr(model, "from_pretrained"))
|
||||
self.assertTrue(hasattr(model, "push_to_hub"))
|
||||
|
||||
def test_save_pretrained(self):
|
||||
r"""
|
||||
A test to check if `save_pretrained` behaves as expected. This function should only save the state dict of the
|
||||
adapter model and not the state dict of the base model. Hence inside each saved directory you should have:
|
||||
|
||||
- README.md (that contains an entry `base_model`)
|
||||
- adapter_config.json
|
||||
- adapter_model.bin
|
||||
|
||||
"""
|
||||
for model_id in self.checkpoints_to_test:
|
||||
for i, config_cls in enumerate(self.config_classes):
|
||||
model = AutoModelForCausalLM.from_pretrained(model_id)
|
||||
config = config_cls(
|
||||
base_model_name_or_path=model_id,
|
||||
**self.config_kwargs[i],
|
||||
)
|
||||
model = get_peft_model(model, config)
|
||||
model.to(model.device)
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmp_dirname:
|
||||
model.save_pretrained(tmp_dirname)
|
||||
|
||||
model_from_pretrained = AutoModelForCausalLM.from_pretrained(model_id)
|
||||
model_from_pretrained = PeftModel.from_pretrained(model_from_pretrained, tmp_dirname)
|
||||
model_from_pretrained.to(model.device)
|
||||
|
||||
# check if the state dicts are equal
|
||||
state_dict = get_peft_model_state_dict(model)
|
||||
state_dict_from_pretrained = get_peft_model_state_dict(model_from_pretrained)
|
||||
|
||||
# check if same keys
|
||||
self.assertEqual(state_dict.keys(), state_dict_from_pretrained.keys())
|
||||
|
||||
# check if tensors equal
|
||||
for key in state_dict.keys():
|
||||
self.assertTrue(torch.allclose(state_dict[key], state_dict_from_pretrained[key]))
|
||||
|
||||
# check if `adapter_model.bin` is present
|
||||
self.assertTrue(os.path.exists(os.path.join(tmp_dirname, "adapter_model.bin")))
|
||||
|
||||
# check if `adapter_config.json` is present
|
||||
self.assertTrue(os.path.exists(os.path.join(tmp_dirname, "adapter_config.json")))
|
||||
|
||||
# check if `pytorch_model.bin` is not present
|
||||
self.assertFalse(os.path.exists(os.path.join(tmp_dirname, "pytorch_model.bin")))
|
||||
|
||||
# check if `config.json` is not present
|
||||
self.assertFalse(os.path.exists(os.path.join(tmp_dirname, "config.json")))
|