Compare commits


355 Commits

Author SHA1 Message Date
30889ef260 Release: v0.8.0 (#1406) 2024-01-30 11:17:42 +05:30
67918efb49 Fix LoftQ docs (#1408) 2024-01-30 10:09:30 +05:30
189a9a666d add peft type constructor (#1398) 2024-01-29 11:55:01 +05:30
bfc102c0c0 [docs] Task guides (#1332)
* soft prompt guides

* small edits

* feedback

* feedback
2024-01-27 13:39:20 +05:30
1c1c7fdaa6 Fix LoRA module mapping for Phi models (#1375) 2024-01-24 19:24:38 +01:00
4a15595822 Improve documentation for the all-linear flag (#1357)
* added docs for all-linear

* added doc in quantization section

* added doc in lora section

* minor edit

* minor edit
2024-01-22 15:47:45 +01:00
bb2471d926 save the embeddings even when they aren't targetted but resized (#1383) 2024-01-22 20:16:42 +05:30
54ca31153d add mixtral in mapping (#1380) 2024-01-22 09:22:34 +01:00
ebbff4023a account for the new merged/unmerged weight to perform the quantization again (#1370) 2024-01-18 15:39:09 +01:00
62237dc9b1 Handle resizing of embedding layers for AutoPeftModel (#1367)
* handle resizing of embedding layers for AutoPeftModel

* fixes

* add test
2024-01-17 21:02:16 +05:30
eaa5eef28e Added missing getattr methods for mixed model (#1365) 2024-01-17 19:55:49 +05:30
bf54136a79 [docs] Docstring link (#1356)
* fix format

* hmm
2024-01-12 09:00:08 -08:00
a43ec59762 FEAT Add Poly Adapter (#1129)
Implement the Poly (Polytropon) adapter.

Papers:

- https://arxiv.org/abs/2202.13914
- https://arxiv.org/abs/2211.03831

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2024-01-12 17:19:12 +01:00
0089ebd272 DOC Add PeftMixedModel to API docs (#1354) 2024-01-12 17:29:52 +05:30
fe01d6de85 [Docs] make add_weighted_adapter example clear in the docs. (#1353)
* make add_weighted_adapter example clear in the docs.

* Apply suggestions from code review
2024-01-12 17:25:30 +05:30
f9b673ea37 DOC Extending the vocab and storing embeddings (#1335)
Resolves #1300

Sourab added the feature to store the embedding layers alongside the
adapter in #1147. This PR adds an entry to the documentation to explain
the new feature.

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-01-12 12:38:49 +01:00
dc28a61e82 FIX Setting active adapter for quantized layers (#1347)
Resolves #1345

See also #1294 for a similar (but incomplete) fix.

This commit fixes the setting of the adapter name on a couple of
quantized layers that was accidentally removed in #1106. This affects
users who use a non-default adapter name when they want to train these
layers.

---------

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
2024-01-12 11:55:46 +01:00
71585d611f New transformers caching ETA now v4.38 (#1348)
See #1252 and #1352 for more context.

The initial idea was for transformers 4.37 to add the new caching to all
architectures, but this was postponed to 4.38. The code needs to be
adapted for prompt tuning not to break when transformers 4.37 is
released.
2024-01-12 11:54:53 +01:00
c6bcf91ca1 QOL improvements and doc updates (#1318)
* improve docs and add small utils

* quality

* fix typo

* updates

* quality

* Update src/peft/utils/other.py

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* address comments

* quality

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2024-01-12 16:18:55 +05:30
4354a7d496 fix prepare_inputs_for_generation logic for Prompt Learning methods (#1352)
* fix `prepare_inputs_for_generation` logic for Prompt Learning methods

* 😅
2024-01-12 16:18:42 +05:30
f36f50acb4 DOC: Update docstring for the config classes (#1343)
* DOC Update docstring for the config classes

Over time, the docstrings of the numerous config classes have not kept
up to date with changes in the code. This PR updates the docstrings to
reflect the current state of the code.

On top of that, multiple small updates have been made:

- Correct wrong or imprecise type annotations.
- More neutral wording of the docstring. E.g., say "adapter" instead of
  "LoRA". This makes it easier to copy&paste the docstrings between
  classes.
- Use same wording for shared arguments.
- Add missing arguments.
- Uniform formatting: Always a line break after the first line of the
  docstring (not mixed, as that can be confusing).
- Fix line lengths to be consistently at 120 characters.
2024-01-12 11:29:39 +01:00
777c0b6ad7 DOC AdaLoraModel.update_and_allocate usage (#1341)
Clarify that this method needs to be called explicitly.
2024-01-11 14:52:14 +01:00
6451cbd70c Fix logic in target module finding (#1263)
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2024-01-10 15:08:00 +01:00
7d28536b18 DOC Correct help for CLI args in script (#1338) 2024-01-10 11:44:25 +01:00
eb2c12d99a ENH Add attribute to show targeted module names (#1330)
This is just a tiny convenience feature to help users understand
which modules are being targeted by the adapter. This can be useful
to quickly check if a complex regex works for `target_modules`.

Note: This should work for all adapters that use BaseTuner, so not only
LoRA but also IA³, LoHa, etc. Only the first two were tested but that
should be enough.
2024-01-10 11:38:40 +01:00
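
A minimal usage sketch for the convenience attribute described above; the attribute name `targeted_module_names`, the model, and the regex are illustrative assumptions rather than taken from the commit text.

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Assumed attribute name: `targeted_module_names`; model and regex are just examples.
base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
config = LoraConfig(target_modules=r".*\.self_attn\.(q_proj|v_proj)$")
model = get_peft_model(base, config)
# Quickly verify which modules the regex actually matched.
print(model.targeted_module_names)
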
c6b28a22b8 DOC Troubleshooting for unscaling error with fp16 (#1336)
Some users ran into the issue of trying to use a model loaded in float16
with mixed precision, e.g. these issues: #341, #1249. This PR documents
a workaround to solve the issue.

I also added tests that demonstrate the issue, as well as the
workaround.

Notes

This is not strictly a PEFT issue, but more a general error when using
AMP with float16. Still, since PEFT users encounter this sometimes, it
is useful to document it.

When we discussed this issue in the past, I think we concluded that it's
not as straightforward as PEFT automatically casting the weights to
float32, though I cannot remember anymore what the drawbacks were.

In any case, should we ever add an automatic solution for this in PEFT,
the added test should fail, which alerts us to the fact that we need to
update the documentation.
2024-01-10 12:08:23 +05:30
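
A minimal sketch of the documented workaround, assuming `model` is a PEFT model whose base weights were loaded in float16: cast only the trainable adapter parameters to float32 before training with AMP.

# Upcast trainable (adapter) parameters so GradScaler does not try to
# unscale float16 gradients.
for param in model.parameters():
    if param.requires_grad:
        param.data = param.data.float()
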
e96eef9ea1 FIX Don't load tokenizer when unnecessary (#1333)
When loading prompt tuning for inference, it is not necessary to load
the tokenizer.
2024-01-09 17:28:57 +01:00
54ee2fb1af Refactor dispatching logic of LoRA layers (#1319)
This PR's goal is to simplify the logic for deciding which LoRA layer
backend is being used when LoRA is applied to a target layer.

Originally, this refactor was done in #1286 which was about adding the
"fast" backend for LoRA, but since that PR was closed, I moved the
refactor to this dedicated PR.

Motivation

Right now, the LoraModel._create_new_module method has become quite
complex and hard to read, spanning >100 lines:

8665e2b571/src/peft/tuners/lora/model.py (L235-L339)

The reason for this is that this method contains the logic for deciding which
LoRA layer backend to use for all the different types of LoRA layers
that we have, i.e. normal Linear layer, Conv2d layer, bnb layer, gptq,
etc.

Description

To remedy this, I moved the logic for deciding which layer to match to
the respective implementation of the layers. For example, in
lora/layer.py, there is now a function called dispatch_default, whose
responsibility it is to decide if an Embedding layer, Conv2d layer or
Linear layer is the right match. Similarly, in lora/bnb.py, there are
now the two functions dispatch_bnb_8bit and dispatch_bnb_4bit to decide
what/if any bnb 8bit or 4bit layer should be matched.

This way, the logic to decide what layer to match now resides next to
the respective layers. The only thing that LoraModel now needs to do is
to collect all the dispatching methods and use the first layer that
matches.

Note that only LoRA was modified, the other tuners don't have different
backends and thus this approach was not necessary for them. The only
exception is IA³, which has the normal and bnb backend. Since those are
only 2, it's not as complicated as for LoRA, but if this PR is accepted,
I can refactor IA³ in a similar fashion.

Other changes

- Removed the optional_kwargs argument from _create_and_replace, as it
  was an unnecessary indirection.
- Removed the bias argument from kwargs, as it was not used.

Backwards compatibility

This should be fully backwards compatible, as the constructed LoRA model
is 100% the same. If there are users that override _create_new_module,
their code will probably break, but since this is a private method, we
should be fine.
2024-01-09 12:18:31 +01:00
cbd783b4df Add an option 'ALL' to include all linear layers as target modules (#1295)
* added helper function to get list of all linear layers; added tests and updated documentation

* added bnb tests

* fixed issues with t5

* style issues

* improved lora and ia3 docs

* fixed code to work for any output embedding layer name

* style changes

* added a test for a base model without lm head

* added comments

* address review comments

* update tests

* update tests

* minor simplification

* changed argument to all_linear

* minor fix to configs

* minor edit

* Apply suggestions from code review

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* address review comments

* added test for diffusion models

* minor edits to configs

* spelling correction

* Update tests/test_tuners_utils.py

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* Update src/peft/tuners/tuners_utils.py

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* Update src/peft/tuners/tuners_utils.py

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* address review comments

* revert back to older decorator order

* style changes

* simplify logic for bnb layers

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2024-01-09 16:19:58 +05:30
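
A hedged usage sketch for the new option; the released flag spelling is assumed here to be the string "all-linear" (the commit history above also mentions 'ALL' and all_linear).

from peft import LoraConfig

# Apply LoRA to every linear layer except the output embedding / LM head.
config = LoraConfig(target_modules="all-linear")
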
26504a0119 Extend merge_and_unload to offloaded models (#1190)
* activated pre-forward

* activated pre-forward hook

* activated pre-forward hook

* activated pre-forward hook

* debugged hook call

* added explicit forwards

* debugged

* debugged

* fixed pre-forward hook call

* fixed pre-forward hook call

* debugged module iteration

* fixed post forward args

* added conditional attr check

* fixed conditional attr check

* memory overflow debug

* memory overflow debug

* added mem trace

* added mem trace

* more memory traces

* debug memory leak

* debug memory leak

* removed replace

* removed device assign during replacement

* no grad during replacement

* new module hook

* to cpu

* to cpu

* removed replace module

* conditional on replace module

* removed traces

* make style

* added back replace_module

* added test and make style

* inline key, module

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* fixed test and make style

* reverted _unload_and_optionally_merge and moved test

* match main

* make style

* reverted model.py

* make style

* reverted merge

* fetched model.py from head

* added onload

* debug

* removed replace module

* removed replace module

* pre forward on target and parent

* removed _replace_module

* reverted

* debugged

* debugged

* traced adapters

* debugged

* added trace on adapter names

* onloaded target

* further traces

* further traces

* further traces

* further traces

* further traces

* onloaded adapters

* onload module

* onload module

* onload module

* debugged

* debugged

* debugged

* removed delta weight onload

* revamped delta weight onload

* revamped delta weight onload

* removed replace module

* added parent and target act

* debugged

* debugged

* added traces

* added traces

* added traces

* init hook

* init hook

* traces

* traces

* specd weights map

* removed traces and offload check

* post forwards on lora

* added post forward for target and parent

* added trace

* removed traces and tp post forwards

* added onloads and offloads to embedding and conv2d

* updated test

* make style

* debugged and make style

* refactored and make style

* cleaned

* refactored and make style

* cleaned

* cleaned

* make style

* make style

* disk offload compatibility

* refactored linear onload via contextmanager

* refactored onloads

* debugged

* tempfile to tempfolder

* changed disk offload to original directory

* refactored for general tuners

* debugged

* explicit base layer

* added base traces

* more traces

* debugged;

* reverted lora layer.py

* removed traces and make style

* cleaned

* removed todo

* fixed test and cleaned

* added suggestions and make style

* onload for unmerge and merge_and_unload

* improved docstring

* onload target only and make style

* Update src/peft/tuners/tuners_utils.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* revised descriptions

* make style

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2024-01-09 06:31:30 +01:00
4186c9b104 FIX Use torch.long instead of torch.int in LoftQ for PyTorch versions <2.x (#1320)
Solves #1307

For PyTorch < v2.x, using torch.int does not work for indexing, so
torch.long is used instead.
2024-01-08 10:45:12 +01:00
8665e2b571 fix diffusers tests (#1317)
* fix diffusers tests

* quality
2024-01-03 20:05:06 +05:30
cbf346d962 fix the embedding saving for adaption prompt (#1314)
* fix the embedding saving for adaption prompt

* fix

* automate setting `save_embedding_layers` when embedding layer is resized during finetuning

* fix

* address comment

Co-Authored-By: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* oops

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2024-01-03 15:22:26 +05:30
2a0fb71f4f Mistral IA3 config defaults. (#1316) 2024-01-03 01:59:31 +05:30
c4cf9e7d3b FIX Set active adapter in bnb lora layers init (#1294)
Was accidentally removed in #1106
2024-01-02 13:02:42 +01:00
cf04d0353f [BNB] fix dockerfile for single gpu (#1305) 2023-12-27 15:41:33 +01:00
4023da904f fix fsdp auto wrap policy (#1302)
* fix fsdp policy

* fix fsdp

* revert

* refactor to be inline with Accelerate
2023-12-27 14:43:27 +05:30
6fe1aac65d [BNB] Fix bnb dockerfile for latest version (#1291)
* fix docker

* fix

* Update .github/workflows/nightly-bnb.yml

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-12-26 14:28:39 +01:00
799420aef1 Update nightly-bnb.yml (#1287) 2023-12-22 17:59:26 +01:00
993836ff90 DOC Improve target modules description (#1290)
For LoRA and IA³, it is allowed to not specify a target module, in which
case the correct layers are derived from the model architecture. This
was not documented so far.
2023-12-21 17:00:09 +01:00
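
A sketch of what the documented behavior looks like in practice, assuming a base model architecture that PEFT already knows (e.g. a Llama-family model):

from peft import LoraConfig, IA3Config

# No target_modules given: defaults are derived from the base model's architecture.
lora_config = LoraConfig(task_type="CAUSAL_LM")
ia3_config = IA3Config(task_type="CAUSAL_LM")
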
1c9679ac71 [docs] Concept guides (#1269)
* concept-docs

* mpt and llama-adapter

* review

* feedback

* toctree

* Update docs/source/conceptual_guides/adapter.md

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-12-20 10:56:02 -08:00
e745ffd7d0 FIX Errors in StableDiffusion adapter conversion script (#1281) 2023-12-20 12:00:05 +01:00
029dcd5a1c [bnb] Add bnb nightly workflow (#1282)
* add bnb nightly workflow

* add matrix strategy

* temp

* oops

* temp

* oops

* nit

* fixes

* up

* up

* up

* add pytest cov

* up

* oops

* put correct dir

* fix

* fix dir in makefile + failing test

* revert

* Update .github/workflows/nightly.yml

* Update nightly-bnb.yml

* Update log_reports.py

* Update Makefile

* Update .github/workflows/nightly-bnb.yml

* Update .github/workflows/nightly-bnb.yml

* Update .github/workflows/nightly.yml

* Update nightly.yml

* Update .github/workflows/nightly-bnb.yml

* Update nightly-bnb.yml
2023-12-20 10:49:13 +01:00
482a2a6d9a TST Enable LoftQ 8bit tests (#1279)
Due to PR #1276, the bug that prevented use of LoftQ with 8bit
quantization has now been fixed. Therefore, the tests no longer need to
be skipped.
2023-12-18 17:29:33 +01:00
119de1715c [Tests] Add bitsandbytes installed from source on new docker images (#1275)
* add bnb from source dockerfile

* Update build_docker_images.yml

* Update build_docker_images.yml

* minor refactor
2023-12-18 15:15:43 +01:00
a0a46c06db Refactor and a couple of fixes for adapter layer updates (#1268)
* Refactor: Move LoRA update_layer to child classes

For LoRA, so far, we have update_layer for Linear,
update_layer_embedding for Embedding, and update_layer_conv2d for
Conv2d, all defined on LoraLayer.

We can simplify the code by always using the name update_layer, and by
moving the layer-specific methods to the subclasses. So e.g.
update_layer_embedding is moved to the Embedding class and renamed to
update_layer. This way, the caller does not need to differentiate which
type of layer it's calling.

Interestingly, this was already practiced for IA³, so the same change
was not necessary there. But I did find the same method implemented
twice, once on IA3Layer and once on Linear, so I removed one of the
duplicates

* Systematic handling of r (rank) <= 0

Always raise an error when r <= 0, not only for LoRA. Also, removed
later check for r > 0 in LoRA layers, since we already check for r <= 0.

* Fix broken __repr__ method on QuantLinear

Was indented too deep, thus not being applied.

* Fix bug for updating Lora GPTQ and IA3 bnb layers

Before this fix, when adding a 2nd adapter to a model, we did not
correctly check if there was already an adapter layer in the model when
dealing with LoRA GPTQ or IA3 bnb layers. As a consequence, instead of
updating the existing layers, we would create a new layer and the
existing layer would be set as the base_layer of that new layer. Now, we
correctly update the existing layer to add the new adapter.

Note that for this fix to work correctly with LoRA and GPTQ, I had to
add a check for qweight, since we only checked for weight before.

Tests were added to check this. They fail with the current main but are
fixed with this PR.

* Don't match AdaLoraLayer when updating LoraLayers

AdaLoraLayer is a subclass of LoraLayer, so just checking for
isinstance(target, LoraLayer) will match AdaLoraLayer, which we don't
want when it comes to updating a LoraLayer. Now, we explicitly check
that the layer is *not* an instance of AdaLoraLayer.
2023-12-18 10:59:17 +01:00
3708793ba9 TST Extend LoftQ tests to check CPU initialization (#1274)
Tests to complement PR #1256
2023-12-18 10:37:48 +01:00
46a84bd395 LoftQ: edit README.md and example files (#1276)
* fix when num_bits == 2 or 8

* try 13b
2023-12-17 15:21:25 +01:00
bd544bb2ce LoftQ: Allow quantizing models on CPU (#1256) 2023-12-15 16:43:33 +01:00
55c37e9c0b feat: add apple silicon GPU acceleration (#1217)
* feat: add apple silicon GPU acceleration

* Fix device compatibility issue in
load_peft_weights function

* Update save_and_load.py

* Update save_and_load.py

* Update save_and_load.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update src/peft/utils/save_and_load.py

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* Fix string formatting in image_classification_timm_peft_lora.ipynb and multilayer_perceptron_lora.ipynb

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2023-12-15 13:05:06 +01:00
997e6ec5ab ENH Rank-stabilized LoRA scaling option (#1244)
Add option to scale LoRA weights by alpha/sqrt(r) by passing
LoraConfig(..., use_rslora=True).

https://doi.org/10.48550/arXiv.2312.03732
2023-12-15 12:16:59 +01:00
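
Usage as described in the commit message above; the rank and alpha values here are arbitrary examples.

from peft import LoraConfig

# Scale adapter output by lora_alpha / sqrt(r) instead of lora_alpha / r.
config = LoraConfig(r=16, lora_alpha=32, use_rslora=True)
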
ddb114af0a remove a duplicated description (#1271)
remove duplicated description for _check_target_module_exists in BaseTuner class
2023-12-15 11:04:29 +01:00
4b02148af2 TST Revert device_map for AdaLora 4bit GPU test (#1266)
This was recently added in #1242 but fails on CI with single GPU.
2023-12-14 11:41:31 +01:00
0f1e9091cc Fix ModulesToSaveWrapper __getattr__ (#1238)
* Update other.py

* Update other.py

* Update test_low_level_api.py
2023-12-13 12:52:56 +01:00
88e2e75cc3 FIX Error in log_reports.py (#1261)
Silly mistake...
2023-12-13 10:50:05 +01:00
c9df262d69 Bump version to 0.7.2.dev0 post release (#1258) 2023-12-12 18:30:41 +01:00
67a08009ff Release: 0.7.1 (#1257)
Also fix some more seeds to prevent flakiness
2023-12-12 17:53:36 +01:00
971dd6e815 Fix: Multiple adapters with bnb layers (#1243)
Resolves #1239

Fixes a bug that led to an error when loading multiple adapters into a
peft model that uses bnb layers.

Also: Fix for loading 2nd adapter with AutoGPTQ
2023-12-12 15:34:45 +01:00
ee6f6dcee7 FIX Issues with transformers 4.36 (#1252)
Adjust for different type of past_key_values when using caching.

Also: Fix some seeds for flaky tests.

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2023-12-12 15:16:00 +01:00
21c304f6f6 FIX Truncate slack message to not exceed 3000 char (#1251)
Should fix the issue of not receiving slack notifications because the
message is too long, see:

https://github.com/huggingface/peft/actions/runs/7148379741/job/19469273483

Currently, we get:

> Error: ver responded with: {'ok': False, 'error': 'invalid_blocks', 'errors': ['failed to match all allowed schemas [json-pointer:/blocks/1/text]', 'must be less than 3001 characters [json-pointer:/blocks/1/text/text]'], 'response_metadata': {'messages': ['[ERROR] failed to match all allowed schemas [json-pointer:/blocks/1/text]', '[ERROR] must be less than 3001 characters [json-pointer:/blocks/1/text/text]']}}

Fixing the error should also lead to a shorter message, but we should
ensure that even if the message is too long, we still get it.
2023-12-12 11:05:48 +01:00
e73967edea [docs] Quantization (#1236)
* first draft

* feedback

* update api doc

* feedback
2023-12-11 08:48:06 -08:00
b08e6faf2b TST: Add tests for 4bit LoftQ (#1208)
Add GPU tests for LoftQ with 4bit quantization.

Notes

Tests for 8bit quantization are already there but not run at the moment,
see this comment:

https://github.com/huggingface/peft/pull/1150#issuecomment-1838891499

In my testing, 8bit passes when using NFQuantizer, so if the original
author is fine with using that, I can make the adjustment.

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-12-11 15:34:36 +01:00
5c13ea3b12 FIX Use model argument consistently (#1198) (#1205)
Some methods were using model and self.model interchangeably. This was
fine, as they were referring to the same object, but is also confusing.
Now model is used consistently.
2023-12-11 12:35:28 +01:00
00b820061e Revert "FIX Pin bitsandbytes to <0.41.3 temporarily (#1234)" (#1250)
This reverts commit 86562eec49bede2f4525be343f642af8fb46ddbc.
2023-12-11 12:11:18 +01:00
504d3c8329 [docs] PEFT integrations (#1224)
* rough draft

* remove

* feedback

* fix image links and doc references

* resolve links manually

* use internal link
2023-12-08 13:01:37 -08:00
fc9f4b3176 Bnb integration test tweaks (#1242)
* allow bitsandbytes integration test selection

* fix typo: mutli -> multi

* enable tests to run on >2 GPUs

* fix for >3 GPUs, due to artidoro/qlora #186

* fix formatting
2023-12-08 13:20:13 +01:00
895513c465 TST: Add tolerance for regression tests (#1241)
Tests currently call torch.allclose without any tolerance, which is
probably the cause of the CI failure. Now, tolerance is set to 1e-4.
2023-12-08 11:50:48 +01:00
c893394808 [docs] PeftConfig and PeftModel (#1211)
* rough draft

* feedback

* feedback
2023-12-07 14:22:26 -08:00
86562eec49 FIX Pin bitsandbytes to <0.41.3 temporarily (#1234)
Some tests are failing with bitsandbytes 0.41.3:

python -m pytest -m single_gpu_tests tests/test_common_gpu.py -k
test_4bit_merge

For the time being, use the next smaller version.
2023-12-07 16:46:15 +01:00
b467e3de5c Lazy import of bitsandbytes (#1230)
Previously, we imported from bitsandbytes eagerly if the package was
installed. This caused two major issues:

- Slow loading time of PEFT (~4 sec)
- Errors with multiprocessing because bnb initializes CUDA

This commit fixes both issues by importing bitsandbytes lazily. PEFT
import time is now reduced to ~2sec.

Notes

Implementation-wise, I use a combination of local imports and
module-level __getattr__. The latter was introduced in Python 3.7 and
should therefore be safe to use.
2023-12-07 16:39:08 +01:00
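
An illustrative sketch of the module-level __getattr__ (PEP 562) pattern mentioned above, not the actual PEFT code:

import importlib

def __getattr__(name):
    # Import bitsandbytes only when one of these names is first accessed.
    if name in ("Linear8bitLt", "Linear4bit"):
        bnb = importlib.import_module("bitsandbytes")
        return getattr(bnb.nn, name)
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
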
2ab005f3ab TST Run regression test in nightly test runner (#1233)
Follow up to #1115
2023-12-07 15:11:40 +01:00
b482391b80 Don't set config attribute on custom models (#1200)
Initially, we had the issue that it was sometimes assumed that models
had a config attribute, as is given for transformers models. This made
PEFT fail with custom models, so we made a change to set a dummy config
on those.

However, this can lead to issues down the line. For example, when users
use the Trainer class from transformers, they can stumble upon lines
like this:

62ab32b299/src/transformers/integrations/integration_utils.py (L636-L637)

62ab32b299/src/transformers/integrations/integration_utils.py (L729-L730)

Here transformers assumes that if a config attribute exists on the model,
it must have a to_json_string method or a to_dict method (as it assumes
the config to be a PretrainedConfig instance). Therefore, in order not
to trip up transformers, it is best not to set any config at all.

Alternative

Alternatively, transformers could be changed to check, whenever the
config attribute exists, whether it is a PretrainedConfig instance, but that
would be a much larger change (albeit a cleaner one).
2023-12-07 10:56:21 +01:00
d56df7fc64 Bump version to 0.7.1.dev0 post release (#1227)
Also updated the release instruction for installing from pypi, as the
previous command seems to be causing trouble recently (see internal
discussion).
2023-12-06 19:04:13 +01:00
a87ff4c744 [docs] OFT API docs (#1221) 2023-12-06 16:26:21 +01:00
2665f80a17 Release: 0.7.0 (#1214)
In preparation for the 0.7.0 release. Also remove obsolete TODO
comments.
2023-12-06 15:11:00 +01:00
9fd788bedb TST: Add regression tests 2 (#1115)
Description

In general, for regression tests, we need two steps:

1. Creating the regression artifacts, in this case the adapter
   checkpoint and the expected output of the model.
2. Running the regression tests, i.e. loading the adapter and checking
   that the output of the model is the same as the expected output.

My approach is to re-use as much code as possible between those two
steps. Therefore, the same test script can be used for both, with only
an environment variable to distinguish between the two. Step 1 is
invoked by calling:

`REGRESSION_CREATION_MODE=True pytest tests/regression/test_regression.py`

and to run the second step, we call:

`pytest tests/regression/test_regression.py`

Creating regression artifacts

The first step will create an adapter checkpoint and an output for the
given PEFT version and test setting in a new directory. E.g. it will
create a directory `tests/regression/lora_opt-125m_bnb_4bit/0.5.0/` that
contains adapter_model.bin and output.pt.

Before this step runs, there is a check that the git repo is clean (no
dirty worktree) and that the commit is tagged (i.e. corresponds to a
release version of PEFT). Otherwise, we may accidentally create
regression artifacts that do not correspond to any PEFT release.

The easiest way to get such a clean state (say, for PEFT v0.5.0) is by
checking out a tagged commit, e.g:

`git checkout v0.5.0`

before running the first step.

The first step will also skip the creation of regression artifacts if
they already exist.

It is possible to circumvent all the aforementioned checks by setting
the environment variable `REGRESSION_FORCE_MODE` to True like so:

`REGRESSION_FORCE_MODE=True REGRESSION_CREATION_MODE=True pytest tests/regression/test_regression.py`

You should only do this if you know exactly what you're doing.

Running regression tests

The second step is much simpler. It will load the adapters and the
output created in the first step, and compare the output to the output
from a new PEFT model using the loaded adapter. The outputs should be
the same.

If more than one version is discovered for a given test setting, all of
them are tested.

Notes

Regression artifacts are stored on HF Hub.
2023-12-06 15:07:05 +01:00
2336780f9e Raise error when modules_to_save is specified and multiple adapters are being unloaded (#1137)
* handle `modules_to_save` when unloading

* address comments

* Apply suggestions from code review

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* quality

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2023-12-06 19:14:58 +05:30
c22a8e5d47 DOC: How to configure new transformers models (#1195)
I believe that new transformers architectures could be the most common
case of users wanting to apply PEFT on a model that is not supported out
of the box. Thus I added a section specifically to help users configure
their configs for new transformers models.

As I wanted to point users to a single file that contains all the
existing transformers models, I added a new file
`src/peft/utils/constants.py`, which contains all the mappings that
previously lived in `src/peft/utils/other.py`. LMK if that makes sense.

Notes

To be absolutely backwards compatible, I re-imported the moved constants
into `other.py`. This way, if there is code that imports them directly
from there, it should continue to work.

To avoid getting a linter error for unused imports, I added those
constants to the `__all__` list in `other.py`.

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-12-05 18:51:12 +01:00
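
For reference, a sketch of importing one of the moved mappings; the mapping name is assumed based on existing PEFT constants.

# Both import paths should keep working; constants.py is the new home,
# other.py re-exports the names for backwards compatibility.
from peft.utils.constants import TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING

print(TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING.get("llama"))
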
1a7433b136 TST Improve test for SD LoHa and OFT (#1210) 2023-12-05 18:12:39 +01:00
70d559d029 DOC Initialization options for LoRA (#1218)
Document the initialization options for LoRA. This is especially
important for LoftQ, since otherwise it may not be obvious to users how
to make use of it.
2023-12-05 18:01:47 +01:00
bffbbbf76a MNT Delete the delete doc workflows (#1213)
They are failing because the corresponding GH action no longer exists.

See discussion in #open-source-interal
2023-12-05 13:21:28 +01:00
9c70468a3c [docs] API docs (#1196)
* first draft

* fix path

* fix all paths

* typo

* last typo 🤞

* fix toctree

* typo

* fix section title

* feedback

* update
2023-12-04 11:45:26 -08:00
f7cf460f7c [docs] Update index and quicktour (#1191)
* first draft

* fix toctree

* lora subby section

* feedback

* iframe height

* feedback
2023-12-04 11:00:29 -08:00
1b1091c158 remove HF tokens (#1207) 2023-12-04 15:15:19 +01:00
c456d55216 DOC: Update & improve docstrings and type annotations for common methods and classes (#1201)
The docstrings of the most user-exposed methods and classes have been
updated, or added if not already present. Furthermore, type annotations
have been updated or added for those methods and classes.

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-12-04 12:22:03 +01:00
e05b2670c5 ENH: Enable OFT adapter for mixed adapter models (#1204)
This PR makes it possible to use the newly added OFT adapter in mixed
adapter type models, similar to LoRA, LoHa, etc.

Notes

Adding the integration was pretty straightforward, which is a good sign.

The difficult part was actually about the tests. This stems from the
fact that OFT is (if my understanding is correct) never commutative.
What I mean is that even if the adapters are applied to the last layer
of a model, it makes a difference whether we apply, say, first LoRA,
then OFT vs first OFT, then LoRA.

This is different for the other adapters that were added so far for
mixed models, as they basically do:

- Xa = X + dXa
- Xab = Xa + dXb = X + dXa + dXb = X + dXb + dXa = Xb + dXa = Xba

This is not true for OFT, so when OFT is used, I had to ensure
that no test was applied that (implicitly) assumes commutativity.

Furthermore, I had to increase the model size, see this comment:

https://github.com/huggingface/peft/pull/1160#issuecomment-1836107235
2023-12-04 12:18:49 +01:00
5ed46e4f04 FIX Issue with megatron parallel linear lora (#1202) 2023-12-04 12:16:58 +01:00
5bad88ba04 [DOCS] README.md (#1054)
minor fixes
2023-12-04 11:53:40 +01:00
6a57472665 Mixed adapter models (#1163)
Description

This PR allows to add adapters of different types, e.g. LoRA and LoHa:

base_model = ...
config0 = LoraConfig(...)
peft_model = get_peft_model(base_model, config0, mixed=True)
config1 = LoHaConfig(...)
peft_model.add_adapter(config1, "other")
peft_model.set_adapter(["default", "other"])
peft_model(x)

At this point, both adapters are active at the same time.

Existing code should not be affected by this change, since users need to
opt into this behavior by setting mixed=True, and a completely different
class is being used (PeftMixedModel).

Also interesting is that this method can be used for a single adapter
type but with very different configs. Right now, we have limited support
for that (e.g. for LoRA, different r values by using rank_pattern), but
with this, we don't need to special case the differing arguments
anymore.

Not implemented

- [ ] I'm not yet sure if the same logic can be applied to IA³ or if it
  may fail because IA³ can apply its scaling to the input, not the output.
- [ ] OFT is not supported yet but should work.
- [ ] It is currently not possible to represent a mixed adapter model as
  a single config. I think we can come up with a solution but I don't
  think it is necessary for a first version of this.
- [ ] Saving and loading is not yet implemented for mixed models.

Those could potentially be added in a future PR.

---------

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
2023-11-30 21:58:16 +01:00
da17ac0f48 [Feature] Support OFT (#1160)
* Support OFT

* add test

* Update README

* fix code quality

* fix test

* Skip 1 test

* fix eps rule and add more test

* feat: added examples to new OFT method

* fix: removed wrong arguments from model example

* fix: changed name of inference file

* fix: changed prompt variable

* fix docs

* fix: dreambooth inference revision based on feedback

* fix: review from BenjaminBossan

* apply safe merge

* del partially

* refactor oft

* refactor oft

* del unused line

* del unused line

* fix skip in windows

* skip test

* Add comments about bias added place

* rename orig_weights to new_weights

* use inverse instead of linalg.inv

* delete alpha and scaling

---------

Co-authored-by: Lukas Kuhn <lukaskuhn.lku@gmail.com>
Co-authored-by: Lukas Kuhn <lukas.kuhn@deutschebahn.com>
2023-11-30 21:28:42 +05:30
2674f5ea66 Megatron distributed parallel linear LoRA (#1092)
Adds option to use Megatron's ColumnParallelLinear and RowParallelLinear
for LoRA linear layers, leading to improved performance when using LoRA
with Megatron.
2023-11-30 16:24:58 +01:00
2b901ee572 Add LoftQ initialization method for LoRA (#1150)
---------

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2023-11-29 17:08:17 +01:00
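
A hedged configuration sketch for the LoftQ initialization added in this PR; the class and argument names follow the PEFT documentation (LoftQConfig, init_lora_weights="loftq").

from peft import LoftQConfig, LoraConfig

# Quantization-aware LoRA initialization via LoftQ.
loftq_config = LoftQConfig(loftq_bits=4)
lora_config = LoraConfig(init_lora_weights="loftq", loftq_config=loftq_config)
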
8298f1a366 Training PEFT models with new tokens being added to the embedding layers and tokenizer (#1147)
* add support for saving base layers weights along with adapter weights

* Update save_and_load.py

* Add an example showing the usage of the added feature

* refactor the functionality

* fix

* refactoring code

1. Add `is_embedding_layer_resized` parameter to `save_pretrained`
2. Fix the deduplication in README when adding PEFT details.
3. `save_pretrained` should only save the model when `is_main_process=True` which is one of the parameters of `save_pretrained`.

* update example

* fix the model card

* fix model card

* 😅

* fix model card

* automate setting `is_embedding_layer_resized`

* nits

* Update peft_lora_clm_with_additional_tokens.ipynb

* add test

* fix tests

* maybe fixes the issue?

* address comments

Co-Authored-By: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2023-11-29 19:28:41 +05:30
f0fb9516d8 ENH: Different initialization methods for LoRA (#1189)
This PR adds the possibility to use different initialization methods for
LoRA, as is a requirement for a completely backwards compatible adoption
of PEFT in diffusers.

The default is still the same as always, namely the one from the
reference implementation by Microsoft. On top of that, it is now
possible to pass `init_lora_weights='gaussian'` to initialize the LoRA
weights in the same way as is default for diffusers, namely with a
normal distribution which is scaled by 1/r.

The init method currently applies to LoRA linear and conv layers, but
not embedding layers, which are always initialized from a normal
distribution (and are probably irrelevant for diffusers).

In the future, similar extensions could be added for other adapter
methods.
2023-11-29 12:37:39 +01:00
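
Configuration sketch for the option described above:

from peft import LoraConfig

# Default init (reference implementation) stays unchanged; 'gaussian' matches
# the diffusers default, a normal distribution scaled by 1/r.
config = LoraConfig(init_lora_weights="gaussian")
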
04c411010b Examples: add options to save or push model (#1159) 2023-11-28 16:04:52 +01:00
da29ae62d4 ENH Add support for phi model architecture (#1186) 2023-11-28 14:43:06 +01:00
64c8d1da85 FIX Pass HF token when calling PeftModel.from_pretrained (#1076) 2023-11-28 14:17:25 +01:00
e586f96740 DOC Update a few places in the README (#1152)
- fix bits_and_bytes => bitsandbytes
- add a few links
- add mistral to list of supported models
---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2023-11-28 11:04:57 +01:00
e35d46de19 Fix code example in quicktour.md (#1181) 2023-11-27 22:29:11 +01:00
b4faffea8a [Tests] Migrate to AWS runners (#1185)
* migrate single-gpu runners

* Update nightly.yml

* Update nightly.yml

---------

Co-authored-by: Guillaume LEGENDRE <glegendre01@gmail.com>
2023-11-24 18:40:19 +01:00
19145bba8a FIX Wrong use of base layer (#1183)
This is important if we have nested adapter layers. This was an overlook
during the refactoring #1106.
2023-11-24 17:03:59 +01:00
c0dd27bc97 Fix dockerfile build (#1177)
* Update Dockerfile

* Update build_docker_images.yml

* Update Dockerfile

* Update build_docker_images.yml
2023-11-23 15:40:35 +01:00
fb607d00ad DOC convert mdx to md (#1171)
Content can still technically be mdx but mdx is not rendered well on
GitHub, so this makes reviewing doc files easier.
2023-11-23 11:38:57 +01:00
a634f6a13e Update release checklist about release notes (#1170)
Add a reminder in the release checklist to consult the release note
google doc.
2023-11-23 10:35:53 +01:00
dd4771b2f4 (minor) correct type annotation (#1166)
* add correct type annotation

* make style
2023-11-22 20:52:26 +01:00
043238578f fix add_weighted_adapter method (#1169)
* fix `add_weighted_adapter` method

Co-Authored-By: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-Authored-By: jihuishan <151612440+jihuishan@users.noreply.github.com>

* Update testing_common.py

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: jihuishan <151612440+jihuishan@users.noreply.github.com>
2023-11-22 17:44:21 +05:30
b4ac2d840b FIX Dataset loaded twice in 4-bit finetuning script (#1164) 2023-11-22 12:23:50 +01:00
0ae52fece1 [Docs fix] Relative path issue (#1157) 2023-11-21 10:57:56 +01:00
8351331d78 ENH Delete IA3 adapters (#1153) 2023-11-20 18:22:52 +01:00
f1ecfa6ae6 Use huggingface_hub.file_exists instead of custom helper (#1145)
* Use 'huggingface_hub.file_exists' instead of custom helper

* make quality
2023-11-17 15:48:02 +01:00
b5a8a294ed FIX A few issues with AdaLora, adding tests (#1146)
This PR fixes a handful of issues with AdaLora, should resolve #1113.

Description

1. lora_A.weight.device was called but for AdaLora, lora_A is an
   nn.Parameter, not an nn.Module, so the weight attribute does not
   exist. lora_A.device is sufficient.
2. For 8bit, an inplace operation failed because it was on a view. Now
   the operation is no longer inplace.
3. The loss term of the model output is not necessarily a torch tensor.
   In the test, it was a dict and did not contain an actual loss.
   Therefore, I added a check to make sure the loss is a torch tensor.
2023-11-17 15:18:34 +01:00
9cdaed2769 CI Add Python 3.11 to test matrix (#1143)
Only required change was to call .value on some enums when used in
messages, as their repr has changed in Python 3.11.
2023-11-17 14:11:54 +01:00
18a0910113 [Tests] Do not stop tests if a job failed (#1141)
* Update nightly.yml

* Update nightly.yml
2023-11-16 18:11:19 +01:00
99e1a55f54 [core / LoRA] Add adapter_names in bnb layers (#1139)
* Update bnb.py

* fix style
2023-11-16 17:12:39 +01:00
21df968fd1 [Tests] Fix daily CI (#1136)
* fix daily CI

* adapt from suggestion
2023-11-16 14:43:36 +01:00
5a3a5acff2 Refactor base layer pattern (#1106)
Description

Refactor all tuners (where it applies, i.e. not prompt tuning) to use
the "base layer pattern". This means that the adapter layer will always
hold a reference to the original layer that it modifies. This pattern is
already partly used (e.g. LoRA bnb, gptq layers), now it is consistently
used everywhere when applicable.

This PR is a companion PR to #1069, where I first added these changes.
They are now extracted to a separate PR to make code review easier and
to advance more quickly.

Implementation

The main change is that the adapter layer wraps the original layer and
calls forward on that layer, instead of doing stuff like this:

F.linear(input, transpose(self.weight, self.fan_in_fan_out), bias=self.bias)

which completely circumvents the call to the target layer's forward
method. With the base layer pattern, we now call the target layer's
forward method. Therefore, if the target layer is another adapter
layer (which will be crucial for mixed adapters), we call its forward
method correctly. Also, this should allow passing extra arguments, like
lora_scale to forward.

This change has the nice side benefit that we no longer need to use
_init_empty_weights -- in fact, we don't initialize any of the target
layer's weights anymore, since we have a reference to it. There is thus
no risk of having slow but superfluous initialization of layers.

Moreover, I could greatly simplify merge_and_unload by just using the
base_layer instead of having to create a completely new layer. For
OPT-350m, this results in a 15x speedup.

Note that, same as for the bnb layers, this should be backwards
compatible, since the adapter weights and their state_dicts are not
affected by this change. I used #1115 for regression testing.

Somewhat unrelated changes

During debugging, I got very annoyed with the fact that the reprs of
adapter layers and normal PyTorch layers are hard to distinguish, e.g.
the type is just "Linear". Now, for adapter layers, it is prefixed by
the adapter type, e.g. "lora.Linear". This should have no further
implications except for the repr (e.g. state_dict remains unaffected).

For LoHa and LoKr, I had to change the init of weights when using
init_weights=False. This is because of what is discussed in Numerical
instabilities with LoHa #1058.

IA³ now has the unload method too.

LoHa and LoKr now support safe_merge=True when merging layers.

Migration guide

For 99% of users, the code should continue working as usual, because
the API stays the same. Only low level details have been changed.

Code that relies on isinstance checks on specific PEFT classes may
break. E.g. the LoRA Linear layer no longer inherits from nn.Linear. It
is, however, still a BaseTunerLayer. The same logic applies for other
layer types like Conv2d and for other tuners like IA³.

To retrieve the base layer of an adapter layer, you should now call
module.get_base_layer() if you deal with a BaseTunerLayer. Don't rely on
something like module.weight being present (though it might be).
2023-11-16 12:45:12 +01:00
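
A short sketch of the migration guidance above, assuming `model` is an existing PEFT model:

from peft.tuners.tuners_utils import BaseTunerLayer

# Don't isinstance-check against nn.Linear anymore; check for BaseTunerLayer
# and retrieve the wrapped module via get_base_layer().
for name, module in model.named_modules():
    if isinstance(module, BaseTunerLayer):
        base = module.get_base_layer()
        print(name, type(base).__name__)
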
70302d7b4f FEAT: Merging only specified adapter_names when calling merge (#1132)
* working v1

* add tests

* remove

* add it also for lokr and loha, left a todo

* Update tests/testing_common.py

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>

* better test

* up

* fix tests

* credits contrib and suggestions from disscussions

* credits contrib and suggestions from disscussions

* address last comments

---------

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2023-11-16 12:05:22 +01:00
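
Usage sketch, assuming a LoRA model with two adapters named "default" and "other" already loaded; the placement of the adapter_names argument follows the public merge_and_unload API.

# Merge only the "default" adapter into the base weights.
merged_model = model.merge_and_unload(adapter_names=["default"])
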
3ff90626b6 FEAT: Make safe serialization the default one (#1088)
* make safe serialization the default one

* adapt tests

* fix final tests'

* adapt from suggestion
2023-11-15 11:21:23 +01:00
1877329093 TST Improve requires grad testing: (#1131)
Previously, the corresponding tests were testing only whether specific
parameters had requires_grad True or False. Now, all parameters are
being checked. This is more rigorous.

Also, tests for Embedding, Conv1D, Conv2d were added, thus superseding
PR #1115.

Finally, tests for LoHa and LoKr were added.

Note

I considered moving the tests to a separate module, as they were getting
quite big and this would help with readability. For now, I left them in
the same module because it leads to a better diff view and is thus
easier to review. LMK if I should move the tests to a separate file.
2023-11-14 17:44:49 +05:30
98429b8184 Fix: TorchTracemalloc ruins Windows performance (#1126)
* feat: added tracemalloc arg to train_dreambooth

* fix: added help for arg

* fix: changed arg name

* fix formatting

* fix: import order
2023-11-14 17:04:32 +05:30
d350a00ece Prompt tuning: fix AutoTokenizer.from_pretrained (#1053)
Fixes #1032

Description

Currently, when using prompt tuning with TEXT, we call
AutoTokenizer.from_pretrained with only the model id. However, it may be
necessary to pass additional arguments, e.g. trust_remote_code=True.
This fix allows to pass more arguments by setting the argument
tokenizer_kwargs in the PromptTuningConfig.

I also added a check that when tokenizer_kwargs is set, the TEXT option
is actually being used.

Moreover, I noticed that we have no tests for prompt tuning with TEXT,
so I added those tests for decoder models.

Additional changes

There was a bug in PromptEmbedding where the device of the
init_token_ids was not set, which resulted in errors when using CUDA.

Finally, I removed an unused constant CONFIG_CLASSES from a test.
2023-11-14 16:58:55 +05:30
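
A configuration sketch for the new tokenizer_kwargs option; the model name and prompt text below are placeholders.

from peft import PromptTuningConfig, PromptTuningInit

config = PromptTuningConfig(
    task_type="CAUSAL_LM",
    prompt_tuning_init=PromptTuningInit.TEXT,
    prompt_tuning_init_text="Classify if the tweet is a complaint or not:",
    num_virtual_tokens=8,
    tokenizer_name_or_path="some-org/custom-model",  # placeholder model id
    tokenizer_kwargs={"trust_remote_code": True},
)
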
ad756173f1 FIX: Adding 2 adapters when target_modules is a str fails (#1111)
* Fix adding 2 adapters when target_modules is a str

Problem description

Adding two adapters (e.g. LoRA) when using a list for `target_modules`
works but passing a str fails. The issue is that for str, we do a
`re.fullmatch`, whereas for list, we just check `endswith`. After adding
the first adapter, though, the naming pattern of the modules changes. In
the example above, the name for the linear layer changes from `"lin0"`
to `"base_model.model.lin0"`, which is why the `fullmatch` fails but the
`endswith` still works.

Reproduction

from peft import LoraConfig, get_peft_model
from torch import nn

class MLP(nn.Module):
    def __init__(self, bias=True):
        super().__init__()
        self.lin0 = nn.Linear(10, 20, bias=bias)

def test_target_modules_list():
    config = LoraConfig(target_modules=["lin0"])
    test_it(config)
    print("Adding two adapters with target_module being a list works")

def test_target_modules_str():
    config = LoraConfig(target_modules="lin0")
    test_it(config)

def test_it(config):
    model = MLP()
    model = get_peft_model(model, config, "adapter0")
    model.add_adapter("adapter1", config)
    print("Adding two adapters with target_module being a str works")

if __name__ == "__main__":
    # works
    test_target_modules_list()
    # ValueError: Target modules lin0 not found in the base model
    test_target_modules_str()

I think that most users would be surprised that:

1. Adding the first adapter works but adding the second fails, even
   though they use the same config.
2. Using `target_modules=["lin0"]` works but `target_modules="lin0"`
   fails for the 2nd adapter.

Solution

We could change the logic of not using `re.fullmatch` for str, but I
think that could be tricky to achieve without breaking BC. Instead, I
chose to change the inject_adapter call in add_adapter to pass the base
model, not the whole peft model. This way, the naming pattern is
preserved.

Tests

I haven't added extra tests for this. The script above could serve as a
test. However, it will be sufficient to remove the guard added in #1105:

    if isinstance(config.target_modules, str):
        # TODO this should be doable
        self.skipTest("Multiple adapters cannot currently be added when target_modules is a string.")

as that will test exactly this behavior and was how the bug was
originally uncovered. Depending on what PR lands first, the guard has to
removed in this PR or in #1105.

* Enable tests for adding 2 adapters with str
2023-11-14 15:00:52 +05:30
94877b5008 Release: v0.6.3.dev0 (#1128) 2023-11-14 14:59:55 +05:30
f020404ee6 Release: v0.6.2 (#1125) 2023-11-14 11:13:21 +05:30
79298c7c24 fix doc typo (#1121) 2023-11-13 10:48:50 +01:00
b25ce8a0cd Correctly deal with ModulesToSaveWrapper when using Low-level API (#1112)
* correctly deal with  `ModulesToSaveWrapper`

* style

* fix tests (#1117)
2023-11-13 12:22:30 +05:30
5d84484079 fix import issue transformers (#1116) 2023-11-10 18:37:38 +01:00
49ddefa834 Add num_dataloader_workers arg to dreambooth script (#1107)
This is especially important for Windows users, who may have to set the
number of workers to 0.
2023-11-10 14:21:14 +01:00
3af469eeea Refactor adapter deletion (#1105)
Description

The job of deleting an adapter is now transferred to the adapter layer,
instead of the adapter model. This makes it easier for users or other
libraries who don't use the adapter model to delete adapters.

Implementation

The code should now be more generic, relying less on hard-coded
attributes.

As a precaution, I also changed the type of adapter_layer_names from
list to tuple, as it should not be mutated.

When deleting the active adapter, the logic for choosing the new active
adapter has been changed slightly to ensure consistency across layers.
In practice, this should rarely make a difference. An error is now
raised if the last remaining adapter is deleted.

Test coverage has been increased:

- Deleting adapters is now also tested for custom models.
- It is also tested for LoHa, LoKr, not only LoRA.
- I added a test for deleting the non-active adapter.

Not implemented

I did not add adapter deletion to IA³, since it is included in #980. LMK
if it should be added here instead.
2023-11-10 13:33:56 +01:00
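
Sketch of the user-facing behavior, assuming a LoRA model with adapters "default" and "other" already added:

# Deleting a non-active adapter; deleting the last remaining adapter now raises an error.
model.delete_adapter("other")
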
5e7e5ad836 Avoid over-eager auto-gptq import (#1109) 2023-11-10 12:35:18 +01:00
9d8287f3e3 set dev version (#1104) 2023-11-09 15:44:28 +01:00
2efd02769b Release: 0.6.1 (#1103) 2023-11-09 15:16:33 +01:00
669dd4edeb Change to 0.6.1.dev0 (#1102)
* change to 0.6.1.dev0

* oops
2023-11-09 15:03:15 +01:00
b5641cc744 [core] Fix safetensors serialization for shared tensors (#1101)
* fix st serialization

* add test

* add CI test

* add comment
2023-11-09 14:50:35 +01:00
c5d94855cd FIX Failing nightly CI tests due to IA3 config (#1100)
Same idea as in PR as #1094, but for yet more ill-configured IA³
configs. The tests are now failing because we do stricter checks on
incorrect IA³ configs.
2023-11-09 13:50:44 +01:00
face67dfeb Fix IA3 config for Falcon models (#1007)
* fixed feedforward for falcon

* fixed target_modules for falcon
2023-11-09 12:41:57 +05:30
d9094cebea FIX: broken f-string in import_utils (#1091) 2023-11-08 12:12:24 +01:00
493ae58beb fix the failing CI tests (#1094) 2023-11-08 14:47:55 +05:30
ed4ce9fc94 fix-gptq-training (#1086)
* fix-gptq-training

* style

* review
2023-11-07 11:12:23 -05:00
4c48970cb0 Update the release checklist (#1075)
As discussed, we wanted to make small amendments to the release process,
so that we have a 0.N.0 commit on main. I also adjusted the wording here
and there.
2023-11-07 14:23:38 +01:00
46e03602ed [Docker] Update Dockerfile to force-use transformers main (#1085)
* Update Dockerfile

* Update Dockerfile

* Update Dockerfile
2023-11-07 12:20:15 +01:00
45343a4ccc Improve documentation for IA³ (#984)
- Improve ia3 documentation
- Raise value error for incorrect feedforward_module list
- Added tests

---------

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2023-11-07 11:44:27 +01:00
276c91b143 FIX: fix adaptation prompt CI and compatibility with latest transformers (4.35.0) (#1084)
* fix adaptation prompt CI

* undo some other changes
2023-11-06 14:04:19 +01:00
cfe35a7878 FIX: Skip adaption prompt tests with new transformers versions (#1077)
Adaption prompt is failing with transformers v4.35.0. This PR skips the
adaption prompt tests so that CI is green again. The PR also adds an
error when users try to use adaption prompt with that version,
instructing them to use an older transformers version instead.

This should be removed as soon as the issue is fixed in
PEFT/transformers.
2023-11-03 15:52:51 +01:00
d47d23aa0e After release: Bump version to 0.7.0.dev0 (#1074) 2023-11-03 11:25:04 +01:00
02f0a4ca59 Release version 0.6.0 (#1072) 2023-11-02 15:07:03 +01:00
23cfbf22eb Fix slow tests not running (#1071)
* Update nightly.yml

* Update nightly.yml
2023-11-02 10:23:17 +01:00
9da72d25ed Fix Slack bot not displaying error messages (#1068)
* Update log_reports.py

* Update log_reports.py

* Update log_reports.py

* change logic

* fix
2023-11-01 12:41:23 +01:00
0ad95fa361 TST test coverage for layer matching (#1031)
Add tests for module name matching using regex and other custom arguments.
2023-11-01 11:39:40 +01:00
6960076699 [tests] Update Dockerfile to use cuda 12.2 (#1050)
* [`tests`] Update Dockerfile to use cuda 12.2

* Update nightly.yml
2023-11-01 10:48:12 +01:00
bdeb06b16c [core] Fix use_reentrant issues (#1036)
* fix use_reentrant issues

* fix

* fixup

* address comments.

* add warnings

* oops

* fix

* quality
2023-10-31 16:51:41 +01:00
884b1ac3a8 Add implementation of LyCORIS LoKr for SD&SDXL models (#978)
KronA-like adapter
2023-10-30 15:36:41 +01:00
207229ad5e FIX Conv1D merge error for IA3 (#1014) 2023-10-26 15:51:49 +02:00
2464c572eb FIX setting active adapter correctly (#1051)
Currently, when calling set_adapter, the active adapter is not updated.
Tests have been added to trigger the bug and the method updated to fix
it.

Moreover, I created an active_adapters property on the PeftModel class
so that it behaves consistently with the underlying models like
LoraModel.
2023-10-25 14:53:45 +02:00
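
Behavior sketch after the fix, assuming a PeftModel with adapters "default" and "other":

model.set_adapter("other")
print(model.active_adapters)  # now consistently reports ['other']
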
8b21a4e5ab DOC fix wrong import in p-tuning docs (#1049) 2023-10-25 11:16:08 +02:00
894e68a408 FIX: wrong construction of LoHa weights (#1021)
Also: Update convert_sd_adapter_to_peft.py to account for a bug in
Lycoris-LoRA. See https://github.com/KohakuBlueleaf/LyCORIS/pull/115
2023-10-24 15:26:42 +02:00
7594903444 DOC: Fix StackLLaMa link, typos in README (#1047) 2023-10-24 12:10:21 +02:00
1d0535e255 Fix target_modules type in config.from_pretrained (#1046)
Fixes #1045, supersedes #1041

Description

When loading a config from a file, we currently set the loaded
attributes on the config directly. However, this sidesteps the
__post_init__ call, which is required to convert the target_modules to a
set. This PR fixes this by not setting attributes on the config
class directly and instead going through __init__.

Other changes

While working on this, I did a slight refactor of the config tests.

1. All config classes are included now (some were missing before).
2. Use parameterized instead of looping through the classes.
3. Added a unit test for the aforementioned bug.
2023-10-24 12:06:01 +02:00
56556faa17 [LoRA] Raise error when adapter name not found in set_scale (#1034)
* fix scale nit

* style

* nit
2023-10-18 19:36:03 +02:00
15a013af5f [LoRA] Revert original behavior for scale / unscale (#1029)
* revert original behavior for scale / unscale

* harmonize arg name

* credits contrib

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2023-10-17 00:27:02 +02:00
45565f4357 fix lora scaling and unscaling (#1027) 2023-10-16 10:10:30 -07:00
aaa7e9f44a FEAT: Add fp16 + cpu merge support (#1017)
* add fp16 + cpu merge support

* fix tests

* add fp16 tests for custom models

* fix tests

* adapt from comments

* more clarifications
2023-10-13 12:23:16 +02:00
07f2b82dae Fix stale.py to use timezone-aware datetime (#1016)
Fix an error with our stale.py script:

> can't subtract offset-naive and offset-aware datetimes

https://github.com/huggingface/peft/actions/runs/6497439325/job/17646562512
2023-10-12 18:42:06 +02:00
eced2edff8 FIX Don't assume model_config contains model_type (#1012) 2023-10-11 10:34:28 +02:00
e98df91906 ENH: Refactor LoRA bnb layers for faster initialization (#994)
Partly addresses #896

Description

After speeding up normal LoRA layer initialization, this PR improves
initialization speed of bnb LoRA layers.

The method to achieve this is different from the one used before, namely
this time the base layer is stored as a reference on the LoRA layer.
This allows us to avoid calling __init__ on the bnb layer, which is what
is slow.

Notes

We cannot use the same method as for the normal LoRA layers (i.e.
calling the super class's __init__ with meta device) because the bnb
layers have extra logic that still creates unnecessary weights.

However, the way used here could also be a solution to the normal
layers, so if we want to have consistency, the normal layers could be
refactored to use the same approach.

Interestingly, even though we now save the base layer as a reference,
which results in a different state_dict, the existing models can still
be loaded successfully. This is because the adapter state_dict is not
affected by the change, so users can still load their existing adapters.

The only problem would occur if users dump the whole model, i.e. base
model and adapter, using torch.save and then trying to load with
torch.load. For those users, we could theoretically provide a script to
convert the state_dict (i.e. renaming some keys).

To ensure that the old adapters can still be loaded successfully, I'm
working at the same time on adding regression tests. I'll create a
separate PR for those to avoid blowing up this one.

Tests

I ran a test on bloomz-1b1 for how long it takes to create the
PeftModel, the results are:

8bit: 1108.34 ms → 26.82 ms
4bit: 1101.96 ms → 23.69 ms
2023-10-10 16:47:35 +02:00
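The approach described above, keeping the (possibly quantized) base layer as a reference instead of re-running its expensive __init__, can be illustrated with a plain nn.Linear (a simplified sketch under that assumption; DemoLoraWrapper is a made-up name, not the actual PEFT/bnb implementation):

```python
import torch
import torch.nn as nn

class DemoLoraWrapper(nn.Module):
    """Holds a reference to the base layer instead of inheriting from it,
    so the base layer's __init__ is never called again."""

    def __init__(self, base_layer: nn.Module, r: int = 8):
        super().__init__()
        self.base_layer = base_layer  # reference only, no re-initialization
        self.lora_A = nn.Linear(base_layer.in_features, r, bias=False)
        self.lora_B = nn.Linear(r, base_layer.out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)  # adapter starts as a no-op

    def forward(self, x):
        return self.base_layer(x) + self.lora_B(self.lora_A(x))

base = nn.Linear(16, 32)
wrapped = DemoLoraWrapper(base)
out = wrapped(torch.randn(2, 16))
```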
0c16918c34 [core / LoRA] Add safe_merge to bnb layers (#1009)
* add `safe_merge` to bnb layers

* adapt from suggestions
2023-10-10 11:30:21 +02:00
c2c544dc9f FEAT: Add safe_merge option in merge (#1001)
* add `safe_merge` option in `merge`

* oops

* Apply suggestions from code review

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* address final comments

* Update src/peft/tuners/lora/layer.py

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* Update src/peft/tuners/lora/layer.py

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* add it for ia3

* add it for adalora

* up

* revert for loha

* style

* fix CI

* adapt from suggestions

* add tests

* up

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2023-10-09 18:28:00 +02:00
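Conceptually, the safe_merge option computes the merged weight on a copy first and only commits it if the result is finite. A hedged, simplified sketch of that check (not the actual PEFT code; demo_safe_merge is a made-up helper):

```python
import torch

def demo_safe_merge(base_weight: torch.Tensor, delta: torch.Tensor) -> torch.Tensor:
    """Merge an adapter delta into a copy of the base weight and validate the
    result before returning it, mirroring the idea behind safe_merge."""
    merged = base_weight.clone() + delta
    if not torch.isfinite(merged).all():
        raise ValueError("NaNs detected in the merged weights; aborting the merge.")
    return merged

merged = demo_safe_merge(torch.randn(4, 4), torch.randn(4, 4) * 1e-3)
```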
d7f520a320 Fix word_embeddings match for deepspeed wrapped model (#1000)
* vocab size prompt vocab fix

* add comments

* Update src/peft/peft_model.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-10-09 14:25:07 +02:00
d17266d599 ENH Support Conv2d layers for IA³ (#972)
Adds support for Conv2D layers to the IA³ tuner. Tests are added to
check that they work.

Notes:

Unfortunately, when unmerging the Conv2d IA³ layers, there is quite a
bit of rounding error. I had to increase the tolerances for this
specific test case to make the tests pass. I'm not 100% sure why this
is, but I could imagine that for Conv2d, small errors accumulate because
of the convolution operation.

I also added tests for IA³ Linear layers for the custom models, which
also pass. However, there is an error when using Conv1D. The reason is
that merging fails because there is a shape mismatch when
fan_in_fan_out=True (which is set automatically for Conv1D). This is
left for a future PR.
2023-10-09 12:20:19 +02:00
dfd99f61f8 TST: Comment out flaky LoHA test (#1002)
This test is flaky when running on Windows. It is probably related to
PyTorch 2.1, as this test used to work. Further investigation is needed.
2023-10-09 10:33:54 +02:00
dbd40d96a1 Fix lora creation (#993)
* reduce the time to inject lora modules

* fix bugs

* fix bug

* fixes

* Revert "fixes"

This reverts commit c7f30627c1798db11be8a5da8f3c801f9469a5e3.

* refactor

* fix failing tests

* fix tests

* fix tests

* fix tests

* fix tests

* Apply suggestions from code review

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* address comments

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2023-10-05 13:27:49 +05:30
99f792e8a3 MNT Make .merged a property (#979)
After adding support for multiple active adapters, we have some double
bookkeeping when it comes to tracking merged adapters. On the one hand,
we have merged_adapters, which lists all merged adapters, and on the
other hand, we have the merged attribute, which indicates if at least
one adapter is merged.

Having two sources of truth is bad, because it's more work to keep them
in sync and there is a risk of them getting out of sync. Therefore, this
PR removes the .merged attribute. In order to keep the same interface,
we add a merged property, which returns True if there are any
merged_adapters.
2023-10-04 11:39:36 +02:00
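The single source of truth described above boils down to deriving merged from merged_adapters. A minimal sketch of the pattern (illustrative only, not the actual layer class):

```python
class DemoLayer:
    def __init__(self):
        self.merged_adapters = []

    @property
    def merged(self) -> bool:
        # Derived from merged_adapters, so the two can never get out of sync.
        return bool(self.merged_adapters)

layer = DemoLayer()
assert layer.merged is False
layer.merged_adapters.append("default")
assert layer.merged is True
```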
a7fb9fb090 Add base model metadata to model card (#975)
Resolves #938

This PR adds the base model metadata, if present, to the model card.

On top of this, the code for creating the model card has been refactored
to use the huggingface_hub classes instead of doing ad hoc parsing and
writing.
---------

Co-authored-by: Lucain <lucainp@gmail.com>
2023-10-04 09:44:10 +02:00
a977ce69a5 Fix typo in custom_models.mdx (#964)
* Fix typo in custom_models.mdx

* Fix typo in low_level_api.mdx
2023-10-03 18:05:06 +05:30
3d0edccc4a Correct minor errors in example notebooks for causal language modelling (#926)
* updated Readme

* Corrected label bos token error; switched to eos token from pad token

* reverted readme change
2023-10-03 18:01:10 +05:30
763511dc28 add the lora target modules for stablelm models (#982) 2023-10-03 17:59:07 +05:30
1367bc6f0d FIX: issues with (un)merging multiple LoRA and IA³ adapters (#976)
* Fix issues with merging multiple adapters

This should resolve the failing slow test
test_4bit_merge_and_disable_lora.

While investigating, I also noticed that merging multiple adapters was
not correct for IA³. I added a test that should catch this bug and
provided a fix for it too. However, the test does not check IA³ at the
moment because the test parameters do not contain IA³. For this, #972
needs to be merged too, which adds IA³ to the test parameters.

* Small adjustments to tests

Previously, tests had some exploding gradients, making them unstable.
2023-10-03 16:53:33 +05:30
88dfc5d2a8 update Bibtex (#989) 2023-10-03 14:01:09 +05:30
7a5f17f39e FEAT Add LyCORIS LoHa for SD&SDXL models (#956)
https://arxiv.org/abs/2108.06098
2023-10-02 10:44:51 +02:00
52ff0cde9f Fix missing tokenizer attribute in test (#977)
Fix the error in test_causal_lm_training_mutli_gpu_4bit.
2023-09-29 15:34:43 +02:00
cacee957e6 [tests] add multiple active adapters tests (#961)
* add tests for multiple active adapters

* add multiple active adapter tests

* fix tests

* fix the device error

* fix typo

* fix the variables

* fix the `adalora` config

* add util function for proper naming of tests

* fix bugs

1. fix `add_weighted_adapter` when working with adapters targeting different layers
2. fix `ia3` model and layer to handle adapters targeting different layers
3. fix the multiple active adapter tests

* fix `ia3` issue

* remove debug statements

* fix test

* fix bug

* address comments

* address comments

Co-Authored-By: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* fix tests

* remove unused code

* Update test_custom_models.py

* increasing tolerance for a test

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2023-09-29 09:44:30 +05:30
bedcaa4f82 TST: Fix broken save_pretrained tests (#969)
Resolves #968

As the linked issue mentions, the test_save_pretrained_selected_adapters
test was broken because the 2nd adapter would load the weight of the
default adapter, instead of its own weights. This was a pretty quick
fix.

However, this made me wonder why the test was passing beforehand when it
is loading the wrong weights, so I dug deeper.

The first issue I encountered was that for IA³, we did not set
init_ia3_weights=False. For this reason, all weights were set to 1.0. Of
course, in that case, it doesn't matter which weights are loaded; they're
all the same. Therefore, I now set init_ia3_weights=False.

However, this still doesn't explain why this worked for LoRA, which,
even without init_lora_weights=True, should have some completely random
weights which should not be the same.

The reason for that is that we used get_peft_model_state_dict to get the
state_dict we used for comparison. This function only returns the
weights of one of the adapters (in this case default), so the weights
for the new adapter were never compared at all. Thus I changed this so
that all weights are now compared.

However, this now caused the tests for prompt encoders to fail. For some
reason, the state_dict from prompt encoding models is not the same after
loading. At this point, I stopped investigating and just wrote an
exception for prompt encoding to use get_peft_model_state_dict instead
of comparing the whole state_dict. Any insights would be appreciated.

Finally, for completeness, I added some checks for the existence of the
files of the new adapter.
2023-09-28 16:33:16 +02:00
f66c3859b0 add the lora target modules for Mistral Models (#974) 2023-09-28 14:55:13 +05:30
69665f24e9 Update integrations_tests.yml (#966)
* Update integrations_tests.yml

* Update integrations_tests.yml
2023-09-26 14:45:01 +02:00
08b6665167 ENH Add 4-bit support for IA3 (#864)
Notes:

- Add guard to IA³ Linear8bitLt definition (should have already been there).
- Merging not supported (yet).
2023-09-26 14:11:32 +02:00
d54a23d30e Fix integrations_tests.yml (#965) 2023-09-26 14:01:22 +02:00
9856f79cf9 [tests] add transformers & diffusers integration tests (#962)
* add transformers integration tests

* add diffusers

* test also on transformers release

* adapt from suggestions

* suggestions
2023-09-26 13:00:57 +02:00
634bd197f2 FIX: setting requires_grad on adapter layers (#905)
* [WIP] Fix setting requires_grad on adapter layers

This is an alternative to #900, resolves #899.

Description

Currently, we don't handle setting requires_grad on adapter layers
really well. The main issue is that it can be set to True on adapter
parameters that are not being used, e.g. the original_module in
ModulesToSaveWrapper or inactive adapters in LoRA.

Normally, this is not a big issue, except maybe if we want to correctly
count the number of trainable parameters. However, when training with
DistributedDataParallel, this results in errors, as PyTorch thinks that
all parameters with requires_grad=True should participate in the loss
computation, but those mentioned parameters don't. For that reason,
training with DDP currently fails when using modules_to_save or multiple
adapters.

Implementation

This turned out to be more complicated than I initially thought. The
logic for setting requires_grad is all over the place; it was hard to
encapsulate, and I only partially succeeded. As is, this PR is
more complex than the one it tries to supersede, #900, but it is also
"more correct".

Tests were added to check whether requires_grad is set correctly. There
are (so far) no tests for whether DDP indeed works, they could be added
with multi-GPU. I did, however, test an early stage of this PR with DDP
and setting requires_grad correctly will indeed fix the DDP error.

DONE/TODO

- [x] ModulesToSaveWrapper
- [x] LoRA
- [ ] IA³
- [ ] AdaLora

Since some tuners are not implemented yet, tests are expected to fail.
Check the new tests at the bottom of test_custom.py, those should pass.

* Refactor: move more requires_grad machinery to ABC

* [skip ci] [WIP] Add requires_grad logic to IA³

* Add AdaLora

* Fix some minor issues

* Make style
2023-09-26 13:01:05 +05:30
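The gist of the fix is that only parameters belonging to the currently active adapter should keep requires_grad=True. A deliberately simplified sketch of that bookkeeping (demo_set_adapter_trainable is a made-up helper, not PEFT's internal logic, and it ignores modules_to_save):

```python
import torch.nn as nn

def demo_set_adapter_trainable(model: nn.Module, active_adapter: str) -> None:
    """Enable gradients only for LoRA parameters of the active adapter
    (e.g. names containing '.lora_A.default.'); freeze everything else."""
    for name, param in model.named_parameters():
        if "lora_" in name:
            param.requires_grad = f".{active_adapter}." in name
        else:
            param.requires_grad = False
```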
1af8ca484b feat: add type hints (#858)
* feat: add type hints

* build: trigger ci
2023-09-25 16:42:51 +02:00
1c0654b9a5 support multiple ranks and alphas for LoRA (#873)
* support multiple ranks and alphas

* Update lora.py

* Update lora.py

* commit suggestions

Co-Authored-By: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* address comments

Co-Authored-By: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* Fixed multirank + multialpha for sequential LoRAs, added correct support of LoRA-C3Lier conversion (#937)

* Fixed multirank multialpha for sequential loras, added tests, fixed docs

* Refactored kohya_ss conversion script for proper support of LoRA-C3Lier

* Fixed styling

* Removed old comment from docstring

* shift `scale_layer`/`unscale_layer` to `LoraLayer` class to support all the child classes

* support multiple active adapters

* add `active_adapters` property

Co-Authored-By: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* fix bug related to active adapter of `ModulesToSaveWrapper`

* revert the change wrt active_adapter assignment

Co-Authored-By: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* addressing comments

* address comments

* address comment

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: Alexander Kovalchuk <kovalexal@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-09-23 01:33:44 +05:30
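As a usage illustration of the per-module ranks and alphas added above (assuming the rank_pattern/alpha_pattern config fields; the model name is only an example):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")  # example model
config = LoraConfig(
    r=8,                              # default rank
    lora_alpha=16,                    # default alpha
    target_modules=["q_proj", "v_proj"],
    rank_pattern={"v_proj": 16},      # per-module rank override
    alpha_pattern={"v_proj": 32},     # per-module alpha override
)
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()
```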
1dc4a6761e Install correct PyTorch nightly in GH action (#954)
For the GH action about running torch.compile, when using the nightly
options, install from the correct index (used to be test, now is
nightly).
2023-09-21 16:15:11 +02:00
f3d4fef6e6 Allow compile GH action to run on torch nightly (#952)
If the action is triggered with nightly=true, the nightly PyTorch
version will be installed.

Also, added a line that *may* enable the action to run on forks, not
sure.
2023-09-21 09:57:46 +02:00
39264a0141 Fix some tests that would fail with torch.compile (#949)
Some tests would currently fail with torch.compile, not because there is
anything wrong with how PEFT works with compiled models, but simply
because of the way the tests are written. This is because when models
are compiled, the keys of the state dict change. Tests have now been
adapted to unwrap the compiled model first before getting the state
dict.

Note that the mentioned issue does not affect saving and loading,
because save_pretrained is already called on the original module, so
there is no issue with mismatched keys.

Also fixed the docstring of get_peft_model_state_dict.
2023-09-21 09:46:28 +02:00
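For reference, compiled models keep the original module under _orig_mod, which is why unwrapping before comparing state dict keys fixes the tests. A small sketch (assuming PyTorch >= 2.0):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 4)
compiled = torch.compile(model)

# The compiled wrapper prefixes its state dict keys (e.g. "_orig_mod.weight"),
# so unwrap it before comparing against the original module.
unwrapped = getattr(compiled, "_orig_mod", compiled)
assert unwrapped.state_dict().keys() == model.state_dict().keys()
```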
ba0477f298 ENH error message when choosing wrong bias (#946)
Raise an error with a helpful error message when the user chooses an incorrect
option for the bias argument.

---------

Co-authored-by: datta0 <venkatadattasanimmaturi@gmail.com>
2023-09-20 11:26:35 +02:00
139624750a FIX: torch compile gh action installs pytest (#944)
* FIX: Install pytest for torch compile GH action

* [skip ci] commit to skip CI
2023-09-18 17:17:01 +02:00
1bbde1bfe0 Add GH action to run unit tests with torch.compile (#943)
The GitHub action works by setting an environment variable
PEFT_DEBUG_WITH_TORCH_COMPILE=1. This causes the tests to run with
torch.compile if get_peft_model is being used. The action is triggered
manually and requires to indicate the branch to run it on.

With this action, we should be able to incrementally add support for
torch.compile in PEFT without disrupting the existing tests. Once we
fully support torch.compile, we can think about adding to the tests by
default and to remove the code from this PR.
2023-09-18 16:58:01 +02:00
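A hedged sketch of what such an environment-variable-gated shim could look like in a test helper (not the actual CI code; get_peft_model_for_tests is a made-up name):

```python
import os

import torch
from peft import get_peft_model

def get_peft_model_for_tests(model, peft_config, **kwargs):
    """Return a PEFT model, additionally compiled when the debug env var is set."""
    peft_model = get_peft_model(model, peft_config, **kwargs)
    if os.environ.get("PEFT_DEBUG_WITH_TORCH_COMPILE") == "1":
        peft_model = torch.compile(peft_model)
    return peft_model
```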
6b4554e643 add scale_layer / unscale_layer (#935) 2023-09-15 13:47:09 +02:00
c8c936eddf pin diffusers (#936) 2023-09-15 13:46:43 +02:00
93d0c03d5b Fixed LoRA conversion for kohya_ss (#916) 2023-09-14 11:00:24 +02:00
5bdbf2bcd6 fix lora layer init (#928) 2023-09-14 03:41:21 -04:00
4c611f40b4 MNT Add accelerate min dependency (#892)
Because of is_npu_available import.
2023-09-12 11:25:33 +02:00
8bdd4848f4 Make base_model.peft_config single source of truth (#921)
Resolves #802, #923

For the problem description, please check the first issue.

I went with solution 2, i.e. making the base_model.peft_config the
"single source of truth" for the PEFT configuration. That way, we
minimize the risk of diverging configurations.

This does not apply for prompt learning, where we don't have a
peft_config on the base model (which is just the normal model, not a
PEFT class).

I added a setter for peft_config but from my testing, it isn't being
used. It's only there for completeness.
2023-09-12 11:12:40 +02:00
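The single-source-of-truth idea can be pictured as a property/setter on the wrapper that simply delegates to the base model, so only one config dict ever exists. A minimal sketch (illustrative only, not the actual PeftModel code):

```python
class DemoBaseModel:
    def __init__(self):
        self.peft_config = {}  # the one and only config dict

class DemoPeftModel:
    def __init__(self, base_model):
        self.base_model = base_model

    @property
    def peft_config(self):
        return self.base_model.peft_config  # delegate, never copy

    @peft_config.setter
    def peft_config(self, value):
        self.base_model.peft_config = value

base = DemoBaseModel()
wrapper = DemoPeftModel(base)
wrapper.peft_config["default"] = "some config"
assert wrapper.peft_config is base.peft_config  # single source of truth
```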
b786b884f6 ENH speed up init emb conv2d (#915)
Partly resolves #872

Description

After getting faster initialization of the LoRA Linear layer,
initialization of Conv2D and Embedding is now sped up.

Implementation

The approach of how to achieve the speed up has slightly changed
compared to last time. To refresh memory, in #887, we avoided the
unnecessary initialization of the full weight matrix by completely
skipping nn.Linear.__init__.

Although it is possible to do the same for Embedding and Conv2d, we run
into some trouble here. The issue is that the __init__ methods of these
classes have quite a lot more arguments and some custom logic (i.e. not
only self.foo = foo but more on top). If we wanted to skip __init__
entirely, we would have to basically copy all of that into our code.
Although that is possible, it is brittle (e.g. the logic could be
different for different PyTorch versions or change over time).

For that reason, I opted to implement this differently, using a
suggestion we had discussed earlier. The approach is to call __init__ of
the parent class but enforce empty weights (this is what
torch.nn.utils.skip_init does, although we cannot use that function
directly). This way, we can avoid having to copy the __init__ code while
still avoiding expensive initialization of the weights.

I did not change the code for Linear to also use this approach because
the logic inside of Linear.__init__ is quite simple (at least for now),
so we are good here with the existing approach.

However, I was curious how changing the approach for Linear would affect
the initialization speed. Therefore, I ran the script from #872 again, 3
times each.

Current approach:

test 1 with model bert-base took 0.021 sec.
test 1 with model bert-base took 0.020 sec.
test 1 with model bert-base took 0.020 sec.
test 2 with model bloomz-1b7 took 0.030 sec.
test 2 with model bloomz-1b7 took 0.030 sec.
test 2 with model bloomz-1b7 took 0.030 sec.

New approach if applied to Linear:

test 1 with model bert-base took 0.038 sec.
test 1 with model bert-base took 0.039 sec.
test 1 with model bert-base took 0.038 sec.
test 2 with model bloomz-1b7 took 0.072 sec.
test 2 with model bloomz-1b7 took 0.048 sec.
test 2 with model bloomz-1b7 took 0.048 sec.

This shows that the new approach is indeed a bit slower than the
existing one, though still a lot faster than what we had before. I think
we're safe to leave the code inside of Linear as is and benefit
from the slightly better performance at the cost of slightly more
fragile code.
2023-09-12 11:05:29 +02:00
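The "call __init__ but enforce empty weights" trick described above is essentially what torch.nn.utils.skip_init provides. A small illustration of the idea (the layer sizes are arbitrary; this is not PEFT's actual code path):

```python
import torch
import torch.nn as nn

# Standard init: allocates and randomly initializes the full weight matrix.
slow = nn.Embedding(50_000, 4096)

# Skipping init: the module is built on the meta device and then materialized
# with empty (uninitialized) weights, avoiding the expensive random init.
fast = torch.nn.utils.skip_init(nn.Embedding, 50_000, 4096)

# In an adapter context these weights would be overwritten anyway,
# e.g. by copying the base model's weights or loading a checkpoint.
```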
0fa63fb4a2 DOC: Section on common issues encountered with PEFT (#909)
So far, this section contains two types of issues, not using latest package
versions and issues with loading PEFT models. This is based on what I
feel are issues that are commonly brought up.

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2023-09-08 11:25:33 +02:00
f5aae1b47d ENH Merge lora module to 8bit model (#875)
Allows merging 8bit weights from bnb.

4bit weight merging was already implemented through the dequantization method
provided by bnb but there is no official dequantization method for 8bit weights.
This PR works by multiplying the weights with an identity matrix using bnb's
quantization-aware matmul operation. Empirically, this results in a very small
rounding error.
2023-09-07 12:14:37 +02:00
6d140bad39 support prefix tuning for starcoder models (#913)
* support prefix tuning for starcoder models

* remove the test filter for prefix tuning tests for StarCoder models
2023-09-07 15:06:42 +05:30
1f55957402 DOC remove double setup section (#911) 2023-09-07 10:41:14 +02:00
08368a1fba ENH Remove redundant initialization layer calls (#887)
This should lead to a big speedup when initializing LoRA layers.

---------

Co-authored-by: poedator <ruslansv@gmail.com>
2023-09-06 17:31:55 +02:00
20d9c175e2 FIX linting issue in example (#908) 2023-09-06 11:59:46 +02:00
d4dbf684e0 FIX gradient_accumulation_steps in examples (#898)
* fix gradient_accumulation_steps in examples
* update accelerator init
2023-09-06 11:14:57 +02:00
0c9354bda9 DOC Fix for semantic_segmentation_lora (#891)
An argument was renamed.

---------

Co-authored-by: Raghavan <oneraghavan@gmail.com>
2023-08-31 14:07:19 +02:00
f113af0b9e FIX: error using deepspeed zero2 + load_in_8bit + lora (#874)
Fix an issue in (Ada)LoRA forward of bnb layers when using bf16 + lora +
load_in_8bit.
2023-08-31 12:48:39 +02:00
43381008d6 Update build_docker_images.yml (#889) 2023-08-31 12:40:06 +02:00
7d99466446 DOC: PeftModel save_pretrained docstring (#881) (#888) 2023-08-30 17:16:22 +02:00
ecaaae8719 MNT Run tests that were skipped previously (#884)
Some tests were skipped because of an issue with how LoRA weights were
initialized for embeddings. This issue has been fixed for some time now,
so the tests no longer need to be skipped.
2023-08-30 14:40:56 +02:00
0b2f950cc2 FIX: Error in forward of 4bit linear lora layer (#878)
This was introduced during the refactoring of the forward function. It
should now be fixed and be equivalent to the forward function before the
refactoring:

4df9c5a243/src/peft/tuners/lora.py (L1207)

Bug reported by @jiqing-feng
2023-08-30 10:52:43 +02:00
85013987aa MNT: Move tuners to subpackages (#807)
For each tuner, created a sub-module that contains at least:

- config.py for config stuff
- model.py for the actual model/encoder/embedding
- __init__.py so that imports are preserved

Then, when there was a need, further files were created, like layer.py
or utils.py.

Imports were changed to absolute imports everywhere, except for the
sub-packages within a tuner directory, as these packages will always 
stay together in the same place.

For some existing modules, the license comment at the top of the file
was missing; I added it in those cases.

There was a bug in the forward method of 4bit linear lora layers introduced
in #851, for the case that the model is merged AND adapters are disabled.
For that scenario, we need to unmerge first before generating the output,
same as we do for the vanilla Linear layer. This step was missing from the
code previously and is now implemented correctly. Tests were adjusted to
catch that error.
2023-08-29 11:32:29 +02:00
a23b9213f4 FIX: seq2seq prompt tuning (#439) (#809) 2023-08-29 10:53:14 +02:00
140a69bb90 Support merge lora module for 4bit and 8bit linear (#851)
* support merge lora module for 4bit and 8bit linear

* add tests for merging lora module to 8bit and 4bit model

* state should reset grad

* add prepare output before and after merge lora

* fix format

* fix format 2

* fix format 3

* add warning

* fix parameter format

* remove 8bit merge

* remove 8bit linear merge

* add comment for 4bit merge
2023-08-28 19:59:03 +05:30
8c17d556a8 DOC Fix typos in ia3 config docstring (#844) 2023-08-25 11:19:40 +02:00
0e37b85609 🎉 Add Multitask Prompt Tuning (#400)
* mpt

* fix save

* fix save

* add jupyter notebook

* add jupyter notebook

* add jupyter notebook

* drop shuffling

* drop classify_dataset

* drop classify_dataset

* fix keys

* fix keys

* add comments

* use EXACT_SOURCE_TASK in the example

* formatting

* Fix dict index in embedding retrieval

* run style and quality

* run style and quality

* run style and quality

* style

* final fix

* style

* comment out failing tests

* fix generation tests

* fix style and save test

* all testcases

* fix import

* add license header

* reformat

* fix encoder-decoder models

* fix tests running multiple times

* fix paper name for IA3 and add MPT paper

* Trigger CI

* address the recommended changes

* reformat

* address suggestions

* address suggestions

* revert reformatting

* revert reformatting

---------

Co-authored-by: Alex-Brooks <Alex.Brooks@ibm.com>
2023-08-25 11:42:11 +05:30
6e783780ca MNT: Refactor tuner forward methods for simplicity (#833)
This is to be consistent with changes in #794

Description

The forward methods of several tuner layers were partly unnecessarily
nested, which makes them harder to read and which can conceal bugs (as
in #794). Therefore, these methods have been refactored to (hopefully)
be as readable as possible. Moreover, the different methods are now
coded as similarly as possible across the different implementations.

On top of those changes, I made the following adjustments:

- added some type hints to the methods of the layers
- removed a comment about code being copied which I think is not
  necessary
- for the lora embedding layer, we sometimes used F.embedding(...) and
  sometimes nn.Embedding.forward(self, ...) -- now, we consistently use
  only F.embedding(...)
- for IA³ Linear8bitLT, apply dtype conversion regardless of whether
  self.is_feedforward is True or False (as discussed internally)
- for the definition of lora Linear4bit, we (implicitly) checked if bnb
  is available and if bnb 4bit is available, but it is enough to check the
  latter, as it calls the former internally -- now we only check the
  latter, saving us 1 level of indentation
- a few times, ModuleDict.update(<dict>) is called when <dict> has only
  1 single item -- I simplified this code to just assign that item, which
  is more readable and consistent with other nearby code
- removed an unnecessary clone call (was copy/pasted)
2023-08-24 11:04:02 +02:00
fd1c0f66eb Remove backlog section from README.md (#853)
As discussed, since it is not kept up to date.
2023-08-23 17:22:35 +02:00
a4ca8fa3b6 DOC Clarify the new model size in README (#839) 2023-08-23 13:30:08 +02:00
3d9ceb5162 DOC: Add a contribution guide (#848)
As discussed internally, we would like to add a contribution guide to
facilitate contributions from the community and clarify the
requirements.

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2023-08-22 18:35:40 +02:00
bbaafc2fef Release version 0.6.0.dev0 (#849) 2023-08-22 17:04:42 +05:30
573cb35036 DOC Fixed typos in custom_models.mdx (#847) 2023-08-22 11:46:23 +02:00
6c44096c7b Type annotation fix (#840) 2023-08-19 17:53:15 +02:00
4b371b489b [Low-level-API] Add docs about LLAPI (#836)
* add docs about LLAPI

* address comments
2023-08-18 16:05:07 +02:00
87c1d2410e [Tests] Add 4bit slow training tests (#834)
* add 4bit slow training tests

* oops

* Apply suggestions from code review

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2023-08-18 12:00:20 +02:00
2439203eff fix gptq dockerfile (#835) 2023-08-18 10:54:50 +02:00
312d294fdd Fix unbound error in ia3.py (#794)
Fix an error in IA³'s Linear8bitLt's forward method that resulted in an unbound
local error when using FP16.

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2023-08-17 13:35:17 +02:00
369a0fba85 TST: add test about loading custom models (#827)
Prompted by #808, I added a test that shows that loading a trained
custom model works as expected. I only added this to custom models
because it involves a few steps of training and I didn't want to slow
down tests too much. LMK if this should be added to all tests.

In addition, I renamed some custom model tests which had strange
names (probably caused by an overeager query-replace).
2023-08-16 10:57:38 +02:00
438b16b8c9 Merging LoRA adapters: concatenation feature, more SVD options (#817)
Added a new feature to concatenate the LoRA weights as a mixing method.
SVD now accepts more options, does not clamp by default anymore.
2023-08-16 10:51:27 +02:00
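A usage sketch of the new concatenation option (the model name, adapter names, and weights are placeholders):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")  # example model
cfg = LoraConfig(r=8, target_modules=["q_proj", "v_proj"])

model = get_peft_model(base, cfg, adapter_name="adapter_a")
model.add_adapter("adapter_b", cfg)

# Combine the two adapters by concatenating their LoRA matrices instead of SVD.
model.add_weighted_adapter(
    adapters=["adapter_a", "adapter_b"],
    weights=[0.7, 0.3],
    adapter_name="combined",
    combination_type="cat",
)
model.set_adapter("combined")
```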
dbe7e644f1 Only fail quantized Lora unload when actually merging (#822)
Fix an error when unloading and _not_ merging parameters. This used to raise an
error when the weights were quantized, even though the error is not necessary
when there is no merging.
2023-08-16 10:45:58 +02:00
a916465ad0 GPTQ Integration (#771)
* add gptq lora

* fix peft gptq

* fix condition

* fix test

* remove unused weights

* check type

* style

* change attribute

* remove print

* add exllama

* make style

* refactor + fix tests

* remove print

* remove dep on transformers
2023-08-11 17:31:17 -04:00
412d7bc985 Helper function to update model signature (#784)
Provides helper functions in peft.helpers to update the signature of the
forward or generate method of a PeftModel (or subclass). This can be
useful because the wrapping class may override the docstring and type
annotations of the underlying base model. Applying the helper functions
will restore those, leading to better tab completion, help text, etc.

For the time being, these helper functions are purely optional to use.
At a later stage, we may consider applying them automatically, but that
would require testing to ensure that nothing breaks.
2023-08-10 12:14:40 +02:00
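A usage sketch of the helpers described above (assuming the helper is exposed as peft.helpers.update_forward_signature; the model name is only an example):

```python
from peft import LoraConfig, get_peft_model
from peft.helpers import update_forward_signature  # assumed helper name
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")  # example model
model = get_peft_model(base, LoraConfig(task_type="CAUSAL_LM"))

# Restore the base model's forward signature and docstring on the wrapper,
# improving tab completion and help() output.
update_forward_signature(model)
```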
7d44026dea fix crash when using torch.nn.DataParallel for LORA inference (#805)
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
2023-08-08 15:07:23 +02:00
ba90047d70 Update docstring of PeftModel.from_pretrained (#799)
1. Addresses
https://github.com/huggingface/peft/issues/430#issuecomment-1666312815
2. Reword docstring to not be LoRA-specific
2023-08-08 14:38:23 +02:00
10cf3a4fa3 add lora default target module for codegen (#787)
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
2023-08-08 18:08:04 +05:30
aac7722b9e Add adapter error handling (#800)
When a user tries to add a 2nd adapter, Lora and AdaLora make some checks to
ensure the new adapter is compatible with existing adapters. Currently, that
check is performed halfway through the method. This means that if the check
fails, the new adapter is partially applied, leaving the model in a bad state.
The main purpose of this PR is to ensure that the model state is correct after
such a failure is encountered.

Tests were added to catch this potential bug.

While working on this, I also made some related, but not strictly necessary,
changes to the add_adapter methods:

- Previously, the peft_config from the PeftModel was passed to the base
  model. This meant that sometimes, the base model would hold a reference
  to PeftModel.peft_config, but not always, as some base models would
  create new dicts. This is problematic, because some code would rely on
  the objects being the same. Now, they are never the same, leading to
  more consistency.
- I think that the check if multiple adapters have biases (which is not
  supported) was accidentally removed by #749. It is added back in.
- Add some type annotations
- Extend docstrings to contain adapter_name
2023-08-08 14:35:19 +02:00
ed396a69ed [core] PEFT refactor + introducing inject_adapter_in_model public method (#749)
Refactors a bit the internals of some PEFT models and introduces a new
method inject_adapter_in_model for users that want to pass a bare model
and a peft config to inject adapters in-place into the model. These
changes are totally BC with the previous PEFT versions.

This PR makes things easier for the PEFT integration in transformers
huggingface/transformers#25077

The main goal of the PR is to expose a new API for advanced users that
want to integrate PEFT method without the need to use the PeftModel
wrapper. A simple use case could be someone that wants to inject adapters
into a model and wants to keep the original class of the model without
having to offload that to peft that will create a PeftModel. I have
faced this issue in huggingface/transformers#25077 Among other things,
this PR refactors some internals of PEFT library, while keeping it fully
backward compatible.

To tackle the main motivation, I propose to differentiate between
two types of adapters:

1- adapters that are injectable (LoRA, AdaLoRA, IA3)
2- adapters that are not injectable (the rest)

As a first iteration, this API would be supported only for scenario
1; therefore, I decided to create 2 abstract classes to make things
easy to be able to determine if the adapter layer (e.g. LoraLayer) /
adapter module (e.g. LoraModel) does follow the minimal
requirement (i.e. needed attributes, etc.)

Other related changes:

1- Creates a new property method is_prompt_learning to avoid importing
   PromptLearningConfig all the way down
2- Introduces a new object TUNERS_MAPPING, which is a mapping of
   supported pluggable adapters
3- Creates two abstract classes
3.1- BaseTunerLayer: a mixin to check for minimal required attributes
     that a tuner layer should have active_adapter / _is_plugable
3.2- BaseTuner: a higher level module mixin that should be used for any
     injectable adapters in the future.

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-08-07 16:34:54 +02:00
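A usage sketch of the new public method (TinyModel is a placeholder module):

```python
import torch.nn as nn
from peft import LoraConfig, inject_adapter_in_model

class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 10)

    def forward(self, x):
        return self.linear(x)

model = TinyModel()
config = LoraConfig(target_modules=["linear"])

# Adapter layers are injected in place; the model keeps its original class.
model = inject_adapter_in_model(config, model)
print(type(model).__name__)  # still "TinyModel"
```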
ec267c644a Allow passing inputs_embeds instead of input_ids (#757)
Resolves #727

Right now, there is an issue with a few PeftModelForXxx classes when
users pass only inputs_embeds but not input_ids. First of all, the batch
size was being derived from input_ids; now it is derived from
inputs_embeds instead if input_ids is None. Furthermore, a few forward
calls to the base model were not passing the inputs_embeds along, which
resulted in errors down the line. These issues have been fixed now.
2023-08-02 16:59:11 +02:00
9b5808938f Support NPU adapter loading (#772) 2023-08-02 12:30:02 +02:00
b10a8cedf6 Support XPU adapter loading (#737) 2023-08-01 15:46:18 +02:00
bfb264ad96 Add progressbar unload/merge (#753)
* add progressbar unload/merge

* make style

* manual fix style

* Update src/peft/tuners/lora.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update src/peft/tuners/lora.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-08-01 12:26:17 +02:00
702f9377e3 Add tests for AdaLoRA, fix a few bugs (#734)
So far, there have been no tests for AdaLoRA. This PR adds tests similar
to the existing ones. While working on those tests, a few bugs were
encountered and fixed.

The changes made to AdaLoRA:

- Linked to paper abstract, not pdf.
- Don't assume that target modules have a .bias attribute (same as for
  LoRA).
- Fixed an issue where it was assumed that if an output object from
  forward has a .loss attribute, it is a scalar, when it can be None.
- Fixed an issue that when init_lora_weights=False, the weights were
  still initialized to be an identity transform.
- When replacing modules, if a target module is a ModuleList or
  ModuleDict, they are now skipped instead of raising an error that the
  module type is not supported. My reasoning was that it is never intended
  to change those modules, so if their names are matched, it must be a
  false positive. The issue arose because for some target modules, the
  names are just "k" etc., and since we match with endswith, this can
  easily lead to modules like "block" matching.
2023-07-28 13:06:53 +02:00
0e33ac1efe DOC: Examples for LoRA with custom models (#724)
Example 1: training a multilayer perceptron
Example 2: fine-tuning a timm image classifier
New section "Developer Guides" in docs.

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2023-07-27 15:28:33 +02:00
e27e883443 [ModulesToSave] add correct hook management for modules to save (#755)
* add correct hook management for modules to save

* forward contrib credits from finding the solution

* add nice GPU tests

* quality

---------

Co-authored-by: BenjaminBossan <BenjaminBossan@users.noreply.github.com>
2023-07-27 10:29:32 +02:00
ffbb6bcf9c Add btlm to officially supported LoRA (#751) 2023-07-26 22:18:37 +05:30
8541b60acb fix adalora inference issue (#745) 2023-07-26 14:29:25 +02:00
96c0277a1b Updated Example in Class:LoraModel (#672)
* updated Example in Class:LoraModel

* update docstring

* Update src/peft/tuners/adalora.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update src/peft/tuners/lora.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* update adalora.py for doc style check

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-07-24 16:48:59 +02:00
b15c185939 FIX: Disabling adapter works with modules_to_save (#736)
Resolves #493

For LoRA and IA³, there was a bug that even when using the
disable_adapter context, if the module was listed in modules_to_save,
the updated weights would be used instead of the original weights. This
meant that disable_adapter would not return the same results as the base
model without adaptation. This PR fixes the issue and provides a test.

Note: I tried to adjust AdaLoRA too, since it seemed that the same
reasoning should apply there. However, I think that AdaLoRA does not
really support disabling adapters at all. E.g. there is no
disable_adapter_layers method. Therefore, AdaLoRA was not changed.
2023-07-24 13:23:23 +02:00
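The behavior being fixed is exercised through the disable_adapter context manager; a hedged usage sketch (the model checkpoint and inputs are only examples):

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSequenceClassification

base = AutoModelForSequenceClassification.from_pretrained("prajjwal1/bert-tiny")  # example model
config = LoraConfig(task_type="SEQ_CLS", target_modules=["query", "value"])
model = get_peft_model(base, config)  # the classification head lands in modules_to_save

inputs = {
    "input_ids": torch.tensor([[101, 2023, 102]]),
    "attention_mask": torch.tensor([[1, 1, 1]]),
}
with torch.no_grad():
    adapted = model(**inputs).logits
    with model.disable_adapter():
        # With the fix, this matches the base model, including modules_to_save.
        plain = model(**inputs).logits
```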
a955ef1088 ENH: Warn when disabling adapters and bias != 'none' (#741)
For LoRA, given that bias='all' or bias='lora_only', when doing inference
with a model in the disable_adapter context, the output will not be
identical to the output of the base model. This may be surprising to
users. Therefore, a warning is given. Furthermore, the docstring has
been extended to reflect this fact.
2023-07-24 10:34:39 +02:00
e06d94ddeb Fixes warning when initializing prompt encoder (#716)
Right now, when the user initializes a prompt encoder with MLP, they get
a warning that a certain argument is ignored, and there is no possible
value for the argument that would stop the warning. Usually, warnings
indicate that something is (probably) going wrong, but here,
everything is going as expected. Therefore, by default, I would not give
this warning, thus avoiding users getting confused.

However, I would still give the warning if the user set the argument for
encoder_num_layers explicitly to a different value. In that case, they
expect the change to make a difference, but since the argument is
ignored, their expectation is not met, which warrants a warning.
2023-07-19 16:08:29 +02:00
1681cebf60 [Patch] patch trainable params for 4bit layers (#733)
* patch trainable params for 4bit layers

* revert

* added tests.

* added comments.

* addressed final comments
2023-07-19 14:57:14 +02:00
a09f66c8cd [Llama2] Add disabling TP behavior (#728)
* add disabling TP behavior

* add comments

* adapt from new changes of transformers PR
2023-07-19 14:29:36 +02:00
1869fe6e05 FIX: add type information to package_data (#729)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-07-19 12:35:39 +02:00
1c27e24d50 revert change (#731) 2023-07-19 14:29:55 +05:30
30fd5a4c88 fix the param count when using 4-bit bnb 2023-07-19 13:22:25 +05:30
3040782e04 Add falcon to officially supported LoRA & IA3 modules (#722)
* add falcon to officially supported modules

* add lora

* add also `RefinedWeb`
2023-07-19 11:18:45 +05:30
1b8b17de86 Fix subfolder issue (#721)
* fix subfolder issue

* added tests
2023-07-19 11:17:15 +05:30
029f416fce Release version 0.5.0.dev0 (#717) 2023-07-17 16:30:46 +05:30
a1953baef6 FIX: Removes warnings about unknown pytest marker (#715)
This is a low prio PR but it solves an annoyance.

Right now, when running tests, the output is spammed by messages like:

> PytestUnknownMarkWarning: Unknown pytest.mark.multi_gpu_tests - is
this a typo? ...

This makes it more difficult to see the actually relevant information.
This PR fixes this by registering the two pytest markers we use, thus
removing the warnings.
2023-07-17 15:30:08 +05:30
e90dcc4be4 better hub kwargs management (#712) 2023-07-17 15:28:57 +05:30
71b326db68 FEAT: Make LoRA work with custom models (#676)
Enable custom models to work with LoRA

This PR enables custom models to work with LoRA in peft by performing a few
changes required for non-transformers models. New tests for linear,
transformers conv1d, and conv2d layers were added.

Not yet contained in this PR:

- support for AdaLoRA and IA³
- documentation
- examples

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-07-17 10:02:30 +02:00
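The custom-model support described above lets target_modules refer to arbitrary module names; a small sketch with a plain multilayer perceptron (the MLP and layer names are made up for illustration):

```python
import torch.nn as nn
from peft import LoraConfig, get_peft_model

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.seq = nn.Sequential(
            nn.Linear(20, 64),
            nn.ReLU(),
            nn.Linear(64, 2),
        )

    def forward(self, x):
        return self.seq(x)

# Target the plain nn.Linear layers by their module names.
config = LoraConfig(target_modules=["seq.0", "seq.2"])
model = get_peft_model(MLP(), config)
model.print_trainable_parameters()
```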
42ab10699b [Auto] Support AutoPeftModel for custom HF models (#707)
* support `AutoPeftModel` for custom HF models

* added documentation.
2023-07-15 14:18:34 +02:00
5a0e19dda1 [Feature] Save only selected adapters for LoRA (#705)
* v1 working for LoRA

* more checks

* fix prompt learning issues

* fix failing test

* Apply suggestions from code review

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* fixed indentation

* move the check above

* added tests for adaption prompt, enc-dec and feature extraction

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2023-07-14 16:28:03 +02:00
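A usage sketch of saving only selected adapters (model name, adapter names, and output path are placeholders):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")  # example model
cfg = LoraConfig(r=8, target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, cfg)      # adapter "default"
model.add_adapter("extra", cfg)        # a second adapter

# Only the "default" adapter's weights and config are written to disk.
model.save_pretrained("my-checkpoint", selected_adapters=["default"])
```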
86ad5ce55c [Core] Enhancements and refactoring of LoRA method (#695)
* refactor lora and add utils

1. Refactor LoRA code
2. Add method to delete LoRA adapters
3. Add method to unload the PEFT LoRA model.
4. Add `svd` weighted adapter support.
5. minor fixes

* fixes

* fixes

* Update lora.py

* fixes

* Update lora.py

* docstrings for the added public APIs

* docs

* Update src/peft/tuners/lora.py

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* resolve comments, refactoring and adding tests

* fix the remaining failing tests

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2023-07-14 19:44:51 +05:30
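A usage sketch of the adapter-management methods added above (model name and adapter names are placeholders):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")  # example model
cfg = LoraConfig(r=8, target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, cfg)
model.add_adapter("other", cfg)

model.delete_adapter("other")   # drop an adapter that is no longer needed
base_again = model.unload()     # strip the LoRA layers and get the base model back
```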
61a8e3a3bd [WIP] FIX for disabling adapter, adding tests (#683)
This PR deals with some issues with disabling adapter:

- typo in active.adapter
- prompt encoder could be on wrong device
- when using prompt learning + generate, disabling did not work

For the last point, there is a somewhat ugly fix in place for now,
pending a more comprehensive refactor (a comment was added to that
effect).

Comprehensive tests were added to check that everything works now.

The following tests still not working:

- adaption prompt
- seq2seq with prompt tuning/prompt encoding
- stable diffusion is a little bit flaky but test is hopefully robust enough

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-07-14 14:33:33 +02:00
0675541154 Introducing AutoPeftModelForxxx (#694)
* working v1 for LMs

* added tests.

* added documentation.

* fixed ruff issues.

* added `AutoPeftModelForFeatureExtraction` .

* replace with `TypeError`

* address last comments

* added comment.
2023-07-14 11:07:09 +02:00
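A usage sketch of the new Auto classes (the adapter repository id is a placeholder):

```python
from peft import AutoPeftModelForCausalLM

# Reads the adapter config, loads the referenced base model, and attaches the adapter.
model = AutoPeftModelForCausalLM.from_pretrained("your-username/your-lora-adapter")
model.eval()
```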
fa5957f7ca chore: add py.typed (#678)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-07-14 13:40:26 +05:30
5265eb7ebd Fix code typo in int8-asr.mdx (#698)
Having `bias="None"` in `LoraConfig` raised a `NotImplementedError`. Replaced it with `bias="none"` as per the [`LoraConfig` reference](https://huggingface.co/docs/peft/main/en/package_reference/tuners#peft.LoraConfig) and now the code works, I can run training.
2023-07-14 09:27:37 +02:00
878a8bc990 update Readme to include IA3 (#702) 2023-07-14 09:10:49 +02:00
b1bafca333 Fix a small bug in forward method of IA³ (#696) 2023-07-13 14:39:13 +02:00
92d38b50af add support for Feature Extraction using PEFT (#647)
* add support for embedding with peft

* add example and resolve code quality issues

* update notebook example post fixing the loss

* adding full example with inference notebook

* quality 

* add tests, docs, guide and rename task_type to be inline with Hub

* fixes

* fixes

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update peft_model.py

* fixes

* final fixes

* Update _toctree.yml

* fixes and make style and make quality

* deberta exception with checkpointing

* Update docs/source/task_guides/semantic-similarity-lora.md

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update docs/source/task_guides/semantic-similarity-lora.md

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* resolve comments

* testing prompt learning methods

* Update testing_common.py

* fix the tests

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2023-07-13 18:04:28 +05:30
5de5c24a8a Init IA³ weights randomly when so configured (#693)
Right now, no matter what the value of init_ia3_weights, these weights
are always initialized to be 1 (i.e. identity transforms). With this
fix, when init_ia3_weights=False, the weights are initialized randomly.
This is a setting mostly used for testing, so this fix has no user
impact.
2023-07-13 12:55:29 +02:00
062d95a09e FIX: base_model_torch_dtype when using model.half() after init (#688) 2023-07-13 11:12:40 +02:00
c33c42f158 Add functionality to support IA3 (#578)
* Added initial ia3 code

* Implemented ia3 correctly for feedforward layers; Fixed regex matching

* Fixed module mapping for mt5

* Merged changes from huggingface:main

* Merged changes

* Fixed lora merge conflicts

* Different bloom config

* Added save option for ia3

* Added loading code for ia3

* Added feedforward implementation in utils and seq cls example

* Added feedforward implementation in utils and seq cls example

* Implemented merge, unmerge, enable/disable adapters functionality

* Fixed feedforward during merge

* Debugging Merge

* Removing debug messages

* Cleaned up repo

* Removed non-IA3 changes

* Refactor save and load

* Added support to all models in tests; Added IA3Config for common tests

* Added half-precision support and test for gradient checkpointing; Formatted jupyter notebooks

* Added target modules for new models GPTBigCode and LLama

* Cleaned up code

* Cleaned up code

* Cleaned up example notebook

* Cleaned up  seq2seq notebook

* Corrected function docstrings; refactored find_and_replace

* Corrected function docstrings; refactored find_and_replace

* Added basic docs for IA3

* Added new conceptual guide in source tree for documentation

* Minor fix to documentation

* Minor fixes to docstrings; Added error handling for 4bit quantization; Cleaned unused merge/unmerge methods

* styling changes after merge from main

* Update src/peft/tuners/ia3.py

Remove unused attribute merge_weights

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>

---------

Co-authored-by: Abhishek2304 <abhishekgupta2304@gmail.com>
Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
2023-07-13 13:15:50 +05:30
c46d76ae3a Update Dockerfile (#684) 2023-07-11 18:41:52 +02:00
4f542e319f Fix embedding LoRA weights initialization (#681)
When init_lora_weights=False, embedding LoRA weights were initialized as
all zeros, resulting in LoRA becoming an identity transform. This is
inconsistent with other module types, where init_lora_weights=False
results in random initialization and thus a non-identity operation.

As init_lora_weights=False is just for internal testing, users should
not be affected by this change. In fact, I updated the doc of this
parameter to - hopefully - better reflect this.

There is no direct test for this change. However, there are tests
in #676 that will fail without this fix, so it is tested indirectly.
2023-07-11 12:20:26 +02:00
b5e341bb8a Added wandb support for lora train_dreambooth (#639)
* Update train_dreambooth.py

Accelerator init updated from logging_dir to project_dir. Newer versions of accelerate use project_dir; logging_dir is deprecated.

* Bugfix: Adapter name variable inserted, when changing LORA_ADAPTER_NAME it causes error

* Adapter name added as kwarg

* Black code formatted

* Style & Quality check

* Wandb import added for logging and project initialization

* Wandb import added for logging and project initialization

* fix project_name

* print tqdm progress to wandb
2023-07-11 13:56:03 +05:30
06fd06a4d2 Remove skipping certain tests (#668)
The generate tests so far were skipped for non-lora, non-prefix tuning
cases. However, those cases are now passing, so it is no longer
necessary to skip the tests.
2023-07-07 14:19:10 +02:00
7d1d959879 Adding support for RoBERTa layers_transform in COMMON_LAYERS_PATTERN (#669)
* fix: add pattern layer to support RoBERTa layers_transform

* chore: fix code quality error
2023-07-07 14:19:01 +02:00
39ef2546d5 Update clm-prompt-tuning.mdx (#652)
Fixed typo that prevented training.
2023-07-06 09:21:23 -07:00
9f7492577f Fix bug resulting in config copies not working (#653)
Resolves #424

The bug was caused by __dict__ being overwritten to return a copy of the
dataclass. This can lead to unpredictable behavior, as shown in the
issue. This fix removes the __dict__ property and preservers the
original behavior where needed.

All three added tests would fail without the fix.
2023-07-06 09:06:41 +02:00
bef8e3584c [docs] API example (#650)
* api example

* apply feedback

* fix format

* make style
2023-07-05 11:19:20 -07:00
032fff92fb Fixed LoraConfig alpha modification on add_weighted_adapter (#654)
* Fixed LoraConfig modification on add_weighted_adapter

* Added test for issue with adding weighted adapter for LoRA

* Fixed formatting
2023-07-01 11:13:25 +05:30
6c8659f8f9 Require Python version 3.8 (#649) 2023-06-30 14:01:41 +02:00
5884bdbea4 Add pytest-cov for reporting test coverage (#641)
As discussed, this adds line coverage to the tests. This will allow us
to identify parts of the code that are missing coverage and make it
easier to ensure newly added code is well covered.

At the moment, CI is not set up to report if new, uncovered code is
being added. We could add codecov to the CI to get this functionality,
but having 100% coverage for new code is not always desired, so it's
debatable if it is needed.

Right now, there are multiple test commands (normal, single, multi GPU).
For each individual command, the coverage report would only include the
lines covered by that command, so the total coverage would be
underreported. It is possible to combine multiple coverage reports into
a single report:

https://coverage.readthedocs.io/en/stable/cmd.html#cmd-combine

Combining the reports will be added in a future PR.
2023-06-30 14:01:02 +02:00
86290e9660 style: tentatively add hints for some public function (#614)
* style: tentatively add hints for some public function

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* fix: import annotations to evaluate to str

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* fix: style

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-28 12:33:16 +05:30
563acf0832 Remove loralib reqs from examples, small fixes (#640)
- As discussed, loralib is no longer required, so the examples from the
  docs have been updated to no longer require loralib as dependencies
- In one example, a missing torch import was added
- In another example, a missing line was added (output of that line is
  shown, but not the line itself)
2023-06-28 12:23:09 +05:30
f4526d57fc importing peft with an old version of bitsandbytes causes an exception (#642) (#646) 2023-06-28 00:52:06 +02:00
d9b0a118af Update peft_model.py (#644) 2023-06-27 23:41:51 +02:00
f5352f08c5 feat(model): Allow from_pretrained to accept PeftConfig class (#612)
* feat(model): Allow from_pretrained to accept PeftConfig class

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* tests: add test cases for config construction

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: address comments and run tools

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* fix: style

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-27 18:27:57 +05:30
48ffd07276 fix ptun and prompt tuning generation issue (#543)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2023-06-27 16:56:47 +05:30
eb01b5ee1d fix Prefix-tuning error in clm Float16 evaluation (#520)
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
2023-06-27 13:57:21 +05:30
a7ea02a709 [Bugfix] Inserted adapter_name to get_peft_model_state_dict function (#626)
* Update train_dreambooth.py

Accelerator init updated from logging_dir to project_dir. Newer versions of accelerate use project_dir; logging_dir is deprecated.

* Bugfix: Adapter name variable inserted, when changing LORA_ADAPTER_NAME it causes error

* Adapter name added as kwarg

* Black code formatted

* Style & Quality check
2023-06-27 13:56:54 +05:30
66fd087205 [Bugfix] Fixed LoRA conv2d merge (#637)
* Fixed LoRA conv2d merge

* Fixed typo
2023-06-27 12:18:08 +05:30
0e8932f1cb Add seq2seq prompt tuning support (#519)
* Added prompt tuning for seq2seq and corresponding notebook examples

* Added prompt tuning for seq2seq and corresponding notebook examples

* Added prompt tuning for seq2seq and corresponding notebook examples

* Call encoder with get_encoder() and update notebook example

* Style formatting

* Add seq2seq p-tuning support, and improve seq2seq prompt tuning support, enabling the use of generate()

* Fix imports

* Fix imports

* Add co-author.

Co-authored-by: ZhengxiangShi michaelszx117@gmail.com

* Add co-author.

Co-authored-by: ZhengxiangShi <michaelszx117@gmail.com>

---------

Co-authored-by: Thomas SCHILLACI <tschilla@px101.prod.exalead.com>
Co-authored-by: ZhengxiangShi <michaelszx117@gmail.com>
2023-06-27 11:45:49 +05:30
e2b8e3260d [AdaptionPrompt] Add 8bit + 4bit support for adaption prompt (#604)
* add 8bit support for adaption prompt

* add 4bit support
2023-06-26 15:44:51 +02:00
c476c1e348 add adalora 4bit (#598) 2023-06-26 15:09:22 +02:00
18544647ac Update train_dreambooth.py (#624)
Accelerator init updated from logging_dir to project_dir. Newer versions of accelerate use project_dir; logging_dir is deprecated.
2023-06-26 18:18:24 +05:30
8af8dbd2ec Update README.md, citation (#616)
bibtex was giving me a "too many commas" error; this fixes it
2023-06-23 15:59:34 +05:30
39fc09ec1b update whisper test (#617) 2023-06-23 14:37:43 +05:30
016722addd Added Civitai LoRAs conversion to PEFT, PEFT LoRAs conversion to webui (#596)
* Fixed kohya_ss to peft lora conversion, added script for backward conversion

* Fixed getting alpha from PEFT

---------

Co-authored-by: Alexander Kovalchuk <a.kovalchuk@prequelapp.com>
2023-06-21 19:34:39 +05:30
fd10faedfa stronger import of bnb (#605) 2023-06-21 17:45:04 +05:30
702d06098e add adapter_name in get_peft_model (#610) 2023-06-21 17:43:40 +05:30
0b62b4378b fix final failing tests (#609) 2023-06-20 14:15:00 +02:00
b8b84cb6ce [tests] Fix dockerfile (#608)
* fix dockerfile and test

* relax constraints

* fix

* fix log reports and empty cache

* revert workflow

* add librosa
2023-06-20 12:33:14 +02:00
08cb3dde57 Improve the README when using PEFT (#594)
* add logic

* Update peft_model.py

* fix test failures

* fixes

* fix
2023-06-19 14:19:41 +05:30
03eb378eb9 feat: Add PeftModelForQuestionAnswering (#473)
* Added first try of supporting QuestionAnswering

* Updated example to be correct

* Added changes from PR 404

* Added missing mapping for task type

* Remove unrelated code

* Run make style
2023-06-16 16:53:58 +05:30
6b81d7179f when from_pretrained is called during LoRA finetuning with the flag "is_trainable" set to True, it should not call model.eval() (#591) 2023-06-16 16:34:07 +05:30
0270b7c780 add more CI tests (#586) 2023-06-16 11:06:48 +02:00
38e9c650ba Fix typo at peft_model.py (#588)
Fix typo on description:
- `imputs_embeds` to `inputs_embeds`
2023-06-16 14:28:51 +05:30
9320373c12 LoRA for Conv2d layer, script to convert kohya_ss LoRA to PEFT (#461)
* Added LoRA for Conv2d layer, script to convert kohya_ss linear lora to PEFT

* Fixed code style, added missing safetensors dependency for kohya_ss to peft conversion script
2023-06-15 16:03:38 +05:30
019b7ff9d6 fix adalora device mismatch issue (#583) 2023-06-15 12:25:36 +02:00
b519e3f9e1 [core] Correctly passing the kwargs all over the place (#575)
* v1 of the fix

* forward contrib credits from discussions

* add tests

---------

Co-authored-by: winglian <winglian@users.noreply.github.com>
2023-06-15 12:23:05 +02:00
e48dfc331c Fix minor typo bug-report.yml (#582) 2023-06-15 10:41:03 +02:00
4d51464045 enable lora for mpt (#576)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2023-06-15 13:05:43 +05:30
8563a63af2 [BugFix] Set alpha and dropout defaults in LoraConfig (#390)
* Set alpha and dropout defaults in LoraConfig

* Update src/peft/tuners/lora.py

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>

---------

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
2023-06-15 12:41:35 +05:30
eb75374fb1 add issue template (#562) 2023-06-13 23:03:12 +05:30
1cbc985018 feat: add type hint to get_peft_model (#566)
Signed-off-by: samsja <sami.jaghouar@hotmail.fr>
2023-06-13 23:02:54 +05:30
58f4dee67a Fix typo and url to openai/whisper-large-v2 (#563) 2023-06-13 09:49:20 -07:00
a8d11b36a3 [core] Fix config kwargs (#561)
* fix config kwargs

* style

* fix order
2023-06-13 17:54:49 +02:00
189a6b8e35 [core] Add safetensors integration (#553)
* add v1

* clean up

* more improvements

* add device

* final adjustements

* use `EntryNotFoundError`

* better checks

* add tests and final fixes

* make style && make quality

* remove `push_to_hub` because of the release
2023-06-09 12:33:13 +02:00
e45529b149 improve code readability (#409) 2023-06-08 14:45:35 +05:30
ba7b1011b8 [doc build] Use secrets (#556) 2023-06-07 18:41:51 +02:00
c23be52881 add thousands separator in print_trainable_parameters (#443) 2023-06-07 18:09:17 +05:30
7fb5f90a38 add library name to model card (#549) 2023-06-05 18:44:40 +05:30
fcff23f005 Remove device_map when training 4,8-bit model. (#534)
* Remove device_map when training 4,8-bit model.

* Fix style
2023-06-02 13:37:46 +05:30
42a184f742 Fix a minor typo where a non-default token_dim would crash prompt tuning (#459) 2023-06-01 15:14:06 +05:30
7add756923 Add starcoder model to target modules dict (#528)
* Add starcoder model to target modules dict

* make style and quality
2023-06-01 14:49:58 +05:30
9914e76d5b Fixed problem with duplicated code. (#517) 2023-06-01 14:47:05 +05:30
668f045972 return load_result when load_adapter (#481) 2023-06-01 14:46:38 +05:30
38f48dd769 fix merge_and_unload when LoRA targets embedding layer (#438) 2023-06-01 14:45:06 +05:30
db55fb34b8 [Llama-Adapter] fix half precision inference + add tests (#456)
* fix + add tests

* forward contrib credits from discussions

---------

Co-authored-by: HamidShojanazeri <HamidShojanazeri@users.noreply.github.com>
2023-06-01 14:44:11 +05:30
76d4ecd40d Enable PeftConfig & PeftModel to load from revision (#433)
* Enable PeftConfig to load from revision

* Add revision to PeftModel

* Fix weights download with revision
2023-06-01 14:39:54 +05:30
27f956a73b [LoRA] Allow applying LoRA at different stages (#429)
* working v1

- working v1
- added tests
- needs some documentation

* more fixes

- stronger tests
- documentation
- remove unneeded common layers pattern

* add more docstring

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* quality & style

* style

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2023-06-01 09:35:24 +02:00
dd1c0d87fe change comment in tuners.lora, lora_alpha float to int (#448) 2023-06-01 12:22:00 +05:30
207d290865 [docs] Prettify index (#478)
* prettify index

* fix format
2023-05-31 09:32:22 -07:00
5e8ee44091 fix (#524) 2023-05-31 09:22:27 -07:00
662ebe593e [core] Add gradient checkpointing check (#404)
* add automatic input enable gradients when calling `get_peft_model`

* style

* better check

* add 4bit check
2023-05-31 12:14:27 +02:00
c42968617b Remove merge_weights (#392) 2023-05-31 11:38:12 +05:30
3714aa2fff [core] Raise warning on using prepare_model_for_int8_training (#483)
* raise warning on using older method

* Update src/peft/utils/other.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* quality

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-05-22 17:47:32 +02:00
0fcc30dd43 [core] Protect 4bit import (#480)
* protect 4bit import

* fix CI

* better check for python 3.7
2023-05-21 00:14:43 +02:00
d6015bc11f 4-bit QLoRA via bitsandbytes (4-bit base model + LoRA) (#476)
* 4bit lora

* 4bit test

* fixing 4bits bugs

* fp4 pass variables

* fix inference datatype and generation config

* updating prep for int8 function to work for 4-bit

* Added FP4 LoRA and FP4 fine-tuning example.

* LinearFP4 -> Linear4bit

* fixes

* Fixed 4-bit example.

* Style changes.

* final changes

---------

Co-authored-by: Artidoro Pagnoni <pagnoni.artidoro@gmail.com>
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
2023-05-20 17:47:15 +02:00
4fd374e80d add sound file to docker images (#401) 2023-05-10 09:00:07 +02:00
3d7770bfd5 Update README.md (#399) 2023-05-10 12:13:14 +05:30
f173f97e9d Fix documentation links on index page (#406) 2023-05-10 12:06:57 +05:30
ef8523b5a4 fix index alignment? (#397) 2023-05-10 12:05:17 +05:30
63c5c9a2c0 [CI] Fix CI - pin urlib (#402)
* fix CI - pin urlib

* revert
2023-05-10 11:56:51 +05:30
5ed95f49d0 add accelerate example for DDP and FSDP in sequence classification for non-lora case (#358)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2023-05-10 10:10:30 +05:30
8a3fcd060d do not use self.device; in FSDP CPU offload mode, self.device is "cpu" instead of "cuda" (#352)
Otherwise an error like "Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:1" is raised.

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2023-05-10 10:09:28 +05:30
b1059b73aa Release: v0.4.0.dev0 (#391) 2023-05-04 01:38:33 +05:30
217 changed files with 62341 additions and 4126 deletions

.github/ISSUE_TEMPLATE/bug-report.yml

@ -0,0 +1,71 @@
name: "\U0001F41B Bug Report"
description: Submit a bug report to help us improve the library
body:
- type: textarea
id: system-info
attributes:
label: System Info
description: Please share your relevant system information with us
placeholder: peft & accelerate & transformers version, platform, python version, ...
validations:
required: true
- type: textarea
id: who-can-help
attributes:
label: Who can help?
description: |
Your issue will be replied to more quickly if you can figure out the right person to tag with @
If you know how to use git blame, that is the easiest way, otherwise, here is a rough guide of **who to tag**.
All issues are read by one of the core maintainers, so if you don't know who to tag, just leave this blank and
a core maintainer will ping the right person.
Please tag fewer than 3 people.
Library: @pacman100 @younesbelkada @sayakpaul
Documentation: @stevhliu and @MKhalusova
placeholder: "@Username ..."
- type: checkboxes
id: information-scripts-examples
attributes:
label: Information
description: 'The problem arises when using:'
options:
- label: "The official example scripts"
- label: "My own modified scripts"
- type: checkboxes
id: information-tasks
attributes:
label: Tasks
description: "The tasks I am working on are:"
options:
- label: "An officially supported task in the `examples` folder"
- label: "My own task or dataset (give details below)"
- type: textarea
id: reproduction
validations:
required: true
attributes:
label: Reproduction
description: |
Please provide a code sample that reproduces the problem you ran into. It can be a Colab link or just a code snippet.
Please provide the simplest possible reproducer so that we can quickly fix the issue.
placeholder: |
Reproducer:
- type: textarea
id: expected-behavior
validations:
required: true
attributes:
label: Expected behavior
description: "A clear and concise description of what you would expect to happen."


@ -0,0 +1,30 @@
name: "\U0001F680 Feature request"
description: Submit a proposal/request for a new feature
labels: [ "feature" ]
body:
- type: textarea
id: feature-request
validations:
required: true
attributes:
label: Feature request
description: |
A clear and concise description of the feature proposal. Please provide a link to the paper and code in case they exist.
- type: textarea
id: motivation
validations:
required: true
attributes:
label: Motivation
description: |
Please outline the motivation for the proposal. Is your feature request related to a problem?
- type: textarea
id: contribution
validations:
required: true
attributes:
label: Your contribution
description: |
Is there any way that you could help, e.g. by submitting a PR?


@ -15,10 +15,20 @@ jobs:
name: "Latest Peft CPU [dev]"
runs-on: ubuntu-latest
steps:
- name: Cleanup disk
run: |
sudo ls -l /usr/local/lib/
sudo ls -l /usr/share/
sudo du -sh /usr/local/lib/
sudo du -sh /usr/share/
sudo rm -rf /usr/local/lib/android
sudo rm -rf /usr/share/dotnet
sudo du -sh /usr/local/lib/
sudo du -sh /usr/share/
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v1
- name: Check out code
uses: actions/checkout@v2
uses: actions/checkout@v3
- name: Login to DockerHub
uses: docker/login-action@v2
with:
@ -36,10 +46,20 @@ jobs:
name: "Latest Peft GPU [dev]"
runs-on: ubuntu-latest
steps:
- name: Cleanup disk
run: |
sudo ls -l /usr/local/lib/
sudo ls -l /usr/share/
sudo du -sh /usr/local/lib/
sudo du -sh /usr/share/
sudo rm -rf /usr/local/lib/android
sudo rm -rf /usr/share/dotnet
sudo du -sh /usr/local/lib/
sudo du -sh /usr/share/
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v1
- name: Check out code
uses: actions/checkout@v2
uses: actions/checkout@v3
- name: Login to DockerHub
uses: docker/login-action@v1
with:
@ -47,8 +67,71 @@ jobs:
password: ${{ secrets.DOCKERHUB_PASSWORD }}
- name: Build and Push GPU
uses: docker/build-push-action@v2
uses: docker/build-push-action@v4
with:
context: ./docker/peft-gpu
push: true
tags: huggingface/peft-gpu
tags: huggingface/peft-gpu
latest-cuda-bnb-source:
name: "Latest Peft GPU + bnb source [dev]"
runs-on: ubuntu-latest
steps:
- name: Cleanup disk
run: |
sudo ls -l /usr/local/lib/
sudo ls -l /usr/share/
sudo du -sh /usr/local/lib/
sudo du -sh /usr/share/
sudo rm -rf /usr/local/lib/android
sudo rm -rf /usr/share/dotnet
sudo du -sh /usr/local/lib/
sudo du -sh /usr/share/
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v1
- name: Check out code
uses: actions/checkout@v3
- name: Login to DockerHub
uses: docker/login-action@v1
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
- name: Build and Push GPU
uses: docker/build-push-action@v4
with:
context: ./docker/peft-gpu-bnb-source
push: true
tags: huggingface/peft-gpu-bnb-source
latest-cuda-bnb-source-latest:
name: "Latest Peft GPU + bnb source [accelerate / peft / transformers latest]"
runs-on: ubuntu-latest
steps:
- name: Cleanup disk
run: |
sudo ls -l /usr/local/lib/
sudo ls -l /usr/share/
sudo du -sh /usr/local/lib/
sudo du -sh /usr/share/
sudo rm -rf /usr/local/lib/android
sudo rm -rf /usr/share/dotnet
sudo du -sh /usr/local/lib/
sudo du -sh /usr/share/
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v1
- name: Check out code
uses: actions/checkout@v3
- name: Login to DockerHub
uses: docker/login-action@v1
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
- name: Build and Push GPU
uses: docker/build-push-action@v4
with:
context: ./docker/peft-gpu-bnb-latest
push: true
tags: huggingface/peft-gpu-bnb-latest


@ -15,4 +15,5 @@ jobs:
package: peft
notebook_folder: peft_docs
secrets:
token: ${{ secrets.HUGGINGFACE_PUSH }}
token: ${{ secrets.HUGGINGFACE_PUSH }}
hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}


@ -1,13 +0,0 @@
name: Delete dev documentation
on:
pull_request:
types: [ closed ]
jobs:
delete:
uses: huggingface/doc-builder/.github/workflows/delete_doc_comment.yml@main
with:
pr_number: ${{ github.event.number }}
package: peft


@ -0,0 +1,82 @@
name: integration tests
on:
workflow_dispatch:
inputs:
branch:
description: 'Branch to test on'
required: true
jobs:
run_transformers_integration_tests:
strategy:
fail-fast: false
matrix:
transformers-version: ['main', 'latest']
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
ref: ${{ github.event.inputs.branch }}
repository: ${{ github.event.pull_request.head.repo.full_name }}
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: "3.10"
cache: "pip"
cache-dependency-path: "setup.py"
- name: print environment variables
run: |
echo "env.CI_BRANCH = ${{ env.CI_BRANCH }}"
echo "env.CI_SHA = ${{ env.CI_SHA }}"
- name: Install dependencies
run: |
python -m pip install --upgrade pip
python -m pip install .[test]
if [ "${{ matrix.transformers-version }}" == "main" ]; then
pip install -U git+https://github.com/huggingface/transformers.git
else
echo "Nothing to do as transformers latest already installed"
fi
- name: Test transformers integration
run: |
cd .. && git clone https://github.com/huggingface/transformers.git && cd transformers/ && git rev-parse HEAD
RUN_SLOW=1 pytest tests/peft_integration/test_peft_integration.py
run_diffusers_integration_tests:
strategy:
fail-fast: false
matrix:
# For now diffusers integration is not on PyPI
diffusers-version: ['main']
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
ref: ${{ github.event.inputs.branch }}
repository: ${{ github.event.pull_request.head.repo.full_name }}
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: "3.10"
cache: "pip"
cache-dependency-path: "setup.py"
- name: print environment variables
run: |
echo "env.CI_BRANCH = ${{ env.CI_BRANCH }}"
echo "env.CI_SHA = ${{ env.CI_SHA }}"
- name: Install dependencies
run: |
python -m pip install --upgrade pip
python -m pip install .[test]
if [ "${{ matrix.diffusers-version }}" == "main" ]; then
pip install -U git+https://github.com/huggingface/diffusers.git
else
echo "Nothing to do as diffusers latest already installed"
fi
- name: Test diffusers integration
run: |
cd .. && git clone https://github.com/huggingface/diffusers.git && cd diffusers/ && git rev-parse HEAD
pytest tests/lora/test_lora_layers_peft.py

.github/workflows/nightly-bnb.yml

@ -0,0 +1,131 @@
name: BNB from source self-hosted runner with slow tests (scheduled)
on:
workflow_dispatch:
schedule:
- cron: "0 2 * * *"
env:
RUN_SLOW: "yes"
IS_GITHUB_CI: "1"
# To be able to run tests on CUDA 12.2
NVIDIA_DISABLE_REQUIRE: "1"
SLACK_API_TOKEN: ${{ secrets.SLACK_API_TOKEN }}
jobs:
run_all_tests_single_gpu:
strategy:
fail-fast: false
matrix:
docker-image-name: ["huggingface/peft-gpu-bnb-source:latest", "huggingface/peft-gpu-bnb-latest:latest"]
runs-on: [self-hosted, single-gpu, nvidia-gpu, t4, ci]
env:
CUDA_VISIBLE_DEVICES: "0"
TEST_TYPE: "single_gpu_${{ matrix.docker-image-name }}"
container:
image: ${{ matrix.docker-image-name }}
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
defaults:
run:
shell: bash
steps:
- uses: actions/checkout@v3
- name: Pip install
run: |
source activate peft
pip install -e . --no-deps
pip install pytest-reportlog pytest-cov parameterized datasets scipy einops
mkdir transformers-clone && git clone https://github.com/huggingface/transformers.git transformers-clone # rename to transformers clone to avoid modules conflict
if [ "${{ matrix.docker-image-name }}" == "huggingface/peft-gpu-bnb-latest:latest" ]; then
cd transformers-clone
transformers_version=$(pip show transformers | grep '^Version:' | cut -d ' ' -f2 | sed 's/\.dev0//')
echo "Checking out tag for Transformers version: v$transformers_version"
git fetch --tags
git checkout tags/v$transformers_version
cd ..
fi
- name: Run examples on single GPU
if: always()
run: |
source activate peft
make tests_examples_single_gpu_bnb
- name: Run core tests on single GPU
if: always()
run: |
source activate peft
make tests_core_single_gpu_bnb
- name: Run transformers tests on single GPU
if: always()
run: |
source activate peft
make transformers_tests
- name: Generate Report
if: always()
run: |
pip install slack_sdk tabulate
python scripts/log_reports.py --slack_channel_name bnb-daily-ci >> $GITHUB_STEP_SUMMARY
run_all_tests_multi_gpu:
strategy:
fail-fast: false
matrix:
docker-image-name: ["huggingface/peft-gpu-bnb-source:latest", "huggingface/peft-gpu-bnb-latest:latest"]
runs-on: [self-hosted, multi-gpu, nvidia-gpu, t4, ci]
env:
CUDA_VISIBLE_DEVICES: "0,1"
TEST_TYPE: "multi_gpu_${{ matrix.docker-image-name }}"
container:
image: ${{ matrix.docker-image-name }}
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
defaults:
run:
shell: bash
steps:
- uses: actions/checkout@v3
- name: Pip install
run: |
source activate peft
pip install -e . --no-deps
pip install pytest-reportlog pytest-cov parameterized datasets scipy einops
mkdir transformers-clone && git clone https://github.com/huggingface/transformers.git transformers-clone
if [ "${{ matrix.docker-image-name }}" == "huggingface/peft-gpu-bnb-latest:latest" ]; then
cd transformers-clone
transformers_version=$(pip show transformers | grep '^Version:' | cut -d ' ' -f2 | sed 's/\.dev0//')
echo "Checking out tag for Transformers version: v$transformers_version"
git fetch --tags
git checkout tags/v$transformers_version
cd ..
fi
- name: Run core GPU tests on multi-gpu
if: always()
run: |
source activate peft
- name: Run examples on multi GPU
if: always()
run: |
source activate peft
make tests_examples_multi_gpu_bnb
- name: Run core tests on multi GPU
if: always()
run: |
source activate peft
make tests_core_multi_gpu_bnb
- name: Run transformers tests on multi GPU
if: always()
run: |
source activate peft
make transformers_tests
- name: Generate Report
if: always()
run: |
pip install slack_sdk tabulate
python scripts/log_reports.py --slack_channel_name bnb-daily-ci >> $GITHUB_STEP_SUMMARY


@ -8,28 +8,30 @@ on:
env:
RUN_SLOW: "yes"
IS_GITHUB_CI: "1"
# To be able to run tests on CUDA 12.2
NVIDIA_DISABLE_REQUIRE: "1"
SLACK_API_TOKEN: ${{ secrets.SLACK_API_TOKEN }}
jobs:
run_all_tests_single_gpu:
runs-on: [self-hosted, docker-gpu, multi-gpu]
strategy:
fail-fast: false
runs-on: [self-hosted, single-gpu, nvidia-gpu, t4, ci]
env:
CUDA_VISIBLE_DEVICES: "0"
TEST_TYPE: "single_gpu"
container:
image: huggingface/peft-gpu:latest
options: --gpus all --shm-size "16gb"
options: --gpus all --shm-size "16gb" -e NVIDIA_DISABLE_REQUIRE=true
defaults:
run:
working-directory: peft/
shell: bash
steps:
- name: Update clone & pip install
- uses: actions/checkout@v3
- name: Pip install
run: |
source activate peft
git config --global --add safe.directory '*'
git fetch && git checkout ${{ github.sha }}
pip install -e . --no-deps
pip install pytest-reportlog
@ -47,6 +49,11 @@ jobs:
run: |
source activate peft
make tests_core_single_gpu
- name: Run regression tests on single GPU
run: |
source activate peft
make tests_regression
- name: Generate Report
if: always()
@ -55,23 +62,23 @@ jobs:
python scripts/log_reports.py >> $GITHUB_STEP_SUMMARY
run_all_tests_multi_gpu:
runs-on: [self-hosted, docker-gpu, multi-gpu]
strategy:
fail-fast: false
runs-on: [self-hosted, multi-gpu, nvidia-gpu, t4, ci]
env:
CUDA_VISIBLE_DEVICES: "0,1"
TEST_TYPE: "multi_gpu"
container:
image: huggingface/peft-gpu:latest
options: --gpus all --shm-size "16gb"
options: --gpus all --shm-size "16gb" -e NVIDIA_DISABLE_REQUIRE=true
defaults:
run:
working-directory: peft/
shell: bash
steps:
- name: Update clone
- uses: actions/checkout@v3
- name: Pip install
run: |
source activate peft
git config --global --add safe.directory '*'
git fetch && git checkout ${{ github.sha }}
pip install -e . --no-deps
pip install pytest-reportlog


@ -28,7 +28,7 @@ jobs:
needs: check_code_quality
strategy:
matrix:
python-version: ["3.8", "3.9", "3.10"]
python-version: ["3.8", "3.9", "3.10", "3.11"]
os: ["ubuntu-latest", "macos-latest", "windows-latest"]
runs-on: ${{ matrix.os }}
steps:
@ -43,7 +43,7 @@ jobs:
run: |
python -m pip install --upgrade pip
# cpu version of pytorch
pip install .[test]
pip install -e .[test]
- name: Test with pytest
run: |
make test


@ -0,0 +1,43 @@
name: torch compile tests
# see peft/tests/__init__.py
on:
workflow_dispatch:
inputs:
branch:
description: 'Branch to test on'
required: true
pytorch_nightly:
description: 'Whether to use PyTorch nightly (true/false)'
required: false
default: false
jobs:
run_tests_with_compile:
runs-on: ubuntu-latest
env:
PEFT_DEBUG_WITH_TORCH_COMPILE: 1
steps:
- uses: actions/checkout@v4
with:
ref: ${{ github.event.inputs.branch }}
repository: ${{ github.event.pull_request.head.repo.full_name }}
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: "3.10"
cache: "pip"
cache-dependency-path: "setup.py"
- name: Install dependencies
run: |
python -m pip install --upgrade pip
python -m pip install .[test]
if [ "${{ github.event.inputs.pytorch_nightly }}" = "true" ]; then
python -m pip install --upgrade --pre torch --index-url https://download.pytorch.org/whl/nightly/cpu
fi
- name: Test compile with pytest
run: |
echo "PEFT_DEBUG_WITH_TORCH_COMPILE=$PEFT_DEBUG_WITH_TORCH_COMPILE"
git status
make test


@ -0,0 +1,16 @@
name: Upload PR Documentation
on:
workflow_run:
workflows: ["Build PR Documentation"]
types:
- completed
jobs:
build:
uses: huggingface/doc-builder/.github/workflows/upload_pr_documentation.yml@main
with:
package_name: peft
secrets:
hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}
comment_bot_token: ${{ secrets.COMMENT_BOT_TOKEN }}


@ -34,3 +34,22 @@ tests_core_single_gpu:
tests_common_gpu:
python -m pytest tests/test_decoder_models.py $(if $(IS_GITHUB_CI),--report-log "common_decoder.log",)
python -m pytest tests/test_encoder_decoder_models.py $(if $(IS_GITHUB_CI),--report-log "common_encoder_decoder.log",)
tests_examples_multi_gpu_bnb:
python -m pytest -m "multi_gpu_tests and bitsandbytes" tests/test_gpu_examples.py $(if $(IS_GITHUB_CI),--report-log "multi_gpu_examples.log",)
tests_examples_single_gpu_bnb:
python -m pytest -m "single_gpu_tests and bitsandbytes" tests/test_gpu_examples.py $(if $(IS_GITHUB_CI),--report-log "single_gpu_examples.log",)
tests_core_multi_gpu_bnb:
python -m pytest -m "multi_gpu_tests and bitsandbytes" tests/test_common_gpu.py $(if $(IS_GITHUB_CI),--report-log "core_multi_gpu.log",)
tests_core_single_gpu_bnb:
python -m pytest -m "single_gpu_tests and bitsandbytes" tests/test_common_gpu.py $(if $(IS_GITHUB_CI),--report-log "core_single_gpu.log",)
# For testing transformers tests for bnb runners
transformers_tests:
RUN_SLOW=1 python -m pytest transformers-clone/tests/quantization/bnb $(if $(IS_GITHUB_CI),--report-log "transformers_tests.log",)
tests_regression:
python -m pytest -s --regression tests/regression/ $(if $(IS_GITHUB_CI),--report-log "regression_tests.log",)

README.md

@ -30,6 +30,12 @@ Supported methods:
3. P-Tuning: [GPT Understands, Too](https://arxiv.org/abs/2103.10385)
4. Prompt Tuning: [The Power of Scale for Parameter-Efficient Prompt Tuning](https://arxiv.org/abs/2104.08691)
5. AdaLoRA: [Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning](https://arxiv.org/abs/2303.10512)
6. $(IA)^3$: [Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning](https://arxiv.org/abs/2205.05638)
7. MultiTask Prompt Tuning: [Multitask Prompt Tuning Enables Parameter-Efficient Transfer Learning](https://arxiv.org/abs/2303.02861)
8. LoHa: [FedPara: Low-Rank Hadamard Product for Communication-Efficient Federated Learning](https://arxiv.org/abs/2108.06098)
9. LoKr: [KronA: Parameter Efficient Tuning with Kronecker Adapter](https://arxiv.org/abs/2212.10650) based on [Navigating Text-To-Image Customization:From LyCORIS Fine-Tuning to Model Evaluation](https://arxiv.org/abs/2309.14859) implementation
10. LoftQ: [LoftQ: LoRA-Fine-Tuning-aware Quantization for Large Language Models](https://arxiv.org/abs/2310.08659)
11. OFT: [Controlling Text-to-Image Diffusion by Orthogonal Finetuning](https://arxiv.org/abs/2306.07280)
## Getting started
@ -54,7 +60,7 @@ model.print_trainable_parameters()
### Get comparable performance to full finetuning by adapting LLMs to downstream tasks using consumer hardware
GPU memory required for adapting LLMs on the few-shot dataset [`ought/raft/twitter_complaints`](https://huggingface.co/datasets/ought/raft/viewer/twitter_complaints). Here, settings considered
are full finetuning, PEFT-LoRA using plain PyTorch and PEFT-LoRA using DeepSpeed with CPU Offloading.
are full finetuning, PEFT-LoRA using plain PyTorch and PEFT-LoRA using DeepSpeed with CPU Offloading.
Hardware: Single A100 80GB GPU with CPU RAM above 64GB
@ -66,7 +72,7 @@ Hardware: Single A100 80GB GPU with CPU RAM above 64GB
Performance of PEFT-LoRA tuned [`bigscience/T0_3B`](https://huggingface.co/bigscience/T0_3B) on [`ought/raft/twitter_complaints`](https://huggingface.co/datasets/ought/raft/viewer/twitter_complaints) leaderboard.
A point to note is that we didn't try to squeeze performance by playing around with input instruction templates, LoRA hyperparams and other training related hyperparams. Also, we didn't use the larger 13B [mt0-xxl](https://huggingface.co/bigscience/mt0-xxl) model.
So, we are already seeing comparable performance to SoTA with parameter efficient tuning. Also, the final checkpoint size is just `19MB` in comparison to `11GB` size of the backbone [`bigscience/T0_3B`](https://huggingface.co/bigscience/T0_3B) model.
So, we are already seeing comparable performance to SoTA with parameter efficient tuning. Also, the final additional checkpoint size is just `19MB` in comparison to `11GB` size of the backbone [`bigscience/T0_3B`](https://huggingface.co/bigscience/T0_3B) model, but one still has to load the original full size model.
| Submission Name | Accuracy |
| --------- | ---- |
@ -131,15 +137,17 @@ Try out the 🤗 Gradio Space which should run seamlessly on a T4 instance:
**NEW** ✨ Multi Adapter support and combining multiple LoRA adapters in a weighted combination
![peft lora dreambooth weighted adapter](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/weighted_adapter_dreambooth_lora.png)
**NEW** ✨ Dreambooth training for Stable Diffusion using LoHa and LoKr adapters [`examples/stable_diffusion/train_dreambooth.py`](examples/stable_diffusion/train_dreambooth.py)
### Parameter Efficient Tuning of LLMs for RLHF components such as Ranker and Policy
- Here is an example in [trl](https://github.com/lvwerra/trl) library using PEFT+INT8 for tuning policy model: [gpt2-sentiment_peft.py](https://github.com/lvwerra/trl/blob/main/examples/sentiment/scripts/gpt2-sentiment_peft.py) and corresponding [Blog](https://huggingface.co/blog/trl-peft)
- Example using PEFT for Instrction finetuning, reward model and policy : [stack_llama](https://github.com/lvwerra/trl/tree/main/examples/stack_llama/scripts) and corresponding [Blog](https://huggingface.co/blog/stackllama)
- Example using PEFT for Instruction finetuning, reward model and policy: [stack_llama](https://github.com/lvwerra/trl/tree/main/examples/research_projects/stack_llama/scripts) and corresponding [Blog](https://huggingface.co/blog/stackllama)
### INT8 training of large models in Colab using PEFT LoRA and bits_and_bytes
### INT8 training of large models in Colab using PEFT LoRA and bitsandbytes
- Here is now a demo on how to fine tune [OPT-6.7b](https://huggingface.co/facebook/opt-6.7b) (14GB in fp16) in a Google Colab: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1jCkpikz0J2o20FBQmYmAGdiKmJGOMo-o?usp=sharing)
- Here is now a demo on how to fine tune [whishper-large](openai/whisper-large-v2) (1.5B params) (14GB in fp16) in a Google Colab: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1DOkD_5OUjFa0r5Ik3SgywJLJtEo2qLxO?usp=sharing) and [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1vhF8yueFqha3Y3CpTHN6q9EVcII9EYzs?usp=sharing)
- Here is now a demo on how to fine tune [whisper-large](https://huggingface.co/openai/whisper-large-v2) (1.5B params) (14GB in fp16) in a Google Colab: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1DOkD_5OUjFa0r5Ik3SgywJLJtEo2qLxO?usp=sharing) and [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1vhF8yueFqha3Y3CpTHN6q9EVcII9EYzs?usp=sharing)
### Save compute and storage even for medium and small models
@ -147,7 +155,7 @@ Save storage by avoiding full finetuning of models on each of the downstream tas
With PEFT methods, users only need to store tiny checkpoints in the order of `MBs` all the while retaining
performance comparable to full finetuning.
An example of using LoRA for the task of adapting `LayoutLMForTokenClassification` on `FUNSD` dataset is given in `~examples/token_classification/PEFT_LoRA_LayoutLMForTokenClassification_on_FUNSD.py`. We can observe that with only `0.62 %` of parameters being trainable, we achieve performance (F1 0.777) comparable to full finetuning (F1 0.786) (without any hyerparam tuning runs for extracting more performance), and the checkpoint of this is only `2.8MB`. Now, if there are `N` such datasets, just have these PEFT models one for each dataset and save a lot of storage without having to worry about the problem of catastrophic forgetting or overfitting of backbone/base model.
An example of using LoRA for the task of adapting `LayoutLMForTokenClassification` on `FUNSD` dataset is given in `~examples/token_classification/PEFT_LoRA_LayoutLMForTokenClassification_on_FUNSD.py`. We can observe that with only `0.62 %` of parameters being trainable, we achieve performance (F1 0.777) comparable to full finetuning (F1 0.786) (without any hyperparam tuning runs for extracting more performance), and the checkpoint of this is only `2.8MB`. Now, if there are `N` such datasets, just have these PEFT models one for each dataset and save a lot of storage without having to worry about the problem of catastrophic forgetting or overfitting of backbone/base model.
Another example is fine-tuning [`roberta-large`](https://huggingface.co/roberta-large) on [`MRPC` GLUE](https://huggingface.co/datasets/glue/viewer/mrpc) dataset using different PEFT methods. The notebooks are given in `~examples/sequence_classification`.
@ -217,74 +225,77 @@ DeepSpeed version required `v0.8.0`. An example is provided in `~examples/condit
```
### Example of PEFT model inference using 🤗 Accelerate's Big Model Inferencing capabilities
An example is provided in `~examples/causal_language_modeling/peft_lora_clm_accelerate_big_model_inference.ipynb`.
An example is provided in [this notebook](https://github.com/huggingface/peft/blob/main/examples/causal_language_modeling/peft_lora_clm_accelerate_big_model_inference.ipynb).
## Models support matrix
Find models that are supported out of the box below. Note that PEFT works with almost all models -- if it is not listed, you just need to [do some manual configuration](https://huggingface.co/docs/peft/developer_guides/custom_models).
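As a rough, hedged sketch of such a manual configuration (the model id and the `target_modules` names below are placeholders and not guaranteed to match any particular architecture):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder model id; replace with the unlisted model you want to adapt.
model = AutoModelForCausalLM.from_pretrained("some-org/unlisted-model")

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
    # Inspect model.named_modules() to find the linear layers to target;
    # "q_proj"/"v_proj" are common names but are only an assumption here.
    target_modules=["q_proj", "v_proj"],
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```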
### Causal Language Modeling
| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning |
|--------------| ---- | ---- | ---- | ---- |
| GPT-2 | ✅ | ✅ | ✅ | ✅ |
| Bloom | ✅ | ✅ | ✅ | ✅ |
| OPT | ✅ | ✅ | ✅ | ✅ |
| GPT-Neo | ✅ | ✅ | ✅ | ✅ |
| GPT-J | ✅ | ✅ | ✅ | ✅ |
| GPT-NeoX-20B | ✅ | ✅ | ✅ | ✅ |
| LLaMA | ✅ | ✅ | ✅ | ✅ |
| ChatGLM | ✅ | ✅ | ✅ | ✅ |
| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning | IA3 |
|--------------| ---- | ---- | ---- | ---- | ---- |
| GPT-2 | ✅ | ✅ | ✅ | ✅ | ✅ |
| Bloom | ✅ | ✅ | ✅ | ✅ | ✅ |
| OPT | ✅ | ✅ | ✅ | ✅ | ✅ |
| GPT-Neo | ✅ | ✅ | ✅ | ✅ | ✅ |
| GPT-J | ✅ | ✅ | ✅ | ✅ | ✅ |
| GPT-NeoX-20B | ✅ | ✅ | ✅ | ✅ | ✅ |
| LLaMA | ✅ | ✅ | ✅ | ✅ | ✅ |
| ChatGLM | ✅ | ✅ | ✅ | ✅ | ✅ |
| Mistral | ✅ | | | | |
### Conditional Generation
| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning |
| --------- | ---- | ---- | ---- | ---- |
| T5 | ✅ | ✅ | ✅ | ✅ |
| BART | ✅ | ✅ | ✅ | ✅ |
| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning | IA3 |
| --------- | ---- | ---- | ---- | ---- | ---- |
| T5 | ✅ | ✅ | ✅ | ✅ | ✅ |
| BART | ✅ | ✅ | ✅ | ✅ | ✅ |
### Sequence Classification
| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning |
| --------- | ---- | ---- | ---- | ---- |
| BERT | ✅ | ✅ | ✅ | ✅ |
| RoBERTa | ✅ | ✅ | ✅ | ✅ |
| GPT-2 | ✅ | ✅ | ✅ | ✅ |
| Bloom | ✅ | ✅ | ✅ | ✅ |
| OPT | ✅ | ✅ | ✅ | ✅ |
| GPT-Neo | ✅ | ✅ | ✅ | ✅ |
| GPT-J | ✅ | ✅ | ✅ | ✅ |
| Deberta | ✅ | | ✅ | ✅ |
| Deberta-v2 | ✅ | | ✅ | ✅ |
| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning | IA3 |
| --------- | ---- | ---- | ---- | ---- | ---- |
| BERT | ✅ | ✅ | ✅ | ✅ | ✅ |
| RoBERTa | ✅ | ✅ | ✅ | ✅ | ✅ |
| GPT-2 | ✅ | ✅ | ✅ | ✅ | |
| Bloom | ✅ | ✅ | ✅ | ✅ | |
| OPT | ✅ | ✅ | ✅ | ✅ | |
| GPT-Neo | ✅ | ✅ | ✅ | ✅ | |
| GPT-J | ✅ | ✅ | ✅ | ✅ | |
| Deberta | ✅ | | ✅ | ✅ | |
| Deberta-v2 | ✅ | | ✅ | ✅ | |
### Token Classification
| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning |
| --------- | ---- | ---- | ---- | ---- |
| BERT | ✅ | ✅ | | |
| RoBERTa | ✅ | ✅ | | |
| GPT-2 | ✅ | ✅ | | |
| Bloom | ✅ | ✅ | | |
| OPT | ✅ | ✅ | | |
| GPT-Neo | ✅ | ✅ | | |
| GPT-J | ✅ | ✅ | | |
| Deberta | ✅ | | | |
| Deberta-v2 | ✅ | | | |
| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning | IA3 |
| --------- | ---- | ---- | ---- | ---- | ---- |
| BERT | ✅ | ✅ | | | |
| RoBERTa | ✅ | ✅ | | | |
| GPT-2 | ✅ | ✅ | | | |
| Bloom | ✅ | ✅ | | | |
| OPT | ✅ | ✅ | | | |
| GPT-Neo | ✅ | ✅ | | | |
| GPT-J | ✅ | ✅ | | | |
| Deberta | ✅ | | | | |
| Deberta-v2 | ✅ | | | | |
### Text-to-Image Generation
| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning |
| --------- | ---- | ---- | ---- | ---- |
| Stable Diffusion | ✅ | | | |
| Model | LoRA | LoHa | LoKr | OFT | Prefix Tuning | P-Tuning | Prompt Tuning | IA3 |
| --------- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| Stable Diffusion | ✅ | | | ✅ | | | |
### Image Classification
| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning |
| --------- | ---- | ---- | ---- | ---- |
| ViT | ✅ | | | |
| Swin | ✅ | | | |
| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning | IA3 |
| --------- | ---- | ---- | ---- | ---- | ---- |
| ViT | ✅ | | | | |
| Swin | ✅ | | | | |
### Image to text (Multi-modal models)
| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning |
| --------- | ---- | ---- | ---- | ---- |
| Blip-2 | ✅ | | | |
| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning | IA3
| --------- | ---- | ---- | ---- | ---- | ---- |
| Blip-2 | ✅ | | | | |
___Note that we have tested LoRA for [ViT](https://huggingface.co/docs/transformers/model_doc/vit) and [Swin](https://huggingface.co/docs/transformers/model_doc/swin) for fine-tuning on image classification. However, it should be possible to use LoRA for any compatible model [provided](https://huggingface.co/models?pipeline_tag=image-classification&sort=downloads&search=vit) by 🤗 Transformers. Check out the respective
examples to learn more. If you run into problems, please open an issue.___
@ -293,9 +304,9 @@ The same principle applies to our [segmentation models](https://huggingface.co/m
### Semantic Segmentation
| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning |
| --------- | ---- | ---- | ---- | ---- |
| SegFormer | ✅ | | | |
| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning | IA3 |
| --------- | ---- | ---- | ---- | ---- | ---- |
| SegFormer | ✅ | | | | |
## Caveats:
@ -352,19 +363,80 @@ any GPU memory savings. Please refer issue [[FSDP] FSDP with CPU offload consume
accelerate launch --config_file fsdp_config.yaml examples/peft_lora_seq2seq_accelerate_fsdp.py
```
2. When using `P_TUNING` or `PROMPT_TUNING` with `SEQ_2_SEQ` task, remember to remove the `num_virtual_token` virtual prompt predictions from the left side of the model outputs during evaluations.
2. When using ZeRO3 with zero3_init_flag=True, if you find that GPU memory increases with training steps, you might need to update DeepSpeed after [deepspeed commit 42858a9891422abc](https://github.com/microsoft/DeepSpeed/commit/42858a9891422abcecaa12c1bd432d28d33eb0d4). The related issue is [[BUG] Peft Training with Zero.Init() and Zero3 will increase GPU memory every forward step](https://github.com/microsoft/DeepSpeed/issues/3002)
3. For encoder-decoder models, `P_TUNING` or `PROMPT_TUNING` doesn't support `generate` functionality of transformers because `generate` strictly requires `decoder_input_ids` but
`P_TUNING`/`PROMPT_TUNING` appends soft prompt embeddings to `input_embeds` to create
new `input_embeds` to be given to the model. Therefore, `generate` doesn't support this yet.
## 🤗 PEFT as a utility library
4. When using ZeRO3 with zero3_init_flag=True, if you find that GPU memory increases with training steps, you might need to set zero3_init_flag=false in the accelerate config.yaml. The related issue is [[BUG] memory leak under zero.Init](https://github.com/microsoft/DeepSpeed/issues/2637)
### Injecting adapters directly into the model
## Backlog:
- [x] Add tests
- [x] Multi Adapter training and inference support
- [x] Add more use cases and examples
- [ ] Explore and possibly integrate `Bottleneck Adapters`, `(IA)^3`, `AdaptionPrompt` ...
Inject trainable adapters into any `torch` model using the `inject_adapter_in_model` method. Note that the method makes no further changes to the model.
```python
import torch
from peft import inject_adapter_in_model, LoraConfig
class DummyModel(torch.nn.Module):
def __init__(self):
super().__init__()
self.embedding = torch.nn.Embedding(10, 10)
self.linear = torch.nn.Linear(10, 10)
self.lm_head = torch.nn.Linear(10, 10)
def forward(self, input_ids):
x = self.embedding(input_ids)
x = self.linear(x)
x = self.lm_head(x)
return x
lora_config = LoraConfig(
lora_alpha=16,
lora_dropout=0.1,
r=64,
bias="none",
target_modules=["linear"],
)
model = DummyModel()
model = inject_adapter_in_model(lora_config, model)
dummy_inputs = torch.LongTensor([[0, 1, 2, 3, 4, 5, 6, 7]])
dummy_outputs = model(dummy_inputs)
```
Learn more about the [low level API in the docs](https://huggingface.co/docs/peft/developer_guides/low_level_api).
### Mixing different adapter types
Usually, it is not possible to combine different adapter types in the same model, e.g. combining LoRA with AdaLoRA, LoHa, or LoKr. Using a mixed model, this can, however, be achieved:
```python
from peft import PeftMixedModel
model = AutoModelForCausalLM.from_pretrained("hf-internal-testing/tiny-random-OPTForCausalLM").eval()
peft_model = PeftMixedModel.from_pretrained(model, <path-to-adapter-0>, "adapter0")
peft_model.load_adapter(<path-to-adapter-1>, "adapter1")
peft_model.set_adapter(["adapter0", "adapter1"])
result = peft_model(**inputs)
```
The main intent is to load already trained adapters and use this only for inference. However, it is also possible to create a PEFT model for training by passing `mixed=True` to `get_peft_model`:
```python
from peft import get_peft_model, LoraConfig, LoKrConfig
base_model = ...
config0 = LoraConfig(...)
config1 = LoKrConfig(...)
peft_model = get_peft_model(base_model, config0, "adapter0", mixed=True)
peft_model.add_adapter(config1, "adapter1")
peft_model.set_adapter(["adapter0", "adapter1"])
for batch in dataloader:
...
```
## Contributing
If you would like to contribute to PEFT, please check out our [contributing guide](https://huggingface.co/docs/peft/developer_guides/contributing).
## Citing 🤗 PEFT
@ -373,7 +445,7 @@ If you use 🤗 PEFT in your publication, please cite it by using the following
```bibtex
@Misc{peft,
title = {PEFT: State-of-the-art Parameter-Efficient Fine-Tuning methods},
author = {Sourab Mangrulkar, Sylvain Gugger, Lysandre Debut, Younes Belkada, Sayak Paul},
author = {Sourab Mangrulkar and Sylvain Gugger and Lysandre Debut and Younes Belkada and Sayak Paul and Benjamin Bossan},
howpublished = {\url{https://github.com/huggingface/peft}},
year = {2022}
}


@ -15,6 +15,7 @@ RUN apt-get update && \
RUN apt-get update && \
apt install -y ffmpeg
RUN apt install -y libsndfile1-dev
RUN git lfs install
# Create our conda env - copied from https://github.com/huggingface/accelerate/blob/main/docker/accelerate-gpu/Dockerfile
@ -31,9 +32,12 @@ SHELL ["/bin/bash", "-c"]
# Activate the conda env and install transformers + accelerate from source
RUN source activate peft && \
python3 -m pip install --no-cache-dir \
librosa \
"soundfile>=0.12.1" \
scipy \
git+https://github.com/huggingface/transformers \
git+https://github.com/huggingface/accelerate \
git+https://github.com/huggingface/peft#egg=peft[test]
peft[test]@git+https://github.com/huggingface/peft
# Install apt libs
RUN apt-get update && \


@ -0,0 +1,67 @@
# Builds GPU docker image of PyTorch
# Uses multi-staged approach to reduce size
# Stage 1
# Use base conda image to reduce time
FROM continuumio/miniconda3:latest AS compile-image
# Specify py version
ENV PYTHON_VERSION=3.8
# Install apt libs - copied from https://github.com/huggingface/accelerate/blob/main/docker/accelerate-gpu/Dockerfile
RUN apt-get update && \
apt-get install -y curl git wget software-properties-common git-lfs && \
apt-get clean && \
rm -rf /var/lib/apt/lists*
# Install audio-related libraries
RUN apt-get update && \
apt install -y ffmpeg
RUN apt install -y libsndfile1-dev
RUN git lfs install
# Create our conda env - copied from https://github.com/huggingface/accelerate/blob/main/docker/accelerate-gpu/Dockerfile
RUN conda create --name peft python=${PYTHON_VERSION} ipython jupyter pip
RUN python3 -m pip install --no-cache-dir --upgrade pip
# Below is copied from https://github.com/huggingface/accelerate/blob/main/docker/accelerate-gpu/Dockerfile
# We don't install pytorch here yet since CUDA isn't available
# instead we use the direct torch wheel
ENV PATH /opt/conda/envs/peft/bin:$PATH
# Activate our bash shell
RUN chsh -s /bin/bash
SHELL ["/bin/bash", "-c"]
# Stage 2
FROM nvidia/cuda:12.2.2-devel-ubuntu22.04 AS build-image
COPY --from=compile-image /opt/conda /opt/conda
ENV PATH /opt/conda/bin:$PATH
RUN chsh -s /bin/bash
SHELL ["/bin/bash", "-c"]
# Install apt libs
RUN apt-get update && \
apt-get install -y curl git wget && \
apt-get clean && \
rm -rf /var/lib/apt/lists*
# Activate the conda env and install transformers + accelerate from latest pypi
# Also clone BNB and build it from source.
RUN source activate peft && \
python3 -m pip install -U --no-cache-dir \
librosa \
"soundfile>=0.12.1" \
scipy \
transformers \
accelerate \
peft \
optimum \
auto-gptq && \
git clone https://github.com/TimDettmers/bitsandbytes && cd bitsandbytes && \
CUDA_VERSION=121 make cuda12x && \
python setup.py develop && \
pip freeze | grep bitsandbytes
RUN echo "source activate peft" >> ~/.profile
# Activate the virtualenv
CMD ["/bin/bash"]


@ -0,0 +1,67 @@
# Builds GPU docker image of PyTorch
# Uses multi-staged approach to reduce size
# Stage 1
# Use base conda image to reduce time
FROM continuumio/miniconda3:latest AS compile-image
# Specify py version
ENV PYTHON_VERSION=3.8
# Install apt libs - copied from https://github.com/huggingface/accelerate/blob/main/docker/accelerate-gpu/Dockerfile
RUN apt-get update && \
apt-get install -y curl git wget software-properties-common git-lfs && \
apt-get clean && \
rm -rf /var/lib/apt/lists*
# Install audio-related libraries
RUN apt-get update && \
apt install -y ffmpeg
RUN apt install -y libsndfile1-dev
RUN git lfs install
# Create our conda env - copied from https://github.com/huggingface/accelerate/blob/main/docker/accelerate-gpu/Dockerfile
RUN conda create --name peft python=${PYTHON_VERSION} ipython jupyter pip
RUN python3 -m pip install --no-cache-dir --upgrade pip
# Below is copied from https://github.com/huggingface/accelerate/blob/main/docker/accelerate-gpu/Dockerfile
# We don't install pytorch here yet since CUDA isn't available
# instead we use the direct torch wheel
ENV PATH /opt/conda/envs/peft/bin:$PATH
# Activate our bash shell
RUN chsh -s /bin/bash
SHELL ["/bin/bash", "-c"]
# Stage 2
FROM nvidia/cuda:12.2.2-devel-ubuntu22.04 AS build-image
COPY --from=compile-image /opt/conda /opt/conda
ENV PATH /opt/conda/bin:$PATH
RUN chsh -s /bin/bash
SHELL ["/bin/bash", "-c"]
# Install apt libs
RUN apt-get update && \
apt-get install -y curl git wget && \
apt-get clean && \
rm -rf /var/lib/apt/lists*
# Activate the conda env and install transformers + accelerate from source
# Also clone BNB and build it from source.
RUN source activate peft && \
python3 -m pip install -U --no-cache-dir \
librosa \
"soundfile>=0.12.1" \
scipy \
git+https://github.com/huggingface/transformers \
git+https://github.com/huggingface/accelerate \
peft[test]@git+https://github.com/huggingface/peft \
optimum \
auto-gptq && \
git clone https://github.com/TimDettmers/bitsandbytes && cd bitsandbytes && \
CUDA_VERSION=121 make cuda12x && \
python setup.py develop && \
pip freeze | grep bitsandbytes
RUN echo "source activate peft" >> ~/.profile
# Activate the virtualenv
CMD ["/bin/bash"]


@ -15,6 +15,7 @@ RUN apt-get update && \
RUN apt-get update && \
apt install -y ffmpeg
RUN apt install -y libsndfile1-dev
RUN git lfs install
# Create our conda env - copied from https://github.com/huggingface/accelerate/blob/main/docker/accelerate-gpu/Dockerfile
@ -28,27 +29,37 @@ ENV PATH /opt/conda/envs/peft/bin:$PATH
# Activate our bash shell
RUN chsh -s /bin/bash
SHELL ["/bin/bash", "-c"]
# Activate the conda env and install transformers + accelerate from source
RUN source activate peft && \
python3 -m pip install --no-cache-dir \
git+https://github.com/huggingface/transformers \
git+https://github.com/huggingface/accelerate \
git+https://github.com/huggingface/peft#egg=peft[test]
RUN python3 -m pip install --no-cache-dir bitsandbytes
# Stage 2
FROM nvidia/cuda:11.2.2-cudnn8-devel-ubuntu20.04 AS build-image
FROM nvidia/cuda:12.2.2-devel-ubuntu22.04 AS build-image
COPY --from=compile-image /opt/conda /opt/conda
ENV PATH /opt/conda/bin:$PATH
RUN chsh -s /bin/bash
SHELL ["/bin/bash", "-c"]
RUN source activate peft && \
python3 -m pip install --no-cache-dir bitsandbytes optimum auto-gptq
# Install apt libs
RUN apt-get update && \
apt-get install -y curl git wget && \
apt-get clean && \
rm -rf /var/lib/apt/lists*
# Activate the conda env and install transformers + accelerate from source
RUN source activate peft && \
python3 -m pip install -U --no-cache-dir \
librosa \
"soundfile>=0.12.1" \
scipy \
git+https://github.com/huggingface/transformers \
git+https://github.com/huggingface/accelerate \
peft[test]@git+https://github.com/huggingface/peft
RUN source activate peft && \
pip freeze | grep transformers
RUN echo "source activate peft" >> ~/.profile
# Activate the virtualenv
CMD ["/bin/bash"]
CMD ["/bin/bash"]


@ -33,7 +33,7 @@ pip install git+https://github.com/huggingface/doc-builder
**NOTE**
You only need to generate the documentation to inspect it locally (if you're planning changes and want to
check how they look before committing for instance). You don't have to commit the built documentation.
check how they look before committing for instance). You don't have to commit the built documentation.
---
@ -46,7 +46,7 @@ typing the following command:
doc-builder build peft docs/source/ --build_dir ~/tmp/test-build
```
You can adapt the `--build_dir` to set any temporary folder that you prefer. This command will create it and generate
You can adapt the `--build_dir` to set any temporary folder you prefer. This command will create it and generate
the MDX files that will be rendered as the documentation on the main website. You can inspect them in your favorite
Markdown editor.
@ -124,7 +124,7 @@ Adding a new tutorial or section is done in two steps:
- Link that file in `./source/_toctree.yml` on the correct toc-tree.
Make sure to put your new file under the proper section. It's unlikely to go in the first section (*Get Started*), so
depending on the intended targets (beginners, more advanced users, or researchers) it should go in sections two, three, or
depending on the intended targets (beginners, more advanced users, or researchers) it should go into sections two, three, or
four.
### Writing source documentation
@ -188,7 +188,7 @@ then its documentation should look like this:
```
Note that we always omit the "defaults to \`None\`" when None is the default for any argument. Also note that even
if the first line describing your argument type and its default gets long, you can't break it on several lines. You can
if the first line describing your argument type and its default gets long, you can't break it into several lines. You can
however write as many lines as you want in the indented description (see the example above with `input_ids`).
#### Writing a multi-line code block
@ -234,13 +234,13 @@ We have an automatic script running with the `make style` comment that will make
- the docstrings fully take advantage of the line width
- all code examples are formatted using black, like the code of the Transformers library
This script may have some weird failures if you made a syntax mistake or if you uncover a bug. Therefore, it's
This script may have some weird failures if you make a syntax mistake or if you uncover a bug. Therefore, it's
recommended to commit your changes before running `make style`, so you can revert the changes done by that script
easily.
## Writing documentation examples
The syntax for Example docstrings can look as follows:
The syntax for example docstrings can look as follows:
```
Example:
@ -264,4 +264,4 @@ is to be used in inference and also include the expected (ideally sensible)
output.
Often, readers will try out the example before even going through the function
or class definitions. Therefore, it is of utmost importance that the example
works as expected.
works as expected.


@ -7,24 +7,48 @@
- local: install
title: Installation
- title: Task guides
- title: Tutorial
sections:
- local: task_guides/image_classification_lora
title: Image classification using LoRA
- local: task_guides/seq2seq-prefix-tuning
title: Prefix tuning for conditional generation
- local: task_guides/clm-prompt-tuning
title: Prompt tuning for causal language modeling
- local: task_guides/semantic_segmentation_lora
title: Semantic segmentation using LoRA
- local: task_guides/ptuning-seq-classification
title: P-tuning for sequence classification
- local: task_guides/dreambooth_lora
title: Dreambooth fine-tuning with LoRA
- local: task_guides/token-classification-lora
title: LoRA for token classification
- local: task_guides/int8-asr
title: int8 training for automatic speech recognition
- local: tutorial/peft_model_config
title: Configurations and models
- local: tutorial/peft_integrations
title: Integrations
- title: PEFT method guides
sections:
- local: task_guides/prompt_based_methods
title: Prompt-based methods
- title: LoRA
sections:
- local: task_guides/image_classification_lora
title: Image classification
- local: task_guides/semantic_segmentation_lora
title: Semantic segmentation
- local: task_guides/token-classification-lora
title: Token classification
- local: task_guides/semantic-similarity-lora
title: Semantic similarity
- local: task_guides/int8-asr
title: int8 training for automatic speech recognition
- local: task_guides/dreambooth_lora
title: DreamBooth
- title: Developer guides
sections:
- local: developer_guides/quantization
title: Quantization
- local: developer_guides/lora
title: LoRA
- local: developer_guides/custom_models
title: Working with custom models
- local: developer_guides/low_level_api
title: PEFT low level API
- local: developer_guides/mixed_models
title: Mixing different adapter types
- local: developer_guides/contributing
title: Contributing to PEFT
- local: developer_guides/troubleshooting
title: Troubleshooting
- title: 🤗 Accelerate integrations
sections:
@ -35,16 +59,51 @@
- title: Conceptual guides
sections:
- local: conceptual_guides/lora
title: LoRA
- local: conceptual_guides/adapter
title: Adapters
- local: conceptual_guides/prompting
title: Prompting
title: Soft prompts
- local: conceptual_guides/ia3
title: IA3
- sections:
- sections:
- local: package_reference/auto_class
title: AutoPeftModel
- local: package_reference/peft_model
title: PEFT model
- local: package_reference/peft_types
title: PEFT types
- local: package_reference/config
title: Configuration
- local: package_reference/tuners
title: Tuner
title: Main classes
- sections:
- local: package_reference/adalora
title: AdaLoRA
- local: package_reference/ia3
title: IA3
- local: package_reference/llama_adapter
title: Llama-Adapter
- local: package_reference/loha
title: LoHa
- local: package_reference/lokr
title: LoKr
- local: package_reference/lora
title: LoRA
- local: package_reference/adapter_utils
title: LyCORIS
- local: package_reference/multitask_prompt_tuning
title: Multitask Prompt Tuning
- local: package_reference/oft
title: OFT
- local: package_reference/p_tuning
title: P-tuning
- local: package_reference/prefix_tuning
title: Prefix tuning
- local: package_reference/prompt_tuning
title: Prompt tuning
title: Adapters
title: API reference
- title: Reference
sections:
- local: package_reference/peft_model
title: PEFT model
- local: package_reference/config
title: Configuration
- local: package_reference/tuners
title: Tuners


@ -1,3 +1,7 @@
<!--⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# DeepSpeed
[DeepSpeed](https://www.deepspeed.ai/) is a library designed for speed and scale for distributed training of large models with billions of parameters. At its core is the Zero Redundancy Optimizer (ZeRO) that shards optimizer states (ZeRO-1), gradients (ZeRO-2), and parameters (ZeRO-3) across data parallel processes. This drastically reduces memory usage, allowing you to scale your training to billion parameter models. To unlock even more memory efficiency, ZeRO-Offload reduces GPU compute and memory by leveraging CPU resources during optimization.


@ -1,3 +1,7 @@
<!--⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Fully Sharded Data Parallel
[Fully sharded data parallel](https://pytorch.org/docs/stable/fsdp.html) (FSDP) is developed for distributed training of large pretrained models up to 1T parameters. FSDP achieves this by sharding the model parameters, gradients, and optimizer states across data parallel processes and it can also offload sharded model parameters to a CPU. The memory efficiency afforded by FSDP allows you to scale training to larger batch or model sizes.


@ -0,0 +1,89 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Adapters
Adapter-based methods add extra trainable parameters after the attention and fully-connected layers of a frozen pretrained model to reduce memory usage and speed up training. The method varies depending on the adapter: it could simply be an extra added layer, or it could express the weight updates ∆W as a low-rank decomposition of the weight matrix. Either way, the adapters are typically small but demonstrate comparable performance to a fully finetuned model and enable training larger models with fewer resources.
This guide will give you a brief overview of the adapter methods supported by PEFT (if you're interested in learning more details about a specific method, take a look at the linked paper).
## Low-Rank Adaptation (LoRA)
<Tip>
LoRA is one of the most popular PEFT methods and a good starting point if you're just getting started with PEFT. It was originally developed for large language models but it is a tremendously popular training method for diffusion models because of its efficiency and effectiveness.
</Tip>
As mentioned briefly earlier, [LoRA](https://hf.co/papers/2106.09685) is a technique that accelerates finetuning large models while consuming less memory.
LoRA represents the weight updates ∆W with two smaller matrices (called *update matrices*) through low-rank decomposition. These new matrices can be trained to adapt to the new data while keeping the overall number of parameters low. The original weight matrix remains frozen and doesn't receive any further updates. To produce the final results, the original and extra adapted weights are combined. You could also merge the adapter weights with the base model to eliminate inference latency.
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/lora_animated.gif"/>
</div>
This approach has a number of advantages:
* LoRA makes finetuning more efficient by drastically reducing the number of trainable parameters.
* The original pretrained weights are kept frozen, which means you can have multiple lightweight and portable LoRA models for various downstream tasks built on top of them.
* LoRA is orthogonal to other parameter-efficient methods and can be combined with many of them.
* Performance of models finetuned using LoRA is comparable to the performance of fully finetuned models.
In principle, LoRA can be applied to any subset of weight matrices in a neural network to reduce the number of trainable parameters. However, for simplicity and further parameter efficiency, LoRA is typically only applied to the attention blocks in Transformer models. The resulting number of trainable parameters in a LoRA model depends on the size of the update matrices, which is determined mainly by the rank `r` and the shape of the original weight matrix.
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/lora.png"/>
</div>
<small><a href="https://hf.co/papers/2103.10385">Navigating Text-To-Image Customization: From LyCORIS Fine-Tuning to Model Evaluation</a></small>
## Low-Rank Hadamard Product (LoHa)
Low-rank decomposition can impact performance because the weight updates are limited to the low-rank space, which can constrain a model's expressiveness. However, you don't necessarily want to use a larger rank because it increases the number of trainable parameters. To address this, [LoHa](https://huggingface.co/papers/2108.06098) (a method originally developed for computer vision) was applied to diffusion models where the ability to generate diverse images is an important consideration. LoHa should also work with general model types, but the embedding layers aren't currently implemented in PEFT.
LoHa uses the [Hadamard product](https://en.wikipedia.org/wiki/Hadamard_product_(matrices)) (element-wise product) instead of the matrix product. ∆W is represented by four smaller matrices instead of two - like in LoRA - and each pair of these low-rank matrices is combined with the Hadamard product. As a result, ∆W can have the same number of trainable parameters but a higher rank and expressivity.
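As a minimal sketch in plain PyTorch (not the PEFT implementation; shapes and rank are arbitrary), combining two rank-`r` products element-wise yields an update whose rank can be as high as `r**2`:
```py
import torch

d_out, d_in, r = 1024, 1024, 4
B1, A1 = torch.randn(d_out, r), torch.randn(r, d_in)  # first low-rank pair
B2, A2 = torch.randn(d_out, r), torch.randn(r, d_in)  # second low-rank pair

delta_W = (B1 @ A1) * (B2 @ A2)  # Hadamard (element-wise) product of two rank-r factors
print(delta_W.shape, torch.linalg.matrix_rank(delta_W))  # same shape as W, rank up to r**2 (here 16)
```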
## Low-Rank Kronecker Product (LoKr)
[LoKr](https://hf.co/papers/2309.14859) is very similar to LoRA and LoHa, and it is also mainly applied to diffusion models, though you could also use it with other model types. LoKr replaces the matrix product with the [Kronecker product](https://en.wikipedia.org/wiki/Kronecker_product). The Kronecker product decomposition creates a block matrix which preserves the rank of the original weight matrix. Another benefit of the Kronecker product is that it can be vectorized by stacking the matrix columns. This can speed up the process because you avoid fully reconstructing ∆W.
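A minimal sketch in plain PyTorch (not the PEFT implementation; the factor shapes are arbitrary) shows how the Kronecker product builds a large block matrix out of two much smaller factors:
```py
import torch

# two small factors whose Kronecker product has the shape of the full weight update
C = torch.randn(64, 64)
D = torch.randn(16, 16)

delta_W = torch.kron(C, D)  # block matrix of shape (64 * 16, 64 * 16) = (1024, 1024)
print(delta_W.numel(), C.numel() + D.numel())  # 1048576 entries represented by only 4352 parameters
```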
## Orthogonal Finetuning (OFT)
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/oft.png"/>
</div>
<small><a href="https://hf.co/papers/2306.07280">Controlling Text-to-Image Diffusion by Orthogonal Finetuning</a></small>
[OFT](https://hf.co/papers/2306.07280) is a method that primarily focuses on preserving a pretrained model's generative performance in the finetuned model. It tries to maintain the same cosine similarity (hyperspherical energy) between all pairwise neurons in a layer because this better captures the semantic information among neurons. This means OFT is more capable of preserving the subject, and it is better suited for controllable generation (similar to [ControlNet](https://huggingface.co/docs/diffusers/using-diffusers/controlnet)).
OFT preserves the hyperspherical energy by learning an orthogonal transformation for neurons to keep the cosine similarity between them unchanged. In practice, this means taking the matrix product of an orthogonal matrix with the pretrained weight matrix. However, to be parameter-efficient, the orthogonal matrix is represented as a block-diagonal matrix with rank `r` blocks. Whereas LoRA reduces the number of trainable parameters with low-rank structures, OFT reduces the number of trainable parameters with a sparse block-diagonal matrix structure.
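As a minimal sketch in plain PyTorch (not the PEFT implementation; dimensions and block count are arbitrary), the trainable matrix is block-diagonal, and each block is orthogonal, so the transformation preserves the pairwise angles between neurons while staying parameter-efficient:
```py
import torch

d, num_blocks = 1024, 4
block_size = d // num_blocks

# random orthogonal blocks for illustration; in OFT these blocks hold the trainable parameters
blocks = [torch.linalg.qr(torch.randn(block_size, block_size)).Q for _ in range(num_blocks)]
R = torch.block_diag(*blocks)  # block-diagonal orthogonal matrix with far fewer free parameters than d * d

W = torch.randn(d, d)          # pretrained weight
W_adapted = R @ W              # orthogonal transform: pairwise cosine similarities of W's columns are unchanged
```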
## Adaptive Low-Rank Adaptation (AdaLoRA)
[AdaLoRA](https://hf.co/papers/2303.10512) manages the parameter budget introduced from LoRA by allocating more parameters - in other words, a higher rank `r` - for important weight matrices that are better adapted for a task and pruning less important ones. The rank is controlled by a method similar to singular value decomposition (SVD). The ∆W is parameterized with two orthogonal matrices and a diagonal matrix which contains singular values. This parametrization method avoids iteratively applying SVD which is computationally expensive. Based on this method, the rank of ∆W is adjusted according to an importance score. ∆W is divided into triplets and each triplet is scored according to its contribution to model performance. Triplets with low importance scores are pruned and triplets with high importance scores are kept for finetuning.
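In PEFT, this behavior is configured through `AdaLoraConfig`. A minimal sketch is shown below; the values are purely illustrative, not recommendations:
```py
from peft import AdaLoraConfig

# illustrative values only: start from rank 12 and prune down to an average target rank of 4
config = AdaLoraConfig(init_r=12, target_r=4, tinit=200, tfinal=1000, deltaT=10)
```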
## Llama-Adapter
[Llama-Adapter](https://hf.co/papers/2303.16199) is a method for adapting Llama into an instruction-following model. To help adapt the model for instruction-following, the adapter is trained with a 52K instruction-output dataset.
A set of learnable adaption prompts is prefixed to the input instruction tokens. These are inserted into the upper layers of the model because it is better to learn with the higher-level semantics of the pretrained model. The instruction-output tokens prefixed to the input guide the adaption prompt to generate a contextual response.
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/llama-adapter.png"/>
</div>
<small><a href="https://hf.co/papers/2303.16199">LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention</a></small>
To avoid adding noise to the tokens, the adapter uses zero-initialized attention. On top of this, the adapter adds a learnable gating factor (initialized with zeros) to progressively add information to the model during training. This prevents overwhelming the model's pretrained knowledge with the newly learned instructions.
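In PEFT, Llama-Adapter is exposed through `AdaptionPromptConfig`. A minimal sketch (the values are illustrative, not recommendations) looks like this:
```py
from peft import AdaptionPromptConfig, TaskType

# illustrative values: 10 adaption prompt tokens inserted into the top 30 layers of the model
config = AdaptionPromptConfig(adapter_len=10, adapter_layers=30, task_type=TaskType.CAUSAL_LM)
```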

View File

@ -0,0 +1,68 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# IA3
This conceptual guide gives a brief overview of [IA3](https://arxiv.org/abs/2205.05638), a parameter-efficient fine-tuning technique that is
intended to improve over [LoRA](./lora).
To make fine-tuning more efficient, IA3 (Infused Adapter by Inhibiting and Amplifying Inner Activations)
rescales inner activations with learned vectors. These learned vectors are injected in the attention and feedforward modules
in a typical transformer-based architecture. These learned vectors are the only trainable parameters during fine-tuning, and thus the original
weights remain frozen. Dealing with learned vectors (as opposed to learned low-rank updates to a weight matrix like LoRA)
keeps the number of trainable parameters much smaller.
Being similar to LoRA, IA3 carries many of the same advantages:
* IA3 makes fine-tuning more efficient by drastically reducing the number of trainable parameters. (For T0, an IA3 model only has about 0.01% trainable parameters, while even LoRA has > 0.1%)
* The original pre-trained weights are kept frozen, which means you can have multiple lightweight and portable IA3 models for various downstream tasks built on top of them.
* Performance of models fine-tuned using IA3 is comparable to the performance of fully fine-tuned models.
* IA3 does not add any inference latency because adapter weights can be merged with the base model.
In principle, IA3 can be applied to any subset of weight matrices in a neural network to reduce the number of trainable
parameters. Following the authors' implementation, IA3 weights are added to the key, value and feedforward layers
of a Transformer model. To be specific, for transformer models, IA3 weights are added to the outputs of key and value layers, and to the input of the second feedforward layer
in each transformer block.
Given the target layers for injecting IA3 parameters, the number of trainable parameters
can be determined based on the size of the weight matrices.
## Common IA3 parameters in PEFT
As with other methods supported by PEFT, to fine-tune a model using IA3, you need to:
1. Instantiate a base model.
2. Create a configuration (`IA3Config`) where you define IA3-specific parameters.
3. Wrap the base model with `get_peft_model()` to get a trainable `PeftModel`.
4. Train the `PeftModel` as you normally would train the base model.
`IA3Config` allows you to control how IA3 is applied to the base model through the following parameters:
- `target_modules`: The modules (for example, attention blocks) to apply the IA3 vectors.
- `feedforward_modules`: The list of modules to be treated as feedforward layers in `target_modules`. While learned vectors are multiplied with
the output activation for attention blocks, the vectors are multiplied with the input for classic feedforward layers. Note that `feedforward_modules` must be a subset of `target_modules`.
- `modules_to_save`: List of modules apart from IA3 layers to be set as trainable and saved in the final checkpoint. These typically include the model's custom head that is randomly initialized for the fine-tuning task.
## Example Usage
For the task of sequence classification, one can initialize the IA3 config for a Llama model as follows:
```py
from peft import IA3Config, TaskType

peft_config = IA3Config(
    task_type=TaskType.SEQ_CLS, target_modules=["k_proj", "v_proj", "down_proj"], feedforward_modules=["down_proj"]
)
```
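The config can then be used to wrap a base model and train it, following the steps listed above. The checkpoint name below is only an illustrative placeholder; use any model whose layer names match the `target_modules` above:
```py
from transformers import AutoModelForSequenceClassification
from peft import get_peft_model

# placeholder checkpoint; any model with k_proj/v_proj/down_proj layers (e.g. a Llama variant) works
base_model = AutoModelForSequenceClassification.from_pretrained("meta-llama/Llama-2-7b-hf", num_labels=2)
peft_model = get_peft_model(base_model, peft_config)
peft_model.print_trainable_parameters()
```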

View File

@ -1,61 +0,0 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# LoRA
This conceptual guide gives a brief overview of [LoRA](https://arxiv.org/abs/2106.09685), a technique that accelerates
the fine-tuning of large models while consuming less memory.
To make fine-tuning more efficient, LoRA's approach is to represent the weight updates with two smaller
matrices (called **update matrices**) through low-rank decomposition. These new matrices can be trained to adapt to the
new data while keeping the overall number of changes low. The original weight matrix remains frozen and doesn't receive
any further adjustments. To produce the final results, both the original and the adapted weights are combined.
This approach has a number of advantages:
* LoRA makes fine-tuning more efficient by drastically reducing the number of trainable parameters.
* The original pre-trained weights are kept frozen, which means you can have multiple lightweight and portable LoRA models for various downstream tasks built on top of them.
* LoRA is orthogonal to many other parameter-efficient methods and can be combined with many of them.
* Performance of models fine-tuned using LoRA is comparable to the performance of fully fine-tuned models.
* LoRA does not add any inference latency because adapter weights can be merged with the base model.
In principle, LoRA can be applied to any subset of weight matrices in a neural network to reduce the number of trainable
parameters. However, for simplicity and further parameter efficiency, in Transformer models LoRA is typically applied to
attention blocks only. The resulting number of trainable parameters in a LoRA model depends on the size of the low-rank
update matrices, which is determined mainly by the rank `r` and the shape of the original weight matrix.
## Common LoRA parameters in PEFT
As with other methods supported by PEFT, to fine-tune a model using LoRA, you need to:
1. Instantiate a base model.
2. Create a configuration (`LoraConfig`) where you define LoRA-specific parameters.
3. Wrap the base model with `get_peft_model()` to get a trainable `PeftModel`.
4. Train the `PeftModel` as you normally would train the base model.
`LoraConfig` allows you to control how LoRA is applied to the base model through the following parameters:
- `r`: the rank of the update matrices, expressed in `int`. Lower rank results in smaller update matrices with fewer trainable parameters.
- `target_modules`: The modules (for example, attention blocks) to apply the LoRA update matrices.
- `alpha`: LoRA scaling factor.
- `bias`: Specifies if the `bias` parameters should be trained. Can be `'none'`, `'all'` or `'lora_only'`.
- `modules_to_save`: List of modules apart from LoRA layers to be set as trainable and saved in the final checkpoint. These typically include model's custom head that is randomly initialized for the fine-tuning task.
## LoRA examples
For an example of LoRA method application to various downstream tasks, please refer to the following guides:
* [Image classification using LoRA](../task_guides/image_classification_lora)
* [Semantic segmentation](../task_guides/semantic_segmentation_lora)
While the original paper focuses on language models, the technique can be applied to any dense layers in deep learning
models. As such, you can leverage this technique with diffusion models. See [Dreambooth fine-tuning with LoRA](../task_guides/task_guides/dreambooth_lora) task guide for an example.

View File

@ -1,4 +1,8 @@
# Prompting
<!--⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Soft prompts
Training large pretrained language models is very time-consuming and compute-intensive. As they continue to grow in size, there is increasing interest in more efficient training methods such as *prompting*. Prompting primes a frozen pretrained model for a specific downstream task by including a text prompt that describes the task or even demonstrates an example of the task. With prompting, you can avoid fully training a separate model for each downstream task, and use the same frozen pretrained model instead. This is a lot easier because you can use the same model for several different tasks, and it is significantly more efficient to train and store a smaller set of prompt parameters than to train all the model's parameters.
@ -7,16 +11,16 @@ There are two categories of prompting methods:
- hard prompts are manually handcrafted text prompts with discrete input tokens; the downside is that it requires a lot of effort to create a good prompt
- soft prompts are learnable tensors concatenated with the input embeddings that can be optimized to a dataset; the downside is that they aren't human readable because you aren't matching these "virtual tokens" to the embeddings of a real word
This conceptual guide provides a brief overview of the soft prompt methods included in 🤗 PEFT: prompt tuning, prefix tuning, and P-tuning.
This conceptual guide provides a brief overview of the soft prompt methods included in 🤗 PEFT: prompt tuning, prefix tuning, P-tuning, and multitask prompt tuning.
## Prompt tuning
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/prompt-tuning.png"/>
</div>
<small>Only train and store a significantly smaller set of task-specific prompt parameters <a href="https://arxiv.org/abs/2104.08691">(image source)</a>.</small>
<small>Only train and store a significantly smaller set of task-specific prompt parameters <a href="https://hf.co/papers/2104.08691">(image source)</a>.</small>
Prompt tuning was developed for text classification tasks on T5 models, and all downstream tasks are cast as a text generation task. For example, sequence classification usually assigns a single class label to a sequence of text. By casting it as a text generation task, the tokens that make up the class label are *generated*. Prompts are added to the input as a series of tokens. Typically, the model parameters are fixed which means the prompt tokens are also fixed by the model parameters.
[Prompt tuning](https://hf.co/papers/2104.08691) was developed for text classification tasks on T5 models, and all downstream tasks are cast as a text generation task. For example, sequence classification usually assigns a single class label to a sequence of text. By casting it as a text generation task, the tokens that make up the class label are *generated*. Prompts are added to the input as a series of tokens. Typically, the model parameters are fixed which means the prompt tokens are also fixed by the model parameters.
The key idea behind prompt tuning is that prompt tokens have their own parameters that are updated independently. This means you can keep the pretrained model's parameters frozen, and only update the gradients of the prompt token embeddings. The results are comparable to the traditional method of training the entire model, and prompt tuning performance scales as model size increases.
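In PEFT, prompt tuning is configured with `PromptTuningConfig`. Below is a minimal sketch; the initialization text, number of virtual tokens, and tokenizer are illustrative examples rather than recommendations:
```py
from peft import PromptTuningConfig, PromptTuningInit, TaskType

# illustrative values; the prompt can also be initialized randomly instead of from text
peft_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.TEXT,
    prompt_tuning_init_text="Classify if the tweet is a complaint or not:",
    num_virtual_tokens=8,
    tokenizer_name_or_path="bigscience/bloomz-560m",
)
```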
@ -27,9 +31,9 @@ Take a look at [Prompt tuning for causal language modeling](../task_guides/clm-p
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/prefix-tuning.png"/>
</div>
<small>Optimize the prefix parameters for each task <a href="https://arxiv.org/abs/2101.00190">(image source)</a>.</small>
<small>Optimize the prefix parameters for each task <a href="https://hf.co/papers/2101.00190">(image source)</a>.</small>
Prefix tuning was designed for natural language generation (NLG) tasks on GPT models. It is very similar to prompt tuning; prefix tuning also prepends a sequence of task-specific vectors to the input that can be trained and updated while keeping the rest of the pretrained model's parameters frozen.
[Prefix tuning](https://hf.co/papers/2101.00190) was designed for natural language generation (NLG) tasks on GPT models. It is very similar to prompt tuning; prefix tuning also prepends a sequence of task-specific vectors to the input that can be trained and updated while keeping the rest of the pretrained model's parameters frozen.
The main difference is that the prefix parameters are inserted in **all** of the model layers, whereas prompt tuning only adds the prompt parameters to the model input embeddings. The prefix parameters are also optimized by a separate feed-forward network (FFN) instead of training directly on the soft prompts because it causes instability and hurts performance. The FFN is discarded after updating the soft prompts.
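In PEFT, prefix tuning is configured with `PrefixTuningConfig`; a minimal sketch with illustrative values:
```py
from peft import PrefixTuningConfig, TaskType

# illustrative values; 20 trainable prefix tokens are prepended at every layer
peft_config = PrefixTuningConfig(task_type=TaskType.SEQ_2_SEQ_LM, num_virtual_tokens=20)
```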
@ -42,9 +46,9 @@ Take a look at [Prefix tuning for conditional generation](../task_guides/seq2seq
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/p-tuning.png"/>
</div>
<small>Prompt tokens can be inserted anywhere in the input sequence, and they are optimized by a prompt encoder <a href="https://arxiv.org/abs/2103.10385">(image source)</a>.</small>
<small>Prompt tokens can be inserted anywhere in the input sequence, and they are optimized by a prompt encoder <a href="https://hf.co/papers/2103.10385">(image source)</a>.</small>
P-tuning is designed for natural language understanding (NLU) tasks and all language models.
[P-tuning](https://hf.co/papers/2103.10385) is designed for natural language understanding (NLU) tasks and all language models.
It is another variation of a soft prompt method; P-tuning also adds a trainable embedding tensor that can be optimized to find better prompts, and it uses a prompt encoder (a bidirectional long-short term memory network or LSTM) to optimize the prompt parameters. Unlike prefix tuning though:
- the prompt tokens can be inserted anywhere in the input sequence, and it isn't restricted to only the beginning
@ -53,4 +57,21 @@ It is another variation of a soft prompt method; P-tuning also adds a trainable
The results suggest that P-tuning is more efficient than manually crafting prompts, and it enables GPT-like models to compete with BERT-like models on NLU tasks.
Take a look at [P-tuning for sequence classification](../task_guides/ptuning-seq-classification) for a step-by-step guide on how to train a model with P-tuning.
Take a look at [P-tuning for sequence classification](../task_guides/ptuning-seq-classification) for a step-by-step guide on how to train a model with P-tuning.
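In PEFT, P-tuning is configured with `PromptEncoderConfig`, which also controls the prompt encoder that reparameterizes the virtual tokens; the values below are illustrative:
```py
from peft import PromptEncoderConfig, TaskType

# illustrative values; encoder_hidden_size controls the size of the prompt encoder
peft_config = PromptEncoderConfig(task_type=TaskType.SEQ_CLS, num_virtual_tokens=20, encoder_hidden_size=128)
```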
## Multitask prompt tuning
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/mpt.png"/>
</div>
<small><a href="https://hf.co/papers/2103.10385">Multitask prompt tuning enables parameter-efficient transfer learning</a>.</small>
[Multitask prompt tuning (MPT)](https://hf.co/papers/2103.10385) learns a single prompt from data for multiple task types that can be shared for different target tasks. Other existing approaches learn a separate soft prompt for each task that need to be retrieved or aggregated for adaptation to target tasks. MPT consists of two stages:
1. source training - for each task, its soft prompt is decomposed into task-specific vectors. The task-specific vectors are multiplied together to form another matrix W, and the Hadamard product is used between W and a shared prompt matrix P to generate a task-specific prompt matrix. The task-specific prompts are distilled into a single prompt matrix that is shared across all tasks. This prompt is trained with multitask training.
2. target adaptation - to adapt the single prompt for a target task, a target prompt is initialized and expressed as the Hadamard product of the shared prompt matrix and the task-specific low-rank prompt matrix.
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/mpt-decomposition.png"/>
</div>
<small><a href="https://hf.co/papers/2103.10385">Prompt decomposition</a>.</small>

View File

@ -0,0 +1,93 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Contributing to PEFT
We are happy to accept contributions to PEFT. If you plan to contribute, please read this document to make the process as smooth as possible.
## Installation
The installation instructions can be found [here](https://huggingface.co/docs/peft/install). If you want to provide code contributions to PEFT, you should choose the "source" installation method.
If you are new to creating a pull request, follow [these instructions from GitHub](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request).
## Running tests and code quality checks
Regardless of the type of contribution (unless it's only about the docs), you should run tests and code quality checks before creating a PR to ensure that your contribution doesn't break anything and follows the standards of the project.
We provide a Makefile to facilitate those steps. Run the code below for the unit test:
```sh
make test
```
Run one of the following to either check or check and fix code quality and style:
```sh
make quality # just check
make style # check and fix
```
Running all the tests can take a couple of minutes. Therefore, during development, it can be useful to run only those tests specific to your change:
```sh
pytest tests/ -k <name-of-test>
```
This should finish much quicker and allow faster iteration. Before creating the PR, however, please still run the whole test suite, as some changes can inadvertently break tests that at first glance are unrelated.
If your change is specific to a hardware setting (e.g. it requires CUDA), take a look at `tests/test_gpu_examples.py` and `tests/test_common_gpu.py`; maybe it makes sense to add a test there. If your change could have an effect on saving and loading models, please run the tests with the `--regression` flag to trigger regression tests.
It can happen that while you're working on your PR, the underlying code base changes due to other changes being merged. If that happens, especially when there is a merge conflict, please update your branch to be on the latest changes. This can be a merge or a rebase, whatever you prefer. We will squash and merge the PR once it's ready.
## PR description
When opening the PR, please provide a good description of the change you are making. If it relates to other issues or PRs, please reference them. Providing a good description not only helps the reviewers review your code better and faster, it can also later be used (as a basis) for the commit message, which helps with long-term maintenance of the project.
If your code makes some non-trivial changes, it can also be a good idea to add comments to the code to explain those changes. For example, if you had to iterate on your implementation multiple times because the most obvious way didn't work, it's a good indication that a code comment is needed.
## Providing a bugfix
Please give a description of the circumstances that led to the bug. If there is an existing issue, please link to it (e.g. “Resolves #12345”).
Ideally, when a bugfix is provided, it should be accompanied by a test for this bug. The test should fail with the current code and pass with the bugfix. Add a comment to the test that references the issue or PR. Without such a test, it is difficult to prevent regressions in the future.
## Adding a new fine-tuning method
New parameter-efficient fine-tuning methods are developed all the time. If you would like to add a new, promising method to PEFT, please follow these steps.
**Requirements**
1. Please add a link to the source (usually a paper) of the method.
2. Some evidence should be provided that there is general interest in using the method. We will not add freshly published methods without evidence that there is demand for them.
3. Ideally, we want to not only add the implementation of the new method, but also examples (notebooks, scripts), documentation, and an extensive test suite that proves that the method works with a variety of tasks. However, this can be very daunting. Therefore, it is also acceptable to only provide the implementation and at least one working example. Documentation and tests can be added in follow up PRs.
**Steps**
Before you start to implement the new method, please open an issue on GitHub with your proposal. That way, the maintainers can give you some early feedback.
When implementing the method, it makes sense to look at existing implementations as a guide. Moreover, when you structure your code, please take inspiration from the other PEFT methods. For example, if your method is similar to LoRA, it makes sense to structure your code similarly or even re-use some functions or classes where it makes sense (but don't overdo it, some code duplication is okay).
Once you have something that seems to be working, don't hesitate to create a draft PR, even if it's not in a mergeable state yet. The maintainers will be happy to give you feedback and guidance along the way.
## Adding other features
It is best if you first open an issue on GitHub with a proposal to add the new feature. That way, you can discuss with the maintainers if it makes sense to add the feature before spending too much time on implementing it.
New features should generally be accompanied by tests and documentation or examples. Without the latter, users will have a hard time discovering your cool new feature.
Changes to the code should be implemented in a backward-compatible way. For example, existing code should continue to work the same way after the feature is merged.

View File

@ -0,0 +1,242 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Working with custom models
Some fine-tuning techniques, such as prompt tuning, are specific to language models. That means in 🤗 PEFT, it is
assumed a 🤗 Transformers model is being used. However, other fine-tuning techniques - like
[LoRA](../conceptual_guides/lora) - are not restricted to specific model types.
In this guide, we will see how LoRA can be applied to a multilayer perceptron, a computer vision model from the [timm](https://huggingface.co/docs/timm/index) library, or a new 🤗 Transformers architecture.
## Multilayer perceptron
Let's assume that we want to fine-tune a multilayer perceptron with LoRA. Here is the definition:
```python
from torch import nn
class MLP(nn.Module):
def __init__(self, num_units_hidden=2000):
super().__init__()
self.seq = nn.Sequential(
nn.Linear(20, num_units_hidden),
nn.ReLU(),
nn.Linear(num_units_hidden, num_units_hidden),
nn.ReLU(),
nn.Linear(num_units_hidden, 2),
nn.LogSoftmax(dim=-1),
)
def forward(self, X):
return self.seq(X)
```
This is a straightforward multilayer perceptron with an input layer, a hidden layer, and an output layer.
<Tip>
For this toy example, we choose an exceedingly large number of hidden units to highlight the efficiency gains
from PEFT, but those gains are in line with more realistic examples.
</Tip>
There are a few linear layers in this model that could be tuned with LoRA. When working with common 🤗 Transformers
models, PEFT will know which layers to apply LoRA to, but in this case, it is up to us as a user to choose the layers.
To determine the names of the layers to tune:
```python
print([(n, type(m)) for n, m in MLP().named_modules()])
```
This should print:
```
[('', __main__.MLP),
('seq', torch.nn.modules.container.Sequential),
('seq.0', torch.nn.modules.linear.Linear),
('seq.1', torch.nn.modules.activation.ReLU),
('seq.2', torch.nn.modules.linear.Linear),
('seq.3', torch.nn.modules.activation.ReLU),
('seq.4', torch.nn.modules.linear.Linear),
('seq.5', torch.nn.modules.activation.LogSoftmax)]
```
Let's say we want to apply LoRA to the input layer and to the hidden layer, those are `'seq.0'` and `'seq.2'`. Moreover,
let's assume we want to update the output layer without LoRA, that would be `'seq.4'`. The corresponding config would
be:
```python
from peft import LoraConfig
config = LoraConfig(
target_modules=["seq.0", "seq.2"],
modules_to_save=["seq.4"],
)
```
With that, we can create our PEFT model and check the fraction of parameters trained:
```python
from peft import get_peft_model
model = MLP()
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()
# prints trainable params: 56,164 || all params: 4,100,164 || trainable%: 1.369798866581922
```
Finally, we can use any training framework we like, or write our own fit loop, to train the `peft_model`.
For a complete example, check out [this notebook](https://github.com/huggingface/peft/blob/main/examples/multilayer_perceptron/multilayer_perceptron_lora.ipynb).
## timm models
The [timm](https://huggingface.co/docs/timm/index) library contains a large number of pretrained computer vision models.
Those can also be fine-tuned with PEFT. Let's check out how this works in practice.
To start, ensure that timm is installed in the Python environment:
```bash
python -m pip install -U timm
```
Next we load a timm model for an image classification task:
```python
import timm
num_classes = ...
model_id = "timm/poolformer_m36.sail_in1k"
model = timm.create_model(model_id, pretrained=True, num_classes=num_classes)
```
Again, we need to make a decision about what layers to apply LoRA to. Since LoRA supports 2D conv layers, and since
those are a major building block of this model, we should apply LoRA to the 2D conv layers. To identify the names of
those layers, let's look at all the layer names:
```python
print([(n, type(m)) for n, m in model.named_modules()])
```
This will print a very long list; we'll only show the first few entries:
```
[('', timm.models.metaformer.MetaFormer),
('stem', timm.models.metaformer.Stem),
('stem.conv', torch.nn.modules.conv.Conv2d),
('stem.norm', torch.nn.modules.linear.Identity),
('stages', torch.nn.modules.container.Sequential),
('stages.0', timm.models.metaformer.MetaFormerStage),
('stages.0.downsample', torch.nn.modules.linear.Identity),
('stages.0.blocks', torch.nn.modules.container.Sequential),
('stages.0.blocks.0', timm.models.metaformer.MetaFormerBlock),
('stages.0.blocks.0.norm1', timm.layers.norm.GroupNorm1),
('stages.0.blocks.0.token_mixer', timm.models.metaformer.Pooling),
('stages.0.blocks.0.token_mixer.pool', torch.nn.modules.pooling.AvgPool2d),
('stages.0.blocks.0.drop_path1', torch.nn.modules.linear.Identity),
('stages.0.blocks.0.layer_scale1', timm.models.metaformer.Scale),
('stages.0.blocks.0.res_scale1', torch.nn.modules.linear.Identity),
('stages.0.blocks.0.norm2', timm.layers.norm.GroupNorm1),
('stages.0.blocks.0.mlp', timm.layers.mlp.Mlp),
('stages.0.blocks.0.mlp.fc1', torch.nn.modules.conv.Conv2d),
('stages.0.blocks.0.mlp.act', torch.nn.modules.activation.GELU),
('stages.0.blocks.0.mlp.drop1', torch.nn.modules.dropout.Dropout),
('stages.0.blocks.0.mlp.norm', torch.nn.modules.linear.Identity),
('stages.0.blocks.0.mlp.fc2', torch.nn.modules.conv.Conv2d),
('stages.0.blocks.0.mlp.drop2', torch.nn.modules.dropout.Dropout),
('stages.0.blocks.0.drop_path2', torch.nn.modules.linear.Identity),
('stages.0.blocks.0.layer_scale2', timm.models.metaformer.Scale),
('stages.0.blocks.0.res_scale2', torch.nn.modules.linear.Identity),
('stages.0.blocks.1', timm.models.metaformer.MetaFormerBlock),
('stages.0.blocks.1.norm1', timm.layers.norm.GroupNorm1),
('stages.0.blocks.1.token_mixer', timm.models.metaformer.Pooling),
('stages.0.blocks.1.token_mixer.pool', torch.nn.modules.pooling.AvgPool2d),
...
('head.global_pool.flatten', torch.nn.modules.linear.Identity),
('head.norm', timm.layers.norm.LayerNorm2d),
('head.flatten', torch.nn.modules.flatten.Flatten),
('head.drop', torch.nn.modules.linear.Identity),
('head.fc', torch.nn.modules.linear.Linear)]
```
Upon closer inspection, we see that the 2D conv layers have names such as `"stages.0.blocks.0.mlp.fc1"` and
`"stages.0.blocks.0.mlp.fc2"`. How can we match those layer names specifically? You can write a [regular
expression](https://docs.python.org/3/library/re.html) to match the layer names. For our case, the regex
`r".*\.mlp\.fc\d"` should do the job.
Furthermore, as in the first example, we should ensure that the output layer, in this case the classification head, is
also updated. Looking at the end of the list printed above, we can see that it's named `'head.fc'`. With that in mind,
here is our LoRA config:
```python
config = LoraConfig(target_modules=r".*\.mlp\.fc\d", modules_to_save=["head.fc"])
```
Then we only need to create the PEFT model by passing our base model and the config to `get_peft_model`:
```python
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()
# prints trainable params: 1,064,454 || all params: 56,467,974 || trainable%: 1.88505789139876
```
This shows us that we only need to train less than 2% of all parameters, which is a huge efficiency gain.
For a complete example, check out [this notebook](https://github.com/huggingface/peft/blob/main/examples/image_classification/image_classification_timm_peft_lora.ipynb).
## New transformers architectures
When new popular transformers architectures are released, we do our best to quickly add them to PEFT. If you come across a transformers model that is not supported out of the box, don't worry, it will most likely still work if the config is set correctly. Specifically, you have to identify the layers that should be adapted and set them correctly when initializing the corresponding config class, e.g. `LoraConfig`. Here are some tips to help with this.
As a first step, it is a good idea to check the existing models for inspiration. You can find them inside of [constants.py](https://github.com/huggingface/peft/blob/main/src/peft/utils/constants.py) in the PEFT repository. Often, you'll find a similar architecture that uses the same names. For example, if the new model architecture is a variation of the "mistral" model and you want to apply LoRA, you can see that the entry for "mistral" in `TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING` contains `["q_proj", "v_proj"]`. This tells you that for "mistral" models, the `target_modules` for LoRA should be `["q_proj", "v_proj"]`:
```python
from peft import LoraConfig, get_peft_model
my_mistral_model = ...
config = LoraConfig(
target_modules=["q_proj", "v_proj"],
..., # other LoRA arguments
)
peft_model = get_peft_model(my_mistral_model, config)
```
If that doesn't help, check the existing modules in your model architecture with the `named_modules` method and try to identify the attention layers, especially the key, query, and value layers. Those will often have names such as `c_attn`, `query`, `q_proj`, etc. The key layer is not always adapted, and ideally, you should check whether including it results in better performance.
Additionally, linear layers are common targets to be adapted (e.g. in the [QLoRA paper](https://arxiv.org/abs/2305.14314), the authors suggest adapting them as well). Their names will often contain the strings `fc` or `dense`.
If you want to add a new model to PEFT, please create an entry in [constants.py](https://github.com/huggingface/peft/blob/main/src/peft/utils/constants.py) and open a pull request on the [repository](https://github.com/huggingface/peft/pulls). Don't forget to update the [README](https://github.com/huggingface/peft#models-support-matrix) as well.
## Checking the result
When you think that you have correctly specified the `target_modules` and called `get_peft_model`, you can check the fraction of parameters that will be trainable like this:
```python
peft_model.print_trainable_parameters()
```
If this number is too low or too high, check the model `repr` by printing the model. This will show you the names and types of all the layers in the model. Ensure that the intended layers, and only those, are replaced by adapter layers. For instance, for LoRA applied to `nn.Linear` layers, you should see that `lora.Linear` layers are being used.
To get a quick overview of all layers that were adapted, you can also use the `targeted_module_names` attribute:
```python
print(peft_model.targeted_module_names)
```
This lists the names of each module that was actually adapted.

View File

@ -0,0 +1,190 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# LoRA
LoRA is a low-rank decomposition method that reduces the number of trainable parameters, which speeds up finetuning large models and uses less memory. In PEFT, using LoRA is as easy as setting up a [`LoraConfig`] and wrapping it with [`get_peft_model`] to create a trainable [`PeftModel`].
This guide explores in more detail other options and features for using LoRA.
## Initialization
The initialization of LoRA weights is controlled by the parameter `init_lora_weights` in [`LoraConfig`]. By default, PEFT initializes LoRA weights with Kaiming-uniform for weight A and zeros for weight B resulting in an identity transform (same as the reference [implementation](https://github.com/microsoft/LoRA)).
It is also possible to pass `init_lora_weights="gaussian"`. As the name suggests, this initializes weight A with a Gaussian distribution and zeros for weight B (this is how [Diffusers](https://huggingface.co/docs/diffusers/index) initializes LoRA weights).
```py
from peft import LoraConfig
config = LoraConfig(init_lora_weights="gaussian", ...)
```
There is also an option to set `init_lora_weights=False` which is useful for debugging and testing. This should be the only time you use this option. When choosing this option, the LoRA weights are initialized such that they do *not* result in an identity transform.
```py
from peft import LoraConfig
config = LoraConfig(init_lora_weights=False, ...)
```
### LoftQ
When quantizing the base model for QLoRA training, consider using the [LoftQ initialization](https://arxiv.org/abs/2310.08659), which has been shown to improve performance when training quantized models. The idea is that the LoRA weights are initialized such that the quantization error is minimized. If you're using LoftQ, *do not* quantize the base model. You should set up a [`LoftQConfig`] instead:
```python
from peft import LoftQConfig, LoraConfig, get_peft_model
base_model = AutoModelForCausalLM.from_pretrained(...) # don't quantize here
loftq_config = LoftQConfig(loftq_bits=4, ...) # set 4bit quantization
lora_config = LoraConfig(..., init_lora_weights="loftq", loftq_config=loftq_config)
peft_model = get_peft_model(base_model, lora_config)
```
<Tip>
Learn more about how PEFT works with quantization in the [Quantization](quantization) guide.
</Tip>
### Rank-stabilized LoRA
Another way to initialize [`LoraConfig`] is with the [rank-stabilized LoRA (rsLoRA)](https://huggingface.co/papers/2312.03732) method. The LoRA architecture scales each adapter during every forward pass by a fixed scalar which is set at initialization and depends on the rank `r`. The scalar is given by `lora_alpha/r` in the original implementation, but rsLoRA uses `lora_alpha/math.sqrt(r)` which stabilizes the adapters and increases the performance potential from using a higher `r`.
```py
from peft import LoraConfig
config = LoraConfig(use_rslora=True, ...)
```
## Merge adapters
While LoRA is significantly smaller and faster to train, you may encounter latency issues during inference due to separately loading the base model and the LoRA adapter. To eliminate latency, use the [`~LoraModel.merge_and_unload`] function to merge the adapter weights with the base model. This allows you to use the newly merged model as a standalone model. The [`~LoraModel.merge_and_unload`] function doesn't keep the adapter weights in memory.
```py
from transformers import AutoModelForCausalLM
from peft import PeftModel
base_model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
peft_model_id = "alignment-handbook/zephyr-7b-sft-lora"
model = PeftModel.from_pretrained(base_model, peft_model_id)
model.merge_and_unload()
```
If you need to keep a copy of the weights so you can unmerge the adapter later or delete and load different ones, you should use the [`~LoraModel.merge_adapter`] function instead. Now you have the option to use [`~LoraModel.unmerge_adapter`] to return the base model.
```py
from transformers import AutoModelForCausalLM
from peft import PeftModel
base_model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
peft_model_id = "alignment-handbook/zephyr-7b-sft-lora"
model = PeftModel.from_pretrained(base_model, peft_model_id)
model.merge_adapter()
# unmerge the LoRA layers from the base model
model.unmerge_adapter()
```
The [`~LoraModel.add_weighted_adapter`] function is useful for merging multiple LoRAs into a new adapter based on a user provided weighting scheme in the `weights` parameter. Below is an end-to-end example.
First load the base model:
```python
from transformers import AutoModelForCausalLM
from peft import PeftModel
import torch
base_model = AutoModelForCausalLM.from_pretrained(
"mistralai/Mistral-7B-v0.1", torch_dtype=torch.float16, device_map="auto"
)
```
Then we load the first adapter:
```python
peft_model_id = "alignment-handbook/zephyr-7b-sft-lora"
model = PeftModel.from_pretrained(base_model, peft_model_id, adapter_name="sft")
```
Then load a different adapter and merge it with the first one:
```python
model.load_adapter("alignment-handbook/zephyr-7b-dpo-lora", adapter_name="dpo")
model.add_weighted_adapter(
adapters=["sft", "dpo"],
weights=[0.7, 0.3],
adapter_name="sft-dpo",
combination_type="linear"
)
```
<Tip>
There are several supported methods for `combination_type`. Refer to the [documentation](../package_reference/lora#peft.LoraModel.add_weighted_adapter) for more details. Note that "svd" as the `combination_type` is not supported when using `torch.float16` or `torch.bfloat16` as the datatype.
</Tip>
Now, perform inference:
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
prompt = "Hey, are you conscious? Can you talk to me?"
inputs = tokenizer(prompt, return_tensors="pt")
inputs = {k: v.to("cuda") for k, v in inputs.items()}
with torch.no_grad():
generate_ids = model.generate(**inputs, max_length=30)
outputs = tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
print(outputs)
```
## Load adapters
Adapters can be loaded onto a pretrained model with [`~PeftModel.load_adapter`], which is useful for trying out different adapters whose weights aren't merged. Set the active adapter weights with the [`~LoraModel.set_adapter`] function.
```py
from transformers import AutoModelForCausalLM
from peft import PeftModel
base_model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
peft_model_id = "alignment-handbook/zephyr-7b-sft-lora"
model = PeftModel.from_pretrained(base_model, peft_model_id)
# load different adapter
model.load_adapter("alignment-handbook/zephyr-7b-dpo-lora", adapter_name="dpo")
# set adapter as active
model.set_adapter("dpo")
```
To return the base model, you could use [`~LoraModel.unload`] to unload all of the LoRA modules or [`~LoraModel.delete_adapter`] to delete the adapter entirely.
```py
# unload adapter
model.unload()
# delete adapter
model.delete_adapter("dpo")
```
## QLoRA-style training
The default LoRA settings in 🤗 PEFT follow the [original paper](https://hf.co/papers/2106.09685) and add trainable weights to the query and value layers of each attention block. However, in [QLoRA](https://hf.co/papers/2305.14314), it was found that adding trainable weights to all the linear layers of a transformer model is beneficial to match full-finetuning performance. Since the list of modules to add will vary depending on the architecture, we provide a convenient shorthand: simply specify `target_modules='all-linear'` and let 🤗 PEFT handle the rest:
```py
config = LoraConfig(target_modules="all-linear", ...) # adds LoRA to all linear layers like in QLoRA
```

View File

@ -0,0 +1,107 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# PEFT as a utility library
In this section, we will cover how you can leverage PEFT's low-level API to inject trainable adapters into any `torch` module.
The development of this API was motivated by the need for power users to use adapter methods such as LoRA, IA3, and AdaLoRA without relying on the modeling classes exposed by the PEFT library.
## Supported tuner types
Currently, the supported adapter types are the 'injectable' adapters, meaning adapters for which an in-place modification of the model is sufficient to correctly perform the fine-tuning. As such, only [LoRA](../conceptual_guides/lora), AdaLoRA, and [IA3](../conceptual_guides/ia3) are currently supported in this API.
## `inject_adapter_in_model` method
To perform the adapter injection, simply use the `inject_adapter_in_model` method, which takes 3 arguments: the PEFT config, the model itself, and an optional adapter name. You can also attach multiple adapters to the model by calling `inject_adapter_in_model` multiple times with different adapter names.
Below is a basic example usage of how to inject LoRA adapters into the submodule `linear` of the module `DummyModel`.
```python
import torch
from peft import inject_adapter_in_model, LoraConfig
class DummyModel(torch.nn.Module):
def __init__(self):
super().__init__()
self.embedding = torch.nn.Embedding(10, 10)
self.linear = torch.nn.Linear(10, 10)
self.lm_head = torch.nn.Linear(10, 10)
def forward(self, input_ids):
x = self.embedding(input_ids)
x = self.linear(x)
x = self.lm_head(x)
return x
lora_config = LoraConfig(
lora_alpha=16,
lora_dropout=0.1,
r=64,
bias="none",
target_modules=["linear"],
)
model = DummyModel()
model = inject_adapter_in_model(lora_config, model)
dummy_inputs = torch.LongTensor([[0, 1, 2, 3, 4, 5, 6, 7]])
dummy_outputs = model(dummy_inputs)
```
If you print the model, you will notice that the adapters have been correctly injected into the model
```bash
DummyModel(
(embedding): Embedding(10, 10)
(linear): Linear(
in_features=10, out_features=10, bias=True
(lora_dropout): ModuleDict(
(default): Dropout(p=0.1, inplace=False)
)
(lora_A): ModuleDict(
(default): Linear(in_features=10, out_features=64, bias=False)
)
(lora_B): ModuleDict(
(default): Linear(in_features=64, out_features=10, bias=False)
)
(lora_embedding_A): ParameterDict()
(lora_embedding_B): ParameterDict()
)
(lm_head): Linear(in_features=10, out_features=10, bias=True)
)
```
Note that it is up to users to properly take care of saving the adapters (in case they want to save adapters only), as `model.state_dict()` will return the full state dict of the model.
In case you want to extract the adapter state dict, you can use the `get_peft_model_state_dict` method:
```python
from peft import get_peft_model_state_dict
peft_state_dict = get_peft_model_state_dict(model)
print(peft_state_dict)
```
## Pros and cons
When should you use this API, and when should you not? Let's discuss the pros and cons.
Pros:
- The model gets modified in-place, meaning the model will preserve all its original attributes and methods
- Works for any torch module, and any modality (vision, text, multi-modal)
Cons:
- You need to manually write the Hugging Face `from_pretrained` and `save_pretrained` utility methods if you want to easily save / load adapters from the Hugging Face Hub.
- You cannot use any of the utility methods provided by `PeftModel`, such as disabling adapters, merging adapters, etc.

View File

@ -0,0 +1,39 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Working with mixed adapter types
Normally, it is not possible to mix different adapter types in 🤗 PEFT. For example, even though it is possible to create a PEFT model that has two different LoRA adapters (that can have different config options), it is not possible to combine a LoRA adapter with a LoHa adapter. However, by using a mixed model, this works as long as the adapter types are compatible.
## Loading different adapter types into a PEFT model
To load different adapter types into a PEFT model, proceed the same as if you were loading two adapters of the same type, but use `PeftMixedModel` instead of `PeftModel`:
```py
from peft import PeftMixedModel
base_model = ... # load the base model, e.g. from transformers
# load first adapter, which will be called "default"
peft_model = PeftMixedModel.from_pretrained(base_model, <path_to_adapter1>)
peft_model.load_adapter(<path_to_adapter2>, adapter_name="other")
peft_model.set_adapter(["default", "other"])
```
The last line is necessary if you want to activate both adapters; otherwise, only the first adapter would be active. Of course, you can add more adapters by calling `add_adapter` (or `load_adapter` for already trained adapters) repeatedly, as sketched below.
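For example, here is a minimal sketch of loading and activating a third trained adapter; the path and the adapter name are placeholders:
```py
peft_model.load_adapter(<path_to_adapter3>, adapter_name="yet_another")
peft_model.set_adapter(["default", "other", "yet_another"])
```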
Currently, the main purpose of mixed adapter types is to combine trained adapters for inference. Although it is technically also possible to train a mixed adapter model, this has not been tested and is not recommended.
## Tips
- Not all adapter types can be combined. See `peft.tuners.mixed.COMPATIBLE_TUNER_TYPES` for a list of compatible types (a quick way to inspect it is sketched after these tips). An error will be raised if you try to combine incompatible adapter types.
- It is possible to mix multiple adapters of the same type. This can be useful to combine adapters with very different configs.
- If you want to combine a lot of different adapters, it is most performant to add the same types of adapters consecutively. E.g., add LoRA1, LoRA2, LoHa1, LoHa2 in this order, instead of LoRA1, LoHa1, LoRA2, LoHa2. The order will make a difference for the outcome in most cases, but since no order is better a priori, it is best to choose the order that is most performant.
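If you are unsure which adapter types can currently be mixed, you can print the constant mentioned above:
```py
from peft.tuners.mixed import COMPATIBLE_TUNER_TYPES

print(COMPATIBLE_TUNER_TYPES)
```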


@ -0,0 +1,146 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Quantization
Quantization represents data with fewer bits, making it a useful technique for reducing memory usage and accelerating inference, especially when it comes to large language models (LLMs). There are several ways to quantize a model, including:
* optimizing which model weights are quantized with the [AWQ](https://hf.co/papers/2306.00978) algorithm
* independently quantizing each row of a weight matrix with the [GPTQ](https://hf.co/papers/2210.17323) algorithm
* quantizing to 8-bit and 4-bit precision with the [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) library
However, after a model is quantized it isn't typically further trained for downstream tasks because training can be unstable due to the lower precision of the weights and activations. But since PEFT methods only add *extra* trainable parameters, this allows you to train a quantized model with a PEFT adapter on top! Combining quantization with PEFT can be a good strategy for training even the largest models on a single GPU. For example, [QLoRA](https://hf.co/papers/2305.14314) is a method that quantizes a model to 4-bits and then trains it with LoRA. This method allows you to finetune a 65B parameter model on a single 48GB GPU!
In this guide, you'll see how to quantize a model to 4-bits and train it with LoRA.
## Quantize a model
[bitsandbytes](https://github.com/TimDettmers/bitsandbytes) is a quantization library with a Transformers integration. With this integration, you can quantize a model to 8 or 4-bits and enable many other options by configuring the [`~transformers.BitsAndBytesConfig`] class. For example, you can:
* set `load_in_4bit=True` to quantize the model to 4-bits when you load it
* set `bnb_4bit_quant_type="nf4"` to use a special 4-bit data type for weights initialized from a normal distribution
* set `bnb_4bit_use_double_quant=True` to use a nested quantization scheme to quantize the already quantized weights
* set `bnb_4bit_compute_dtype=torch.bfloat16` to use bfloat16 for faster computation
```py
import torch
from transformers import BitsAndBytesConfig
config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```
Pass the `config` to the [`~transformers.AutoModelForCausalLM.from_pretrained`] method.
```py
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1", quantization_config=config)
```
Next, you should call the [`~peft.utils.prepare_model_for_kbit_training`] function to preprocess the quantized model for training.
```py
from peft import prepare_model_for_kbit_training
model = prepare_model_for_kbit_training(model)
```
Now that the quantized model is ready, let's set up a configuration.
## LoraConfig
Create a [`LoraConfig`] with the following parameters (or choose your own):
```py
from peft import LoraConfig
config = LoraConfig(
    r=16,
    lora_alpha=8,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
```
Then use the [`get_peft_model`] function to create a [`PeftModel`] from the quantized model and configuration.
```py
from peft import get_peft_model
model = get_peft_model(model, config)
```
You're all set for training with whichever training method you prefer!
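As a quick sanity check before training, you can print how many parameters are actually trainable; the counts in the comment below are only illustrative and depend on the model and config:
```py
model.print_trainable_parameters()
# e.g. "trainable params: 6,815,744 || all params: 7,248,547,840 || trainable%: 0.094"
```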
### LoftQ initialization
[LoftQ](https://hf.co/papers/2310.08659) initializes LoRA weights such that the quantization error is minimized, and it can improve performance when training quantized models. To get started, create a [`LoftQConfig`] and set `loftq_bits=4` for 4-bit quantization.
<Tip warning={true}>
LoftQ initialization does not require quantizing the base model with the `load_in_4bit` parameter in the [`~transformers.AutoModelForCausalLM.from_pretrained`] method! Learn more about LoftQ initialization in the [Initialization options](../developer_guides/lora#initialization) section.
Note: You can only perform LoftQ initialization on a GPU.
</Tip>
```py
from transformers import AutoModelForCausalLM
from peft import LoftQConfig, LoraConfig, get_peft_model
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1", device_map="auto")
loftq_config = LoftQConfig(loftq_bits=4)
```
Now pass the `loftq_config` to the [`LoraConfig`] to enable LoftQ initialization, and create a [`PeftModel`] for training.
```py
lora_config = LoraConfig(
    init_lora_weights="loftq",
    loftq_config=loftq_config,
    r=16,
    lora_alpha=8,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
```
### QLoRA-style training
QLoRA adds trainable weights to all the linear layers in the transformer architecture. Since the attribute names for these linear layers can vary across architectures, we provide a convenient flag `'all-linear'` for this setting:
```py
config = LoraConfig(target_modules="all-linear", ...) # adds LoRA to all linear layers like in QLoRA
```
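To put the pieces together, here is a minimal end-to-end training sketch with the 🤗 Transformers [`~transformers.Trainer`]. The output directory, dataset, and hyperparameters are placeholders you would adapt to your task:
```py
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="qlora-out",  # placeholder output directory
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    bf16=True,
)
trainer = Trainer(
    model=model,  # the quantized PEFT model from above
    args=training_args,
    train_dataset=train_dataset,  # placeholder: your tokenized dataset
)
trainer.train()
```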
## Next steps
If you're interested in learning more about quantization, the following may be helpful:
* Learn more about the details of QLoRA and check out some benchmarks of its impact in the [Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA](https://huggingface.co/blog/4bit-transformers-bitsandbytes) blog post.
* Read more about different quantization schemes in the Transformers [Quantization](https://hf.co/docs/transformers/main/quantization) guide.


@ -0,0 +1,139 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Troubleshooting
If you encounter any issue when using PEFT, please check the following list of common issues and their solutions.
## Examples don't work
Examples often rely on the most recent package versions, so please ensure they're up-to-date. In particular, check the version of the following packages:
- `peft`
- `transformers`
- `accelerate`
- `torch`
In general, you can update the package version by running this command inside your Python environment:
```bash
python -m pip install -U <package_name>
```
Installing PEFT from source is useful for keeping up with the latest developments:
```bash
python -m pip install git+https://github.com/huggingface/peft
```
## Training errors
### Getting: ValueError: Attempting to unscale FP16 gradients
This error probably occurred because the model was loaded with `torch_dtype=torch.float16` and then used in an automatic mixed precision (AMP) context, e.g. by setting `fp16=True` in the `Trainer` class from 🤗 Transformers. The reason is that when using AMP, trainable weights should never use fp16. To make this work without having to load the whole model in FP32, add the following snippet to your code:
```python
peft_model = get_peft_model(...)

# add this: upcast all trainable parameters to float32
for param in peft_model.parameters():
    if param.requires_grad:
        param.data = param.data.float()

# proceed as usual
trainer = Trainer(model=peft_model, fp16=True, ...)
trainer.train()
```
Alternatively, you can use the utility function `cast_mixed_precision_params` from peft as shown below:
```python
import torch

from peft import cast_mixed_precision_params

peft_model = get_peft_model(...)
cast_mixed_precision_params(peft_model, dtype=torch.float16)

# proceed as usual
trainer = Trainer(model=peft_model, fp16=True, ...)
trainer.train()
```
## Bad results from a loaded PEFT model
There can be several reasons for getting a poor result from a loaded PEFT model, which are listed below. If you're still unable to troubleshoot the problem, see if anyone else had a similar [issue](https://github.com/huggingface/peft/issues) on GitHub, and if you can't find any, open a new issue.
When opening an issue, it helps a lot if you provide a minimal code example that reproduces the issue. Also, please report if the loaded model performs at the same level as the model did before fine-tuning, if it performs at a random level, or if it is only slightly worse than expected. This information helps us identify the problem more quickly.
### Random deviations
If your model outputs are not exactly the same as previous runs, there could be an issue with random elements. For example:
1. please ensure the model is in `.eval()` mode, which is important if, for instance, the model uses dropout
2. if you use [`~transformers.GenerationMixin.generate`] on a language model, there could be random sampling, so obtaining the same result requires setting a random seed (see the sketch after this list)
3. if you used quantization and merged the weights, small deviations are expected due to rounding errors
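For point 2, here is a minimal sketch of making generation reproducible; `peft_model` and `inputs` are assumed to be your loaded model and tokenized prompt:
```python
from transformers import set_seed

set_seed(0)  # fix the random seed before sampling
outputs = peft_model.generate(**inputs, do_sample=True)

# or disable sampling entirely for deterministic greedy decoding
outputs = peft_model.generate(**inputs, do_sample=False)
```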
### Incorrectly loaded model
Please ensure that you load the model correctly. A common error is trying to load a _trained_ model with `get_peft_model`, which is incorrect. Instead, the loading code should look like this:
```python
from peft import PeftModel, PeftConfig
base_model = ... # to load the base model, use the same code as when you trained it
config = PeftConfig.from_pretrained(peft_model_id)
peft_model = PeftModel.from_pretrained(base_model, peft_model_id)
```
### Randomly initialized layers
For some tasks, it is important to correctly configure `modules_to_save` in the config to account for randomly initialized layers.
As an example, this is necessary if you use LoRA to fine-tune a language model for sequence classification because 🤗 Transformers adds a randomly initialized classification head on top of the model. If you do not add this layer to `modules_to_save`, the classification head won't be saved. The next time you load the model, you'll get a _different_ randomly initialized classification head, resulting in completely different results.
In PEFT, we try to correctly guess the `modules_to_save` if you provide the `task_type` argument in the config. This should work for transformers models that follow the standard naming scheme. It is always a good idea to double check though because we can't guarantee all models follow the naming scheme.
When you load a transformers model that has randomly initialized layers, you should see a warning along the lines of:
```
Some weights of <MODEL> were not initialized from the model checkpoint at <ID> and are newly initialized: [<LAYER_NAMES>].
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
```
The mentioned layers should be added to `modules_to_save` in the config to avoid the described problem.
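For example, here is a minimal sketch for LoRA on a sequence classification task. The attention module names and the head name (`score`, as used by several 🤗 Transformers classification models) are assumptions that depend on your architecture:
```python
from peft import LoraConfig, TaskType

config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    target_modules=["q_proj", "v_proj"],  # assumed attention projection names
    modules_to_save=["score"],  # the randomly initialized classification head
)
```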
### Extending the vocabulary
For many language fine-tuning tasks, extending the model's vocabulary is necessary since new tokens are being introduced. This requires extending the embedding layer to account for the new tokens and also storing the embedding layer in addition to the adapter weights when saving the adapter.
Save the embedding layer by adding it to the `target_modules` of the config. The embedding layer name must follow the standard naming scheme from Transformers. For example, the Mistral config could look like this:
```python
config = LoraConfig(..., target_modules=["embed_tokens", "lm_head", "q_proj", "v_proj"])
```
Once added to `target_modules`, PEFT automatically stores the embedding layer when saving the adapter, provided the model has the [`~transformers.PreTrainedModel.get_input_embeddings`] and [`~transformers.PreTrainedModel.get_output_embeddings`] methods. This is generally the case for Transformers models.
If the model's embedding layer doesn't follow the Transformers naming scheme, you can still save it by manually passing `save_embedding_layers=True` when saving the adapter:
```python
model = get_peft_model(...)
# train the model
model.save_pretrained("my_adapter", save_embedding_layers=True)
```
For inference, load the base model first and resize it the same way you did before you trained the model. After you've resized the base model, you can load the PEFT checkpoint.
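A sketch of that loading flow is shown below; the model ID, tokenizer path, and adapter path are placeholders:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained("path/to/extended-tokenizer")
base_model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

# resize to match the extended vocabulary used during training
base_model.resize_token_embeddings(len(tokenizer))

peft_model = PeftModel.from_pretrained(base_model, "path/to/adapter")
```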
For a complete example, please check out [this notebook](https://github.com/huggingface/peft/blob/main/examples/causal_language_modeling/peft_lora_clm_with_additional_tokens.ipynb).

docs/source/index.md Normal file

@ -0,0 +1,49 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# PEFT
🤗 PEFT (Parameter-Efficient Fine-Tuning) is a library for efficiently adapting large pretrained models to various downstream applications without fine-tuning all of a model's parameters, which is prohibitively costly. PEFT methods only fine-tune a small number of (extra) model parameters - significantly decreasing computational and storage costs - while yielding performance comparable to a fully fine-tuned model. This makes it more accessible to train and store large language models (LLMs) on consumer hardware.
PEFT is integrated with the Transformers, Diffusers, and Accelerate libraries to provide a faster and easier way to load, train, and use large models for inference.
<div class="mt-10">
<div class="w-full flex flex-col space-y-4 md:space-y-0 md:grid md:grid-cols-2 md:gap-y-4 md:gap-x-5">
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="quicktour"
><div class="w-full text-center bg-gradient-to-br from-blue-400 to-blue-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">Get started</div>
<p class="text-gray-700">Start here if you're new to 🤗 PEFT to get an overview of the library's main features, and how to train a model with a PEFT method.</p>
</a>
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./task_guides/image_classification_lora"
><div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">How-to guides</div>
<p class="text-gray-700">Practical guides demonstrating how to apply various PEFT methods across different types of tasks like image classification, causal language modeling, automatic speech recognition, and more. Learn how to use 🤗 PEFT with the DeepSpeed and Fully Sharded Data Parallel scripts.</p>
</a>
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./conceptual_guides/lora"
><div class="w-full text-center bg-gradient-to-br from-pink-400 to-pink-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">Conceptual guides</div>
<p class="text-gray-700">Get a better theoretical understanding of how LoRA and various soft prompting methods help reduce the number of trainable parameters to make training more efficient.</p>
</a>
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./package_reference/config"
><div class="w-full text-center bg-gradient-to-br from-purple-400 to-purple-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">Reference</div>
<p class="text-gray-700">Technical descriptions of how 🤗 PEFT classes and methods work.</p>
</a>
</div>
</div>
<iframe
src="https://stevhliu-peft-methods.hf.space"
frameborder="0"
width="850"
height="620"
></iframe>


@ -1,117 +0,0 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# PEFT
🤗 PEFT, or Parameter-Efficient Fine-Tuning, is a library for efficiently adapting pre-trained language models (PLMs) to various downstream applications without fine-tuning all the model's parameters.
PEFT methods only fine-tune a small number of (extra) model parameters, significantly decreasing computational and storage costs because fine-tuning large-scale PLMs is prohibitively costly.
Recent state-of-the-art PEFT techniques achieve performance comparable to that of full fine-tuning.
PEFT is seamlessly integrated with 🤗 Accelerate for large-scale models leveraging DeepSpeed and [Big Model Inference](https://huggingface.co/docs/accelerate/usage_guides/big_modeling).
If you are new to PEFT, get started by reading the [Quicktour](quicktour) guide and conceptual guides for [LoRA](/conceptual_guides/lora) and [Prompting](/conceptual_guides/prompting) methods.
## Supported methods
1. LoRA: [LORA: LOW-RANK ADAPTATION OF LARGE LANGUAGE MODELS](https://arxiv.org/pdf/2106.09685.pdf)
2. Prefix Tuning: [Prefix-Tuning: Optimizing Continuous Prompts for Generation](https://aclanthology.org/2021.acl-long.353/), [P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks](https://arxiv.org/pdf/2110.07602.pdf)
3. P-Tuning: [GPT Understands, Too](https://arxiv.org/pdf/2103.10385.pdf)
4. Prompt Tuning: [The Power of Scale for Parameter-Efficient Prompt Tuning](https://arxiv.org/pdf/2104.08691.pdf)
5. AdaLoRA: [Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning](https://arxiv.org/abs/2303.10512)
6. [LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention](https://github.com/ZrrSkywalker/LLaMA-Adapter)
## Supported models
The tables provided below list the PEFT methods and models supported for each task. To apply a particular PEFT method for
a task, please refer to the corresponding Task guides.
### Causal Language Modeling
| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning |
|--------------| ---- | ---- | ---- | ---- |
| GPT-2 | ✅ | ✅ | ✅ | ✅ |
| Bloom | ✅ | ✅ | ✅ | ✅ |
| OPT | ✅ | ✅ | ✅ | ✅ |
| GPT-Neo | ✅ | ✅ | ✅ | ✅ |
| GPT-J | ✅ | ✅ | ✅ | ✅ |
| GPT-NeoX-20B | ✅ | ✅ | ✅ | ✅ |
| LLaMA | ✅ | ✅ | ✅ | ✅ |
| ChatGLM | ✅ | ✅ | ✅ | ✅ |
### Conditional Generation
| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning |
| --------- | ---- | ---- | ---- | ---- |
| T5 | ✅ | ✅ | ✅ | ✅ |
| BART | ✅ | ✅ | ✅ | ✅ |
### Sequence Classification
| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning |
| --------- | ---- | ---- | ---- | ---- |
| BERT | ✅ | ✅ | ✅ | ✅ |
| RoBERTa | ✅ | ✅ | ✅ | ✅ |
| GPT-2 | ✅ | ✅ | ✅ | ✅ |
| Bloom | ✅ | ✅ | ✅ | ✅ |
| OPT | ✅ | ✅ | ✅ | ✅ |
| GPT-Neo | ✅ | ✅ | ✅ | ✅ |
| GPT-J | ✅ | ✅ | ✅ | ✅ |
| Deberta | ✅ | | ✅ | ✅ |
| Deberta-v2 | ✅ | | ✅ | ✅ |
### Token Classification
| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning |
| --------- | ---- | ---- | ---- | ---- |
| BERT | ✅ | ✅ | | |
| RoBERTa | ✅ | ✅ | | |
| GPT-2 | ✅ | ✅ | | |
| Bloom | ✅ | ✅ | | |
| OPT | ✅ | ✅ | | |
| GPT-Neo | ✅ | ✅ | | |
| GPT-J | ✅ | ✅ | | |
| Deberta | ✅ | | | |
| Deberta-v2 | ✅ | | | |
### Text-to-Image Generation
| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning |
| --------- | ---- | ---- | ---- | ---- |
| Stable Diffusion | ✅ | | | |
### Image Classification
| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning |
| --------- | ---- | ---- | ---- | ---- |
| ViT | ✅ | | | |
| Swin | ✅ | | | |
### Image to text (Multi-modal models)
We have tested LoRA for [ViT](https://huggingface.co/docs/transformers/model_doc/vit) and [Swin](https://huggingface.co/docs/transformers/model_doc/swin) for fine-tuning on image classification.
However, it should be possible to use LoRA for any [ViT-based model](https://huggingface.co/models?pipeline_tag=image-classification&sort=downloads&search=vit) from 🤗 Transformers.
Check out the [Image classification](/task_guides/image_classification_lora) task guide to learn more. If you run into problems, please open an issue.
| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning |
| --------- | ---- | ---- | ---- | ---- |
| Blip-2 | ✅ | | | |
### Semantic Segmentation
As with image-to-text models, you should be able to apply LoRA to any of the [segmentation models](https://huggingface.co/models?pipeline_tag=image-segmentation&sort=downloads).
It's worth noting that we haven't tested this with every architecture yet. Therefore, if you come across any issues, kindly create an issue report.
| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning |
| --------- | ---- | ---- | ---- | ---- |
| SegFormer | ✅ | | | |


@ -8,17 +8,21 @@ http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Installation
Before you start, you will need to setup your environment, install the appropriate packages, and configure 🤗 PEFT. 🤗 PEFT is tested on **Python 3.7+**.
Before you start, you will need to setup your environment, install the appropriate packages, and configure 🤗 PEFT. 🤗 PEFT is tested on **Python 3.8+**.
🤗 PEFT is available on pypi, as well as GitHub:
🤗 PEFT is available on PyPI, as well as GitHub:
## pip
## PyPI
To install 🤗 PEFT from pypi:
To install 🤗 PEFT from PyPI:
```bash
pip install peft


@ -0,0 +1,31 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# AdaLoRA
[AdaLoRA](https://hf.co/papers/2303.10512) is a method for optimizing the number of trainable parameters to assign to weight matrices and layers, unlike LoRA, which distributes parameters evenly across all modules. More parameters are budgeted for important weight matrices and layers while less important ones receive fewer parameters.
The abstract from the paper is:
*Fine-tuning large pre-trained language models on downstream tasks has become an important paradigm in NLP. However, common practice fine-tunes all of the parameters in a pre-trained model, which becomes prohibitive when a large number of downstream tasks are present. Therefore, many fine-tuning methods are proposed to learn incremental updates of pre-trained weights in a parameter efficient way, e.g., low-rank increments. These methods often evenly distribute the budget of incremental updates across all pre-trained weight matrices, and overlook the varying importance of different weight parameters. As a consequence, the fine-tuning performance is suboptimal. To bridge this gap, we propose AdaLoRA, which adaptively allocates the parameter budget among weight matrices according to their importance score. In particular, AdaLoRA parameterizes the incremental updates in the form of singular value decomposition. Such a novel approach allows us to effectively prune the singular values of unimportant updates, which is essentially to reduce their parameter budget but circumvent intensive exact SVD computations. We conduct extensive experiments with several pre-trained models on natural language processing, question answering, and natural language generation to validate the effectiveness of AdaLoRA. Results demonstrate that AdaLoRA manifests notable improvement over baselines, especially in the low budget settings. Our code is publicly available at https://github.com/QingruZhang/AdaLoRA*.
## AdaLoraConfig
[[autodoc]] tuners.adalora.config.AdaLoraConfig
## AdaLoraModel
[[autodoc]] tuners.adalora.model.AdaLoraModel


@ -0,0 +1,31 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# LyCORIS
[LyCORIS](https://hf.co/papers/2309.14859) (Lora beYond Conventional methods, Other Rank adaptation Implementations for Stable diffusion) are LoRA-like matrix decomposition adapters that modify the cross-attention layer of the UNet. The [LoHa](loha) and [LoKr](lokr) methods inherit from the `Lycoris` classes here.
## LycorisConfig
[[autodoc]] tuners.lycoris_utils.LycorisConfig
## LycorisLayer
[[autodoc]] tuners.lycoris_utils.LycorisLayer
## LycorisTuner
[[autodoc]] tuners.lycoris_utils.LycorisTuner


@ -0,0 +1,48 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# AutoPeftModels
The `AutoPeftModel` classes load the appropriate PEFT model for the task type by automatically inferring it from the configuration file. They are designed to quickly and easily load a PEFT model in a single line of code without having to worry about which exact model class you need or manually loading a [`PeftConfig`].
## AutoPeftModel
[[autodoc]] auto.AutoPeftModel
- from_pretrained
## AutoPeftModelForCausalLM
[[autodoc]] auto.AutoPeftModelForCausalLM
## AutoPeftModelForSeq2SeqLM
[[autodoc]] auto.AutoPeftModelForSeq2SeqLM
## AutoPeftModelForSequenceClassification
[[autodoc]] auto.AutoPeftModelForSequenceClassification
## AutoPeftModelForTokenClassification
[[autodoc]] auto.AutoPeftModelForTokenClassification
## AutoPeftModelForQuestionAnswering
[[autodoc]] auto.AutoPeftModelForQuestionAnswering
## AutoPeftModelForFeatureExtraction
[[autodoc]] auto.AutoPeftModelForFeatureExtraction


@ -0,0 +1,22 @@
<!--⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Configuration
[`PeftConfigMixin`] is the base configuration class for storing the adapter configuration of a [`PeftModel`], and [`PromptLearningConfig`] is the base configuration class for soft prompt methods (p-tuning, prefix tuning, and prompt tuning). These base classes contain methods for saving and loading model configurations from the Hub, specifying the PEFT method to use, type of task to perform, and model configurations like number of layers and number of attention heads.
## PeftConfigMixin
[[autodoc]] config.PeftConfigMixin
- all
## PeftConfig
[[autodoc]] PeftConfig
- all
## PromptLearningConfig
[[autodoc]] PromptLearningConfig
- all


@ -1,18 +0,0 @@
# Configuration
The configuration classes store the configuration of a [`PeftModel`], PEFT adapter models, and the configurations of [`PrefixTuning`], [`PromptTuning`], and [`PromptEncoder`]. They contain methods for saving and loading model configurations from the Hub, specifying the PEFT method to use, type of task to perform, and model configurations like number of layers and number of attention heads.
## PeftConfigMixin
[[autodoc]] utils.config.PeftConfigMixin
- all
## PeftConfig
[[autodoc]] PeftConfig
- all
## PromptLearningConfig
[[autodoc]] PromptLearningConfig
- all


@ -0,0 +1,31 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# IA3
Infused Adapter by Inhibiting and Amplifying Inner Activations, or [IA3](https://hf.co/papers/2205.05638), is a method that adds three learned vectors to rescale the keys and values of the self-attention and encoder-decoder attention layers, and the intermediate activation of the position-wise feed-forward network.
The abstract from the paper is:
*Few-shot in-context learning (ICL) enables pre-trained language models to perform a previously-unseen task without any gradient-based training by feeding a small number of training examples as part of the input. ICL incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made. Parameter-efficient fine-tuning (PEFT) (e.g. adapter modules, prompt tuning, sparse update methods, etc.) offers an alternative paradigm where a small set of parameters are trained to enable a model to perform the new task. In this paper, we rigorously compare few-shot ICL and PEFT and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs. Along the way, we introduce a new PEFT method called (IA)^3 that scales activations by learned vectors, attaining stronger performance while only introducing a relatively tiny amount of new parameters. We also propose a simple recipe based on the T0 model called T-Few that can be applied to new tasks without task-specific tuning or modifications. We validate the effectiveness of T-Few on completely unseen tasks by applying it to the RAFT benchmark, attaining super-human performance for the first time and outperforming the state-of-the-art by 6% absolute. All of the code used in our experiments is publicly available*.
## IA3Config
[[autodoc]] tuners.ia3.config.IA3Config
## IA3Model
[[autodoc]] tuners.ia3.model.IA3Model


@ -0,0 +1,31 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Llama-Adapter
[Llama-Adapter](https://hf.co/papers/2303.16199) is a PEFT method specifically designed for turning Llama into an instruction-following model. The Llama model is frozen and only a set of adaptation prompts prefixed to the input instruction tokens is learned. Since randomly initialized modules inserted into the model can cause the model to lose some of its existing knowledge, Llama-Adapter uses zero-initialized attention with zero gating to progressively add the instructional prompts to the model.
The abstract from the paper is:
*We present LLaMA-Adapter, a lightweight adaption method to efficiently fine-tune LLaMA into an instruction-following model. Using 52K self-instruct demonstrations, LLaMA-Adapter only introduces 1.2M learnable parameters upon the frozen LLaMA 7B model, and costs less than one hour for fine-tuning on 8 A100 GPUs. Specifically, we adopt a set of learnable adaption prompts, and prepend them to the input text tokens at higher transformer layers. Then, a zero-init attention mechanism with zero gating is proposed, which adaptively injects the new instructional cues into LLaMA, while effectively preserves its pre-trained knowledge. With efficient training, LLaMA-Adapter generates high-quality responses, comparable to Alpaca with fully fine-tuned 7B parameters. Furthermore, our approach can be simply extended to multi-modal input, e.g., images, for image-conditioned LLaMA, which achieves superior reasoning capacity on ScienceQA. We release our code at https://github.com/ZrrSkywalker/LLaMA-Adapter*.
## AdaptionPromptConfig
[[autodoc]] tuners.adaption_prompt.config.AdaptionPromptConfig
## AdaptionPromptModel
[[autodoc]] tuners.adaption_prompt.model.AdaptionPromptModel


@ -0,0 +1,31 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# LoHa
Low-Rank Hadamard Product ([LoHa](https://huggingface.co/papers/2108.06098)) is similar to LoRA except it approximates the large weight matrix with more low-rank matrices and combines them with the Hadamard product. This method is even more parameter-efficient than LoRA and achieves comparable performance.
The abstract from the paper is:
*In this work, we propose a communication-efficient parameterization, FedPara, for federated learning (FL) to overcome the burdens on frequent model uploads and downloads. Our method re-parameterizes weight parameters of layers using low-rank weights followed by the Hadamard product. Compared to the conventional low-rank parameterization, our FedPara method is not restricted to low-rank constraints, and thereby it has a far larger capacity. This property enables to achieve comparable performance while requiring 3 to 10 times lower communication costs than the model with the original layers, which is not achievable by the traditional low-rank methods. The efficiency of our method can be further improved by combining with other efficient FL optimizers. In addition, we extend our method to a personalized FL application, pFedPara, which separates parameters into global and local ones. We show that pFedPara outperforms competing personalized FL methods with more than three times fewer parameters*.
## LoHaConfig
[[autodoc]] tuners.loha.config.LoHaConfig
## LoHaModel
[[autodoc]] tuners.loha.model.LoHaModel


@ -0,0 +1,27 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# LoKr
Low-Rank Kronecker Product ([LoKr](https://hf.co/papers/2309.14859)) is a LoRA-variant method that approximates the large weight matrix with two low-rank matrices and combines them with the Kronecker product. LoKr also provides an optional third low-rank matrix for better control during fine-tuning.
## LoKrConfig
[[autodoc]] tuners.lokr.config.LoKrConfig
## LoKrModel
[[autodoc]] tuners.lokr.model.LoKrModel


@ -0,0 +1,31 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# LoRA
Low-Rank Adaptation ([LoRA](https://huggingface.co/papers/2309.15223)) is a PEFT method that decomposes a large matrix into two smaller low-rank matrices in the attention layers. This drastically reduces the number of parameters that need to be fine-tuned.
The abstract from the paper is:
*We propose a neural language modeling system based on low-rank adaptation (LoRA) for speech recognition output rescoring. Although pretrained language models (LMs) like BERT have shown superior performance in second-pass rescoring, the high computational cost of scaling up the pretraining stage and adapting the pretrained models to specific domains limit their practical use in rescoring. Here we present a method based on low-rank decomposition to train a rescoring BERT model and adapt it to new domains using only a fraction (0.08%) of the pretrained parameters. These inserted matrices are optimized through a discriminative training objective along with a correlation-based regularization loss. The proposed low-rank adaptation Rescore-BERT (LoRB) architecture is evaluated on LibriSpeech and internal datasets with decreased training times by factors between 5.4 and 3.6.*.
## LoraConfig
[[autodoc]] tuners.lora.config.LoraConfig
## LoraModel
[[autodoc]] tuners.lora.model.LoraModel


@ -0,0 +1,31 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Multitask Prompt Tuning
[Multitask Prompt Tuning](https://huggingface.co/papers/2303.02861) decomposes the soft prompts of each task into a single learned transferable prompt instead of a separate prompt for each task. The single learned prompt can be adapted for each task by multiplicative low rank updates.
The abstract from the paper is:
*Prompt tuning, in which a base pretrained model is adapted to each task via conditioning on learned prompt vectors, has emerged as a promising approach for efficiently adapting large language models to multiple downstream tasks. However, existing methods typically learn soft prompt vectors from scratch, and it has not been clear how to exploit the rich cross-task knowledge with prompt vectors in a multitask learning setting. We propose multitask prompt tuning (MPT), which first learns a single transferable prompt by distilling knowledge from multiple task-specific source prompts. We then learn multiplicative low rank updates to this shared prompt to efficiently adapt it to each downstream target task. Extensive experiments on 23 NLP datasets demonstrate that our proposed approach outperforms the state-of-the-art methods, including the full finetuning baseline in some cases, despite only tuning 0.035% as many task-specific parameters*.
## MultitaskPromptTuningConfig
[[autodoc]] tuners.multitask_prompt_tuning.config.MultitaskPromptTuningConfig
## MultitaskPromptEmbedding
[[autodoc]] tuners.multitask_prompt_tuning.model.MultitaskPromptEmbedding


@ -0,0 +1,31 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# OFT
[Orthogonal Finetuning (OFT)](https://hf.co/papers/2306.07280) is a method developed for adapting text-to-image diffusion models. It works by reparameterizing the pretrained weight matrices with an orthogonal matrix to preserve information in the pretrained model. To reduce the number of parameters, OFT introduces a block-diagonal structure in the orthogonal matrix.
The abstract from the paper is:
*Large text-to-image diffusion models have impressive capabilities in generating photorealistic images from text prompts. How to effectively guide or control these powerful models to perform different downstream tasks becomes an important open problem. To tackle this challenge, we introduce a principled finetuning method -- Orthogonal Finetuning (OFT), for adapting text-to-image diffusion models to downstream tasks. Unlike existing methods, OFT can provably preserve hyperspherical energy which characterizes the pairwise neuron relationship on the unit hypersphere. We find that this property is crucial for preserving the semantic generation ability of text-to-image diffusion models. To improve finetuning stability, we further propose Constrained Orthogonal Finetuning (COFT) which imposes an additional radius constraint to the hypersphere. Specifically, we consider two important finetuning text-to-image tasks: subject-driven generation where the goal is to generate subject-specific images given a few images of a subject and a text prompt, and controllable generation where the goal is to enable the model to take in additional control signals. We empirically show that our OFT framework outperforms existing methods in generation quality and convergence speed*.
## OFTConfig
[[autodoc]] tuners.oft.config.OFTConfig
## OFTModel
[[autodoc]] tuners.oft.model.OFTModel


@ -0,0 +1,31 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# P-tuning
[P-tuning](https://hf.co/papers/2103.10385) adds trainable prompt embeddings to the input that are optimized by a prompt encoder to find a better prompt, eliminating the need to manually design prompts. The prompt tokens can be added anywhere in the input sequence, and p-tuning also introduces anchor tokens for improving performance.
The abstract from the paper is:
*While GPTs with traditional fine-tuning fail to achieve strong results on natural language understanding (NLU), we show that GPTs can be better than or comparable to similar-sized BERTs on NLU tasks with a novel method P-tuning -- which employs trainable continuous prompt embeddings. On the knowledge probing (LAMA) benchmark, the best GPT recovers 64\% (P@1) of world knowledge without any additional text provided during test time, which substantially improves the previous best by 20+ percentage points. On the SuperGlue benchmark, GPTs achieve comparable and sometimes better performance to similar-sized BERTs in supervised learning. Importantly, we find that P-tuning also improves BERTs' performance in both few-shot and supervised settings while largely reducing the need for prompt engineering. Consequently, P-tuning outperforms the state-of-the-art approaches on the few-shot SuperGlue benchmark.*.
## PromptEncoderConfig
[[autodoc]] tuners.p_tuning.config.PromptEncoderConfig
## PromptEncoder
[[autodoc]] tuners.p_tuning.model.PromptEncoder


@ -1,6 +1,10 @@
<!--⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Models
[`PeftModel`] is the base model class for specifying the base Transformer model and configuration to apply a PEFT method to. The base `PeftModel` contains methods for loading and saving models from the Hub, and supports the [`PromptEncoder`] for prompt learning.
[`PeftModel`] is the base model class for specifying the base Transformer model and configuration to apply a PEFT method to. The base `PeftModel` contains methods for loading and saving models from the Hub.
## PeftModel
@ -34,3 +38,30 @@ A `PeftModel` for sequence-to-sequence language modeling.
[[autodoc]] PeftModelForSeq2SeqLM
- all
## PeftModelForQuestionAnswering
A `PeftModel` for question answering.
[[autodoc]] PeftModelForQuestionAnswering
- all
## PeftModelForFeatureExtraction
A `PeftModel` for extracting features/embeddings from transformer models.
[[autodoc]] PeftModelForFeatureExtraction
- all
## PeftMixedModel
A `PeftModel` for mixing different adapter types (e.g. LoRA and LoHa).
[[autodoc]] PeftMixedModel
- all
## Utilities
[[autodoc]] get_peft_model
[[autodoc]] utils.prepare_model_for_kbit_training


@ -0,0 +1,27 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# PEFT types
[`PeftType`] includes the supported adapters in PEFT, and [`TaskType`] includes PEFT-supported tasks.
## PeftType
[[autodoc]] utils.peft_types.PeftType
## TaskType
[[autodoc]] utils.peft_types.TaskType


@ -0,0 +1,31 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Prefix tuning
[Prefix tuning](https://hf.co/papers/2101.00190) prefixes a series of task-specific vectors to the input sequence that can be learned while keeping the pretrained model frozen. The prefix parameters are inserted in all of the model layers.
The abstract from the paper is:
*Fine-tuning is the de facto way to leverage large pretrained language models to perform downstream tasks. However, it modifies all the language model parameters and therefore necessitates storing a full copy for each task. In this paper, we propose prefix-tuning, a lightweight alternative to fine-tuning for natural language generation tasks, which keeps language model parameters frozen, but optimizes a small continuous task-specific vector (called the prefix). Prefix-tuning draws inspiration from prompting, allowing subsequent tokens to attend to this prefix as if it were "virtual tokens". We apply prefix-tuning to GPT-2 for table-to-text generation and to BART for summarization. We find that by learning only 0.1\% of the parameters, prefix-tuning obtains comparable performance in the full data setting, outperforms fine-tuning in low-data settings, and extrapolates better to examples with topics unseen during training*.
## PrefixTuningConfig
[[autodoc]] tuners.prefix_tuning.config.PrefixTuningConfig
## PrefixEncoder
[[autodoc]] tuners.prefix_tuning.model.PrefixEncoder


@ -0,0 +1,31 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Prompt tuning
[Prompt tuning](https://hf.co/papers/2104.08691) adds task-specific prompts to the input, and these prompt parameters are updated independently of the pretrained model parameters which are frozen.
The abstract from the paper is:
*In this work, we explore "prompt tuning", a simple yet effective mechanism for learning "soft prompts" to condition frozen language models to perform specific downstream tasks. Unlike the discrete text prompts used by GPT-3, soft prompts are learned through backpropagation and can be tuned to incorporate signal from any number of labeled examples. Our end-to-end learned approach outperforms GPT-3's "few-shot" learning by a large margin. More remarkably, through ablations on model size using T5, we show that prompt tuning becomes more competitive with scale: as models exceed billions of parameters, our method "closes the gap" and matches the strong performance of model tuning (where all model weights are tuned). This finding is especially relevant in that large models are costly to share and serve, and the ability to reuse one frozen model for multiple downstream tasks can ease this burden. Our method can be seen as a simplification of the recently proposed "prefix tuning" of Li and Liang (2021), and we provide a comparison to this and other similar approaches. Finally, we show that conditioning a frozen model with soft prompts confers benefits in robustness to domain transfer, as compared to full model tuning*.
## PromptTuningConfig
[[autodoc]] tuners.prompt_tuning.config.PromptTuningConfig
## PromptEmbedding
[[autodoc]] tuners.prompt_tuning.model.PromptEmbedding


@ -0,0 +1,27 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Tuners
A tuner (or adapter) is a module that can be plugged into a `torch.nn.Module`. [`BaseTuner`] is the base class for other tuners and provides shared methods and attributes for preparing an adapter configuration and replacing a target module with the adapter module. [`BaseTunerLayer`] is the base class for adapter layers. It offers methods and attributes for managing adapters, such as activating and disabling them.
## BaseTuner
[[autodoc]] tuners.tuners_utils.BaseTuner
## BaseTunerLayer
[[autodoc]] tuners.tuners_utils.BaseTunerLayer
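To make this concrete, here is a small sketch (the base model and the choice of LoRA are only examples): every module injected by a tuner subclasses [`BaseTunerLayer`], which is what makes generic operations such as enabling or disabling adapters possible.
```py
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model
from peft.tuners.tuners_utils import BaseTunerLayer

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
model = get_peft_model(base_model, LoraConfig(task_type="CAUSAL_LM"))

# every layer LoRA injected is a BaseTunerLayer
adapter_layers = [m for m in model.modules() if isinstance(m, BaseTunerLayer)]
print(len(adapter_layers))

# BaseTunerLayer provides shared controls, e.g. temporarily disabling all adapters
for layer in adapter_layers:
    layer.enable_adapters(False)
```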


@ -1,33 +0,0 @@
# Tuners
Each tuner (or PEFT method) has a configuration and model.
## LoRA
For finetuning a model with LoRA.
[[autodoc]] LoraConfig
[[autodoc]] LoraModel
[[autodoc]] tuners.lora.LoraLayer
[[autodoc]] tuners.lora.Linear
## P-tuning
[[autodoc]] tuners.p_tuning.PromptEncoderConfig
[[autodoc]] tuners.p_tuning.PromptEncoder
## Prefix tuning
[[autodoc]] tuners.prefix_tuning.PrefixTuningConfig
[[autodoc]] tuners.prefix_tuning.PrefixEncoder
## Prompt tuning
[[autodoc]] tuners.prompt_tuning.PromptTuningConfig
[[autodoc]] tuners.prompt_tuning.PromptEmbedding

docs/source/quicktour.md

@ -0,0 +1,170 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Quicktour
PEFT offers parameter-efficient methods for finetuning large pretrained models. The traditional paradigm is to finetune all of a model's parameters for each downstream task, but this is becoming exceedingly costly and impractical because of the enormous number of parameters in models today. Instead, it is more efficient to train a smaller number of prompt parameters or use a reparametrization method like low-rank adaptation (LoRA) to reduce the number of trainable parameters.
This quicktour will show you PEFT's main features and how you can train or run inference on large models that would typically be inaccessible on consumer devices.
## Train
Each PEFT method is defined by a [`PeftConfig`] class that stores all the important parameters for building a [`PeftModel`]. For example, to train with LoRA, load and create a [`LoraConfig`] class and specify the following parameters:
- `task_type`: the task to train for (sequence-to-sequence language modeling in this case)
- `inference_mode`: whether you're using the model for inference or not
- `r`: the dimension of the low-rank matrices
- `lora_alpha`: the scaling factor for the low-rank matrices
- `lora_dropout`: the dropout probability of the LoRA layers
```python
from peft import LoraConfig, TaskType
peft_config = LoraConfig(task_type=TaskType.SEQ_2_SEQ_LM, inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1)
```
<Tip>
See the [`LoraConfig`] reference for more details about other parameters you can adjust, such as the modules to target or the bias type.
</Tip>
Once the [`LoraConfig`] is set up, create a [`PeftModel`] with the [`get_peft_model`] function. It takes a base model - which you can load from the Transformers library - and the [`LoraConfig`] containing the parameters for configuring a model for training with LoRA.
Load the base model you want to finetune.
```python
from transformers import AutoModelForSeq2SeqLM
model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/mt0-large")
```
Wrap the base model and `peft_config` with the [`get_peft_model`] function to create a [`PeftModel`]. To get a sense of the number of trainable parameters in your model, use the [`print_trainable_parameters`] method.
```python
from peft import get_peft_model
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
"output: trainable params: 2359296 || all params: 1231940608 || trainable%: 0.19151053100118282"
```
Out of [bigscience/mt0-large's](https://huggingface.co/bigscience/mt0-large) 1.2B parameters, you're only training 0.19% of them!
That is it 🎉! Now you can train the model with the Transformers [`~transformers.Trainer`], Accelerate, or any custom PyTorch training loop.
For example, to train with the [`~transformers.Trainer`] class, set up a [`~transformers.TrainingArguments`] class with some training hyperparameters.
```py
from transformers import TrainingArguments

training_args = TrainingArguments(
output_dir="your-name/bigscience/mt0-large-lora",
learning_rate=1e-3,
per_device_train_batch_size=32,
per_device_eval_batch_size=32,
num_train_epochs=2,
weight_decay=0.01,
evaluation_strategy="epoch",
save_strategy="epoch",
load_best_model_at_end=True,
)
```
Pass the model, training arguments, dataset, tokenizer, and any other necessary component to the [`~transformers.Trainer`], and call [`~transformers.Trainer.train`] to start training.
```py
from transformers import Trainer

trainer = Trainer(
model=model,
args=training_args,
train_dataset=tokenized_datasets["train"],
eval_dataset=tokenized_datasets["test"],
tokenizer=tokenizer,
data_collator=data_collator,
compute_metrics=compute_metrics,
)
trainer.train()
```
### Save model
After your model is finished training, you can save your model to a directory using the [`~transformers.PreTrainedModel.save_pretrained`] function.
```py
model.save_pretrained("output_dir")
```
You can also save your model to the Hub (make sure you're logged in to your Hugging Face account first) with the [`~transformers.PreTrainedModel.push_to_hub`] function.
```python
from huggingface_hub import notebook_login
notebook_login()
model.push_to_hub("your-name/bigscience/mt0-large-lora")
```
Both methods only save the extra PEFT weights that were trained, meaning it is super efficient to store, transfer, and load. For example, this [facebook/opt-350m](https://huggingface.co/ybelkada/opt-350m-lora) model trained with LoRA only contains two files: `adapter_config.json` and `adapter_model.safetensors`. The `adapter_model.safetensors` file is just 6.3MB!
<div class="flex flex-col justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/PEFT-hub-screenshot.png"/>
<figcaption class="text-center">The adapter weights for an opt-350m model stored on the Hub are only ~6MB compared to the full size of the model weights, which can be ~700MB.</figcaption>
</div>
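As a rough sketch of how these files are used again later (assuming the same base model the adapter was trained on, and the `output_dir` from above), the adapter can be attached back onto a freshly loaded base model with [`PeftModel.from_pretrained`]:
```py
from transformers import AutoModelForSeq2SeqLM
from peft import PeftModel

base_model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/mt0-large")
# only the small adapter weights are read from "output_dir"; the base weights come from the Hub
model = PeftModel.from_pretrained(base_model, "output_dir")
```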
## Inference
<Tip>
Take a look at the [AutoPeftModel](package_reference/auto_class) API reference for a complete list of available `AutoPeftModel` classes.
</Tip>
Easily load any PEFT-trained model for inference with the [`AutoPeftModel`] class and the [`~transformers.PreTrainedModel.from_pretrained`] method:
```py
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer
import torch
model = AutoPeftModelForCausalLM.from_pretrained("ybelkada/opt-350m-lora")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
model = model.to("cuda")
model.eval()
inputs = tokenizer("Preheat the oven to 350 degrees and place the cookie dough", return_tensors="pt")
outputs = model.generate(input_ids=inputs["input_ids"].to("cuda"), max_new_tokens=50)
print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0])
"Preheat the oven to 350 degrees and place the cookie dough in the center of the oven. In a large bowl, combine the flour, baking powder, baking soda, salt, and cinnamon. In a separate bowl, combine the egg yolks, sugar, and vanilla."
```
For other tasks that aren't explicitly supported with an `AutoPeftModelFor` class - such as automatic speech recognition - you can still use the base [`AutoPeftModel`] class to load a model for the task.
```py
from peft import AutoPeftModel
model = AutoPeftModel.from_pretrained("smangrul/openai-whisper-large-v2-LORA-colab")
```
## Next steps
Now that you've seen how to train a model with one of the PEFT methods, we encourage you to try out some of the other methods like prompt tuning. The steps are very similar to the ones shown in the quicktour:
1. prepare a [`PeftConfig`] for a PEFT method
2. use the [`get_peft_model`] method to create a [`PeftModel`] from the configuration and base model
Then you can train it however you like! To load a PEFT model for inference, you can use the [`AutoPeftModel`] class.
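For instance, a minimal sketch of those two steps with prompt tuning instead of LoRA (the base model and values here are purely illustrative) could look like this:
```py
from transformers import AutoModelForCausalLM
from peft import PromptTuningConfig, get_peft_model

peft_config = PromptTuningConfig(task_type="CAUSAL_LM", num_virtual_tokens=8)  # step 1: prepare a config
base_model = AutoModelForCausalLM.from_pretrained("bigscience/bloomz-560m")
model = get_peft_model(base_model, peft_config)                                # step 2: wrap the base model
model.print_trainable_parameters()
```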
Feel free to also take a look at the task guides if you're interested in training a model with another PEFT method for a specific task such as semantic segmentation, multilingual automatic speech recognition, DreamBooth, token classification, and more.


@ -1,111 +0,0 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Quicktour
🤗 PEFT contains parameter-efficient finetuning methods for training large pretrained models. The traditional paradigm is to finetune all of a model's parameters for each downstream task, but this is becoming exceedingly costly and impractical because of the enormous number of parameters in models today. Instead, it is more efficient to train a smaller number of prompt parameters or use a reparametrization method like low-rank adaptation (LoRA) to reduce the number of trainable parameters.
This quicktour will show you 🤗 PEFT's main features and help you train large pretrained models that would typically be inaccessible on consumer devices. You'll see how to train the 1.2B parameter [`bigscience/mt0-large`](https://huggingface.co/bigscience/mt0-large) model with LoRA to generate a classification label and use it for inference.
## PeftConfig
Each 🤗 PEFT method is defined by a [`PeftConfig`] class that stores all the important parameters for building a [`PeftModel`].
Because you're going to use LoRA, you'll need to load and create a [`LoraConfig`] class. Within `LoraConfig`, specify the following parameters:
- the `task_type`, or sequence-to-sequence language modeling in this case
- `inference_mode`, whether you're using the model for inference or not
- `r`, the dimension of the low-rank matrices
- `lora_alpha`, the scaling factor for the low-rank matrices
- `lora_dropout`, the dropout probability of the LoRA layers
```python
from peft import LoraConfig, TaskType
peft_config = LoraConfig(task_type=TaskType.SEQ_2_SEQ_LM, inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1)
```
<Tip>
💡 See the [`LoraConfig`] reference for more details about other parameters you can adjust.
</Tip>
## PeftModel
A [`PeftModel`] is created by the [`get_peft_model`] function. It takes a base model - which you can load from the 🤗 Transformers library - and the [`PeftConfig`] containing the instructions for how to configure a model for a specific 🤗 PEFT method.
Start by loading the base model you want to finetune.
```python
from transformers import AutoModelForSeq2SeqLM
model_name_or_path = "bigscience/mt0-large"
tokenizer_name_or_path = "bigscience/mt0-large"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name_or_path)
```
Wrap your base model and `peft_config` with the `get_peft_model` function to create a [`PeftModel`]. To get a sense of the number of trainable parameters in your model, use the [`print_trainable_parameters`] method. In this case, you're only training 0.19% of the model's parameters! 🤏
```python
from peft import get_peft_model
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
"output: trainable params: 2359296 || all params: 1231940608 || trainable%: 0.19151053100118282"
```
That is it 🎉! Now you can train the model using the 🤗 Transformers [`~transformers.Trainer`], 🤗 Accelerate, or any custom PyTorch training loop.
## Save and load a model
After your model is finished training, you can save your model to a directory using the [`~transformers.PreTrainedModel.save_pretrained`] function. You can also save your model to the Hub (make sure you log in to your Hugging Face account first) with the [`~transformers.PreTrainedModel.push_to_hub`] function.
```python
model.save_pretrained("output_dir")
# if pushing to Hub
from huggingface_hub import notebook_login
notebook_login()
model.push_to_hub("my_awesome_peft_model")
```
This only saves the incremental 🤗 PEFT weights that were trained, meaning it is super efficient to store, transfer, and load. For example, this [`bigscience/T0_3B`](https://huggingface.co/smangrul/twitter_complaints_bigscience_T0_3B_LORA_SEQ_2_SEQ_LM) model trained with LoRA on the [`twitter_complaints`](https://huggingface.co/datasets/ought/raft/viewer/twitter_complaints/train) subset of the RAFT [dataset](https://huggingface.co/datasets/ought/raft) only contains two files: `adapter_config.json` and `adapter_model.bin`. The latter file is just 19MB!
Easily load your model for inference using the [`~transformers.PreTrainedModel.from_pretrained`] function:
```diff
from transformers import AutoModelForSeq2SeqLM
+ from peft import PeftModel, PeftConfig
+ peft_model_id = "smangrul/twitter_complaints_bigscience_T0_3B_LORA_SEQ_2_SEQ_LM"
+ config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path)
+ model = PeftModel.from_pretrained(model, peft_model_id)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
model = model.to(device)
model.eval()
inputs = tokenizer("Tweet text : @HondaCustSvc Your customer service has been horrible during the recall process. I will never purchase a Honda again. Label :", return_tensors="pt")
with torch.no_grad():
outputs = model.generate(input_ids=inputs["input_ids"].to("cuda"), max_new_tokens=10)
print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0])
'complaint'
```
## Next steps
Now that you've seen how to train a model with one of the 🤗 PEFT methods, we encourage you to try out some of the other methods like prompt tuning. The steps are very similar to the ones shown in this quickstart; prepare a [`PeftConfig`] for a 🤗 PEFT method, and use the `get_peft_model` to create a [`PeftModel`] from the configuration and base model. Then you can train it however you like!
Feel free to also take a look at the task guides if you're interested in training a model with a 🤗 PEFT method for a specific task such as semantic segmentation, multilingual automatic speech recognition, DreamBooth, and token classification.


@ -1,288 +0,0 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Prompt tuning for causal language modeling
[[open-in-colab]]
Prompting helps guide language model behavior by adding some input text specific to a task. Prompt tuning is an additive method for only training and updating the newly added prompt tokens to a pretrained model. This way, you can use one pretrained model whose weights are frozen, and train and update a smaller set of prompt parameters for each downstream task instead of fully finetuning a separate model. As models grow larger and larger, prompt tuning can be more efficient, and results are even better as model parameters scale.
<Tip>
💡 Read [The Power of Scale for Parameter-Efficient Prompt Tuning](https://arxiv.org/abs/2104.08691) to learn more about prompt tuning.
</Tip>
This guide will show you how to apply prompt tuning to train a [`bloomz-560m`](https://huggingface.co/bigscience/bloomz-560m) model on the `twitter_complaints` subset of the [RAFT](https://huggingface.co/datasets/ought/raft) dataset.
Before you begin, make sure you have all the necessary libraries installed:
```bash
!pip install -q peft transformers datasets
```
## Setup
Start by defining the model and tokenizer, the dataset and the dataset columns to train on, some training hyperparameters, and the [`PromptTuningConfig`]. The [`PromptTuningConfig`] contains information about the task type, the text to initialize the prompt embedding, the number of virtual tokens, and the tokenizer to use:
```py
from transformers import AutoModelForCausalLM, AutoTokenizer, default_data_collator, get_linear_schedule_with_warmup
from peft import get_peft_config, get_peft_model, PromptTuningInit, PromptTuningConfig, TaskType, PeftType
import torch
from datasets import load_dataset
import os
from torch.utils.data import DataLoader
from tqdm import tqdm
device = "cuda"
model_name_or_path = "bigscience/bloomz-560m"
tokenizer_name_or_path = "bigscience/bloomz-560m"
peft_config = PromptTuningConfig(
task_type=TaskType.CAUSAL_LM,
prompt_tuning_init=PromptTuningInit.TEXT,
num_virtual_tokens=8,
prompt_tuning_init_text="Classify if the tweet is a complaint or not:",
tokenizer_name_or_path=model_name_or_path,
)
dataset_name = "twitter_complaints"
checkpoint_name = f"{dataset_name}_{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}_v1.pt".replace(
"/", "_"
)
text_column = "Tweet text"
label_column = "text_label"
max_length = 64
lr = 3e-2
num_epochs = 50
batch_size = 8
```
## Load dataset
For this guide, you'll load the `twitter_complaints` subset of the [RAFT](https://huggingface.co/datasets/ought/raft) dataset. This subset contains tweets that are labeled either `complaint` or `no complaint`:
```py
dataset = load_dataset("ought/raft", dataset_name)
dataset["train"][0]
{"Tweet text": "@HMRCcustomers No this is my first job", "ID": 0, "Label": 2}
```
To make the `Label` column more readable, replace the `Label` value with the corresponding label text and store them in a `text_label` column. You can use the [`~datasets.Dataset.map`] function to apply this change over the entire dataset in one step:
```py
classes = [k.replace("_", " ") for k in dataset["train"].features["Label"].names]
dataset = dataset.map(
lambda x: {"text_label": [classes[label] for label in x["Label"]]},
batched=True,
num_proc=1,
)
{"Tweet text": "@HMRCcustomers No this is my first job", "ID": 0, "Label": 2, "text_label": "no complaint"}
```
## Preprocess dataset
Next, you'll setup a tokenizer; configure the appropriate padding token to use for padding sequences, and determine the maximum length of the tokenized labels:
```py
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
if tokenizer.pad_token_id is None:
tokenizer.pad_token_id = tokenizer.eos_token_id
target_max_length = max([len(tokenizer(class_label)["input_ids"]) for class_label in classes])
print(target_max_length)
3
```
Create a `preprocess_function` to:
1. Tokenize the input text and labels.
2. For each example in a batch, pad the labels with the tokenizers `pad_token_id`.
3. Concatenate the input text and labels into the `model_inputs`.
4. Create a separate attention mask for `labels` and `model_inputs`.
5. Loop through each example in the batch again to pad the input ids, labels, and attention mask to the `max_length` and convert them to PyTorch tensors.
```py
def preprocess_function(examples):
batch_size = len(examples[text_column])
inputs = [f"{text_column} : {x} Label : " for x in examples[text_column]]
targets = [str(x) for x in examples[label_column]]
model_inputs = tokenizer(inputs)
labels = tokenizer(targets)
for i in range(batch_size):
sample_input_ids = model_inputs["input_ids"][i]
label_input_ids = labels["input_ids"][i] + [tokenizer.pad_token_id]
# print(i, sample_input_ids, label_input_ids)
model_inputs["input_ids"][i] = sample_input_ids + label_input_ids
labels["input_ids"][i] = [-100] * len(sample_input_ids) + label_input_ids
model_inputs["attention_mask"][i] = [1] * len(model_inputs["input_ids"][i])
# print(model_inputs)
for i in range(batch_size):
sample_input_ids = model_inputs["input_ids"][i]
label_input_ids = labels["input_ids"][i]
model_inputs["input_ids"][i] = [tokenizer.pad_token_id] * (
max_length - len(sample_input_ids)
) + sample_input_ids
model_inputs["attention_mask"][i] = [0] * (max_length - len(sample_input_ids)) + model_inputs[
"attention_mask"
][i]
labels["input_ids"][i] = [-100] * (max_length - len(sample_input_ids)) + label_input_ids
model_inputs["input_ids"][i] = torch.tensor(model_inputs["input_ids"][i][:max_length])
model_inputs["attention_mask"][i] = torch.tensor(model_inputs["attention_mask"][i][:max_length])
labels["input_ids"][i] = torch.tensor(labels["input_ids"][i][:max_length])
model_inputs["labels"] = labels["input_ids"]
return model_inputs
```
Use the [`~datasets.Dataset.map`] function to apply the `preprocess_function` to the entire dataset. You can remove the unprocessed columns since the model won't need them:
```py
processed_datasets = dataset.map(
preprocess_function,
batched=True,
num_proc=1,
remove_columns=dataset["train"].column_names,
load_from_cache_file=False,
desc="Running tokenizer on dataset",
)
```
Create a [`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader) from the `train` and `eval` datasets. Set `pin_memory=True` to speed up the data transfer to the GPU during training if the samples in your dataset are on a CPU.
```py
train_dataset = processed_datasets["train"]
eval_dataset = processed_datasets["train"]
train_dataloader = DataLoader(
train_dataset, shuffle=True, collate_fn=default_data_collator, batch_size=batch_size, pin_memory=True
)
eval_dataloader = DataLoader(eval_dataset, collate_fn=default_data_collator, batch_size=batch_size, pin_memory=True)
```
## Train
You're almost ready to setup your model and start training!
Initialize a base model from [`~transformers.AutoModelForCausalLM`], and pass it and `peft_config` to the [`get_peft_model`] function to create a [`PeftModel`]. You can print the new [`PeftModel`]'s trainable parameters to see how much more efficient it is than training the full parameters of the original model!
```py
model = AutoModelForCausalLM.from_pretrained(model_name_or_path)
model = get_peft_model(model, peft_config)
print(model.print_trainable_parameters())
"trainable params: 8192 || all params: 559222784 || trainable%: 0.0014648902430985358"
```
Setup an optimizer and learning rate scheduler:
```py
optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
lr_scheduler = get_linear_schedule_with_warmup(
optimizer=optimizer,
num_warmup_steps=0,
num_training_steps=(len(train_dataloader) * num_epochs),
)
```
Move the model to the GPU, then write a training loop to start training!
```py
model = model.to(device)
for epoch in range(num_epochs):
model.train()
total_loss = 0
for step, batch in enumerate(tqdm(train_dataloader)):
batch = {k: v.to(device) for k, v in batch.items()}
outputs = model(**batch)
loss = outputs.loss
total_loss += loss.detach().float()
loss.backward()
optimizer.step()
lr_scheduler.step()
optimizer.zero_grad()
model.eval()
eval_loss = 0
eval_preds = []
for step, batch in enumerate(tqdm(eval_dataloader)):
batch = {k: v.to(device) for k, v in batch.items()}
with torch.no_grad():
outputs = model(**batch)
loss = outputs.loss
eval_loss += loss.detach().float()
eval_preds.extend(
tokenizer.batch_decode(torch.argmax(outputs.logits, -1).detach().cpu().numpy(), skip_special_tokens=True)
)
eval_epoch_loss = eval_loss / len(eval_dataloader)
eval_ppl = torch.exp(eval_epoch_loss)
train_epoch_loss = total_loss / len(train_dataloader)
train_ppl = torch.exp(train_epoch_loss)
print(f"{epoch=}: {train_ppl=} {train_epoch_loss=} {eval_ppl=} {eval_epoch_loss=}")
```
## Share model
You can store and share your model on the Hub if you'd like. Log in to your Hugging Face account and enter your token when prompted:
```py
from huggingface_hub import notebook_login
notebook_login()
```
Use the [`~transformers.PreTrainedModel.push_to_hub`] function to upload your model to a model repository on the Hub:
```py
peft_model_id = "your-name/bloomz-560m_PROMPT_TUNING_CAUSAL_LM"
model.push_to_hub("your-name/bloomz-560m_PROMPT_TUNING_CAUSAL_LM", use_auth_token=True)
```
Once the model is uploaded, you'll see the model file size is only 33.5kB! 🤏
## Inference
Let's try the model on a sample input for inference. If you look at the repository you uploaded the model to, you'll see an `adapter_config.json` file. Load this file into [`PeftConfig`] to specify the `peft_type` and `task_type`. Then you can load the prompt tuned model weights and the configuration into [`~PeftModel.from_pretrained`] to create the [`PeftModel`]:
```py
from peft import PeftModel, PeftConfig
peft_model_id = "stevhliu/bloomz-560m_PROMPT_TUNING_CAUSAL_LM"
config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(model, peft_model_id)
```
Grab a tweet and tokenize it:
```py
inputs = tokenizer(
f'{text_column} : {"@nationalgridus I have no water and the bill is current and paid. Can you do something about this?"} Label : ',
return_tensors="pt",
)
```
Put the model on a GPU and *generate* the predicted label:
```py
model.to(device)
with torch.no_grad():
inputs = {k: v.to(device) for k, v in inputs.items()}
outputs = model.generate(
input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"], max_new_tokens=10, eos_token_id=3
)
print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))
[
"Tweet text : @nationalgridus I have no water and the bill is current and paid. Can you do something about this? Label : complaint"
]
```


@ -8,6 +8,10 @@ http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# DreamBooth fine-tuning with LoRA
@ -83,6 +87,7 @@ accelerate launch train_dreambooth.py \
--output_dir=$OUTPUT_DIR \
--train_text_encoder \
--with_prior_preservation --prior_loss_weight=1.0 \
--num_dataloader_workers=1 \
--instance_prompt="a photo of sks dog" \
--class_prompt="a photo of dog" \
--resolution=512 \
@ -101,6 +106,8 @@ accelerate launch train_dreambooth.py \
--max_train_steps=800
```
If you are running this script on Windows, you may need to set the `--num_dataloader_workers` to 0.
## Inference with a single adapter
To run inference with the fine-tuned model, first specify the base model with which the fine-tuned LoRA weights will be combined:
@ -171,7 +178,7 @@ image.save("DESTINATION_PATH_FOR_THE_IMAGE")
## Multi-adapter inference
With PEFT you can combine multiple adapters for inference. In the previous example you have fine-tuned Stable Diffusion on
some dog images. The pipeline created based on these weights got a name - `adapter_name="dog`. Now, suppose you also fine-tuned
some dog images. The pipeline created based on these weights got a name - `adapter_name="dog"`. Now, suppose you also fine-tuned
this base model on images of a crochet toy. Let's see how we can use both adapters.
First, you'll need to perform all the steps as in the single adapter inference example:


@ -8,6 +8,10 @@ http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Image classification using LoRA
@ -26,7 +30,7 @@ For more information on LoRA, please refer to the [original LoRA paper](https://
Install the libraries required for model training:
```bash
!pip install transformers accelerate evaluate datasets loralib peft -q
!pip install transformers accelerate evaluate datasets peft -q
```
Check the versions of all required libraries to make sure you are up to date:
@ -324,7 +328,7 @@ Bring everything together - model, training arguments, data, collation function,
```python
trainer = Trainer(
model,
lora_model,
args,
train_dataset=train_ds,
eval_dataset=val_ds,


@ -1,3 +1,7 @@
<!--⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# int8 training for automatic speech recognition
Quantization reduces the precision of floating point data types, decreasing the memory required to store model weights. However, quantization degrades inference performance because you lose information when you reduce the precision. 8-bit or `int8` quantization uses only a quarter precision, but it does not degrade performance because it doesn't just drop the bits or data. Instead, `int8` quantization *rounds* from one data type to another.
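As a brief, hedged sketch of what this looks like in practice (the model name is illustrative, and `prepare_model_for_kbit_training` is the general-purpose PEFT helper rather than anything specific to this guide), a model can be loaded directly in 8-bit and prepared for training:
```py
# assumes bitsandbytes is installed
from transformers import WhisperForConditionalGeneration
from peft import prepare_model_for_kbit_training

model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v2", load_in_8bit=True, device_map="auto"
)
model = prepare_model_for_kbit_training(model)
```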
@ -205,7 +209,7 @@ Let's also apply LoRA to the training to make it even more efficient. Load a [`~
```py
from peft import LoraConfig, PeftModel, LoraModel, LoraConfig, get_peft_model
config = LoraConfig(r=32, lora_alpha=64, target_modules=["q_proj", "v_proj"], lora_dropout=0.05, bias="None")
config = LoraConfig(r=32, lora_alpha=64, target_modules=["q_proj", "v_proj"], lora_dropout=0.05, bias="none")
```
After you set up the [`~peft.LoraConfig`], wrap it and the base model with the [`get_peft_model`] function to create a [`PeftModel`]. Print out the number of trainable parameters to see how much more efficient LoRA is compared to fully training the model!
@ -375,4 +379,4 @@ with torch.cuda.amp.autocast():
text = pipe(audio, generate_kwargs={"forced_decoder_ids": forced_decoder_ids}, max_new_tokens=255)["text"]
text
"मी तुमच्यासाठी काही करू शकतो का?"
```
```


@ -0,0 +1,305 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Prompt-based methods
A prompt can describe a task or provide an example of a task you want the model to learn. Instead of manually creating these prompts, soft prompting methods add learnable parameters to the input embeddings that can be optimized for a specific task while keeping the pretrained model's parameters frozen. This makes it both faster and easier to finetune large language models (LLMs) for new downstream tasks.
The PEFT library supports several types of prompting methods (p-tuning, prefix tuning, prompt tuning) and you can learn more about how these methods work conceptually in the [Soft prompts](../conceptual_guides/prompting) guide. If you're interested in applying these methods to other tasks and use cases, take a look at our [notebook collection](https://huggingface.co/spaces/PEFT/soft-prompting)!
This guide will show you how to train a causal language model - with a soft prompting method - to *generate a classification* for whether a tweet is a complaint or not.
<Tip>
Some familiarity with the general process of training a causal language model would be really helpful and allow you to focus on the soft prompting methods. If you're new, we recommend taking a look at the [Causal language modeling](https://huggingface.co/docs/transformers/tasks/language_modeling) guide from the Transformers documentation first. When you're ready, come back and see how easy it is to drop PEFT into your training!
</Tip>
Before you begin, make sure you have all the necessary libraries installed.
```bash
pip install -q peft transformers datasets
```
## Dataset
For this guide, you'll use the `twitter_complaints` subset of the [RAFT](https://huggingface.co/datasets/ought/raft) dataset. The `twitter_complaints` subset contains tweets labeled as `complaint` and `no complaint` and you can check out the [dataset viewer](https://huggingface.co/datasets/ought/raft/viewer/twitter_complaints) for a better idea of what the data looks like.
Use the [`~datasets.load_dataset`] function to load the dataset and create a new `text_label` column so it is easier to understand what the `Label` values, `1` and `2`, mean.
```py
from datasets import load_dataset
ds = load_dataset("ought/raft", "twitter_complaints")
classes = [k.replace("_", " ") for k in ds["train"].features["Label"].names]
ds = ds.map(
lambda x: {"text_label": [classes[label] for label in x["Label"]]},
batched=True,
num_proc=1,
)
ds["train"][0]
{"Tweet text": "@HMRCcustomers No this is my first job", "ID": 0, "Label": 2, "text_label": "no complaint"}
```
Load a tokenizer, define the padding token to use, and determine the maximum length of the tokenized label.
```py
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloomz-560m")
if tokenizer.pad_token_id is None:
tokenizer.pad_token_id = tokenizer.eos_token_id
target_max_length = max([len(tokenizer(class_label)["input_ids"]) for class_label in classes])
print(target_max_length)
```
Create a preprocessing function that tokenizes the tweet text and labels, appends each label to its prompt (masking the prompt tokens in the loss with `-100`), pads the inputs and labels in each batch, creates an attention mask, and truncates sequences to `max_length`. Then it converts the `input_ids`, `attention_mask`, and `labels` to PyTorch tensors.
```py
import torch
max_length = 64
def preprocess_function(examples, text_column="Tweet text", label_column="text_label"):
    batch_size = len(examples[text_column])
    inputs = [f"{text_column} : {x} Label : " for x in examples[text_column]]
    targets = [str(x) for x in examples[label_column]]
    model_inputs = tokenizer(inputs)
    labels = tokenizer(targets)
    for i in range(batch_size):
        # append the label to the prompt so the model learns to generate it,
        # and mask the prompt tokens out of the loss with -100
        sample_input_ids = model_inputs["input_ids"][i]
        label_input_ids = labels["input_ids"][i] + [tokenizer.pad_token_id]
        model_inputs["input_ids"][i] = sample_input_ids + label_input_ids
        labels["input_ids"][i] = [-100] * len(sample_input_ids) + label_input_ids
        model_inputs["attention_mask"][i] = [1] * len(model_inputs["input_ids"][i])
    for i in range(batch_size):
        # left-pad everything to max_length, truncate, and convert to tensors
        sample_input_ids = model_inputs["input_ids"][i]
        label_input_ids = labels["input_ids"][i]
        model_inputs["input_ids"][i] = [tokenizer.pad_token_id] * (
            max_length - len(sample_input_ids)
        ) + sample_input_ids
        model_inputs["attention_mask"][i] = [0] * (max_length - len(sample_input_ids)) + model_inputs[
            "attention_mask"
        ][i]
        labels["input_ids"][i] = [-100] * (max_length - len(sample_input_ids)) + label_input_ids
        model_inputs["input_ids"][i] = torch.tensor(model_inputs["input_ids"][i][:max_length])
        model_inputs["attention_mask"][i] = torch.tensor(model_inputs["attention_mask"][i][:max_length])
        labels["input_ids"][i] = torch.tensor(labels["input_ids"][i][:max_length])
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
```
Apply the preprocessing function to the entire dataset with the [`~datasets.Dataset.map`] function, and remove the unprocessed columns because the model won't need them.
```py
processed_ds = ds.map(
preprocess_function,
batched=True,
num_proc=1,
remove_columns=ds["train"].column_names,
load_from_cache_file=False,
desc="Running tokenizer on dataset",
)
```
Finally, create a training and evaluation [`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader). You can set `pin_memory=True` to speed up the data transfer to the GPU during training if the samples in your dataset are on a CPU.
```py
from torch.utils.data import DataLoader
from transformers import default_data_collator
train_ds = processed_ds["train"]
eval_ds = processed_ds["test"]
batch_size = 16
train_dataloader = DataLoader(train_ds, shuffle=True, collate_fn=default_data_collator, batch_size=batch_size, pin_memory=True)
eval_dataloader = DataLoader(eval_ds, collate_fn=default_data_collator, batch_size=batch_size, pin_memory=True)
```
## Model
Now let's load a pretrained model to use as the base model for the soft prompt method. This guide uses the [bigscience/bloomz-560m](https://huggingface.co/bigscience/bloomz-560m) model, but you can use any causal language model you want.
```py
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("bigscience/bloomz-560m")
```
### PEFT configuration and model
For any PEFT method, you'll need to create a configuration that contains all the parameters specifying how the PEFT method should be applied. Once the configuration is set up, pass it to the [`~peft.get_peft_model`] function along with the base model to create a trainable [`PeftModel`].
<Tip>
Call the [`~PeftModel.print_trainable_parameters`] method to compare the number of trainable parameters of [`PeftModel`] versus the number of parameters in the base model!
</Tip>
<hfoptions id="configurations">
<hfoption id="p-tuning">
[P-tuning](../conceptual_guides/prompting#p-tuning) adds a trainable embedding tensor where the prompt tokens can be added anywhere in the input sequence. Create a [`PromptEncoderConfig`] with the task type, the number of virtual tokens to add and learn, and the hidden size of the encoder for learning the prompt parameters.
```py
from peft import PromptEncoderConfig, get_peft_model
peft_config = PromptEncoderConfig(task_type="CAUSAL_LM", num_virtual_tokens=20, encoder_hidden_size=128)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
"trainable params: 300,288 || all params: 559,514,880 || trainable%: 0.05366935013417338"
```
</hfoption>
<hfoption id="prefix tuning">
[Prefix tuning](../conceptual_guides/prompting#prefix-tuning) adds task-specific parameters in all of the model layers, which are optimized by a separate feed-forward network. Create a [`PrefixTuningConfig`] with the task type and number of virtual tokens to add and learn.
```py
from peft import PrefixTuningConfig, get_peft_model
peft_config = PrefixTuningConfig(task_type="CAUSAL_LM", num_virtual_tokens=20)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
"trainable params: 983,040 || all params: 560,197,632 || trainable%: 0.1754809274167014"
```
</hfoption>
<hfoption id="prompt tuning">
[Prompt tuning](../conceptual_guides/prompting#prompt-tuning) formulates all tasks as a *generation* task and adds a task-specific prompt to the input which is updated independently of the base model. The `prompt_tuning_init_text` parameter specifies the text used to initialize the prompt embeddings (here, a description of the tweet classification task). For the best results, `num_virtual_tokens` should match the number of tokens in the `prompt_tuning_init_text`, which is what the example below does.
Create a [`PromptTuningConfig`] with the task type, the initial prompt tuning text to train the model with, the number of virtual tokens to add and learn, and a tokenizer.
```py
from peft import PromptTuningConfig, PromptTuningInit, get_peft_model
prompt_tuning_init_text = "Classify if the tweet is a complaint or no complaint.\n"
peft_config = PromptTuningConfig(
task_type="CAUSAL_LM",
prompt_tuning_init=PromptTuningInit.TEXT,
    num_virtual_tokens=len(tokenizer(prompt_tuning_init_text)["input_ids"]),
prompt_tuning_init_text=prompt_tuning_init_text,
tokenizer_name_or_path="bigscience/bloomz-560m",
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
"trainable params: 8,192 || all params: 559,222,784 || trainable%: 0.0014648902430985358"
```
</hfoption>
</hfoptions>
### Training
Set up an optimizer and learning rate scheduler.
```py
from transformers import get_linear_schedule_with_warmup
lr = 3e-2
num_epochs = 50
optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
lr_scheduler = get_linear_schedule_with_warmup(
optimizer=optimizer,
num_warmup_steps=0,
num_training_steps=(len(train_dataloader) * num_epochs),
)
```
Move the model to the GPU and create a training loop that reports the loss and perplexity for each epoch.
```py
from tqdm import tqdm
device = "cuda"
model = model.to(device)
for epoch in range(num_epochs):
model.train()
total_loss = 0
for step, batch in enumerate(tqdm(train_dataloader)):
batch = {k: v.to(device) for k, v in batch.items()}
outputs = model(**batch)
loss = outputs.loss
total_loss += loss.detach().float()
loss.backward()
optimizer.step()
lr_scheduler.step()
optimizer.zero_grad()
model.eval()
eval_loss = 0
eval_preds = []
for step, batch in enumerate(tqdm(eval_dataloader)):
batch = {k: v.to(device) for k, v in batch.items()}
with torch.no_grad():
outputs = model(**batch)
loss = outputs.loss
eval_loss += loss.detach().float()
eval_preds.extend(
tokenizer.batch_decode(torch.argmax(outputs.logits, -1).detach().cpu().numpy(), skip_special_tokens=True)
)
eval_epoch_loss = eval_loss / len(eval_dataloader)
eval_ppl = torch.exp(eval_epoch_loss)
train_epoch_loss = total_loss / len(train_dataloader)
train_ppl = torch.exp(train_epoch_loss)
print(f"{epoch=}: {train_ppl=} {train_epoch_loss=} {eval_ppl=} {eval_epoch_loss=}")
```
## Share your model
Once training is complete, you can upload your model to the Hub with the [`~transformers.PreTrainedModel.push_to_hub`] method. You'll need to log in to your Hugging Face account first and enter your token when prompted.
```py
from huggingface_hub import notebook_login
account = "<your-hf-account-name>"
peft_model_id = f"{account}/bloomz-560-m-peft-method"
model.push_to_hub(peft_model_id)
```
If you check the model file size in the repository, you'll see that it is a lot smaller than a full-sized model!
<div class="flex flex-col justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/PEFT-hub-screenshot.png"/>
<figcaption class="text-center">For example, the adapter weights for an opt-350m model stored on the Hub are only ~6MB compared to the full model size, which can be ~700MB.</figcaption>
</div>
## Inference
Let's load the model for inference and test it out on a tweet!
```py
from peft import AutoPeftModelForCausalLM
model = AutoPeftModelForCausalLM.from_pretrained(peft_model_id).to("cuda")
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloomz-560m")
i = 15
text_column = "Tweet text"
inputs = tokenizer(f'{text_column} : {ds["test"][i]["Tweet text"]} Label : ', return_tensors="pt")
print(ds["test"][i]["Tweet text"])
"@NYTsupport i have complained a dozen times &amp; yet my papers are still thrown FAR from my door. Why is this so hard to resolve?"
```
Call the [`~transformers.GenerationMixin.generate`] method to generate the predicted classification label.
```py
with torch.no_grad():
inputs = {k: v.to(device) for k, v in inputs.items()}
outputs = model.generate(input_ids=inputs["input_ids"], max_new_tokens=10)
print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))
"['Tweet text : @NYTsupport i have complained a dozen times &amp; yet my papers are still thrown FAR from my door. Why is this so hard to resolve? Label : complaint']"
```


@ -1,232 +0,0 @@
# P-tuning for sequence classification
It is challenging to finetune large language models for downstream tasks because they have so many parameters. To work around this, you can use *prompts* to steer the model toward a particular downstream task without fully finetuning a model. Typically, these prompts are handcrafted, which may be impractical because you need very large validation sets to find the best prompts. *P-tuning* is a method for automatically searching and optimizing for better prompts in a continuous space.
<Tip>
💡 Read [GPT Understands, Too](https://arxiv.org/abs/2103.10385) to learn more about p-tuning.
</Tip>
This guide will show you how to train a [`roberta-large`](https://huggingface.co/roberta-large) model (but you can also use any of the GPT, OPT, or BLOOM models) with p-tuning on the `mrpc` configuration of the [GLUE](https://huggingface.co/datasets/glue) benchmark.
Before you begin, make sure you have all the necessary libraries installed:
```bash
!pip install -q peft transformers datasets evaluate
```
## Setup
To get started, import 🤗 Transformers to create the base model, 🤗 Datasets to load a dataset, 🤗 Evaluate to load an evaluation metric, and 🤗 PEFT to create a [`PeftModel`] and setup the configuration for p-tuning.
Define the model, dataset, and some basic training hyperparameters:
```py
from transformers import (
AutoModelForSequenceClassification,
AutoTokenizer,
DataCollatorWithPadding,
TrainingArguments,
Trainer,
)
from peft import (
get_peft_config,
get_peft_model,
get_peft_model_state_dict,
set_peft_model_state_dict,
PeftType,
PromptEncoderConfig,
)
from datasets import load_dataset
import evaluate
import torch
model_name_or_path = "roberta-large"
task = "mrpc"
num_epochs = 20
lr = 1e-3
batch_size = 32
```
## Load dataset and metric
Next, load the `mrpc` configuration - a corpus of sentence pairs labeled according to whether they're semantically equivalent or not - from the [GLUE](https://huggingface.co/datasets/glue) benchmark:
```py
dataset = load_dataset("glue", task)
dataset["train"][0]
{
"sentence1": 'Amrozi accused his brother , whom he called " the witness " , of deliberately distorting his evidence .',
"sentence2": 'Referring to him as only " the witness " , Amrozi accused his brother of deliberately distorting his evidence .',
"label": 1,
"idx": 0,
}
```
From 🤗 Evaluate, load a metric for evaluating the model's performance. The evaluation module returns the accuracy and F1 scores associated with this specific task.
```py
metric = evaluate.load("glue", task)
```
Now you can use the `metric` to write a function that computes the accuracy and F1 scores. The `compute_metrics` function calculates the scores from the model predictions and labels:
```py
import numpy as np
def compute_metrics(eval_pred):
predictions, labels = eval_pred
predictions = np.argmax(predictions, axis=1)
return metric.compute(predictions=predictions, references=labels)
```
## Preprocess dataset
Initialize the tokenizer and configure the padding token to use. If you're using a GPT, OPT, or BLOOM model, you should set the `padding_side` to the left; otherwise it'll be set to the right. Tokenize the sentence pairs and truncate them to the maximum length.
```py
if any(k in model_name_or_path for k in ("gpt", "opt", "bloom")):
padding_side = "left"
else:
padding_side = "right"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, padding_side=padding_side)
if getattr(tokenizer, "pad_token_id") is None:
tokenizer.pad_token_id = tokenizer.eos_token_id
def tokenize_function(examples):
# max_length=None => use the model max length (it's actually the default)
outputs = tokenizer(examples["sentence1"], examples["sentence2"], truncation=True, max_length=None)
return outputs
```
Use [`~datasets.Dataset.map`] to apply the `tokenize_function` to the dataset, and remove the unprocessed columns because the model won't need those. You should also rename the `label` column to `labels` because that is the expected name for the labels by models in the 🤗 Transformers library.
```py
tokenized_datasets = dataset.map(
tokenize_function,
batched=True,
remove_columns=["idx", "sentence1", "sentence2"],
)
tokenized_datasets = tokenized_datasets.rename_column("label", "labels")
```
Create a collator function with [`~transformers.DataCollatorWithPadding`] to pad the examples in the batches to the `longest` sequence in the batch:
```py
data_collator = DataCollatorWithPadding(tokenizer=tokenizer, padding="longest")
```
## Train
P-tuning uses a prompt encoder to optimize the prompt parameters, so you'll need to initialize the [`PromptEncoderConfig`] with several arguments:
- `task_type`: the type of task you're training on, in this case it is sequence classification or `SEQ_CLS`
- `num_virtual_tokens`: the number of virtual tokens to use, or in other words, the prompt
- `encoder_hidden_size`: the hidden size of the encoder used to optimize the prompt parameters
```py
peft_config = PromptEncoderConfig(task_type="SEQ_CLS", num_virtual_tokens=20, encoder_hidden_size=128)
```
Create the base `roberta-large` model from [`~transformers.AutoModelForSequenceClassification`], and then wrap the base model and `peft_config` with [`get_peft_model`] to create a [`PeftModel`]. If you're curious to see how many parameters you're actually training compared to training on all the model parameters, you can print it out with [`~peft.PeftModel.print_trainable_parameters`]:
```py
model = AutoModelForSequenceClassification.from_pretrained(model_name_or_path, return_dict=True)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
"trainable params: 1351938 || all params: 355662082 || trainable%: 0.38011867680626127"
```
From the 🤗 Transformers library, set up the [`~transformers.TrainingArguments`] class with where you want to save the model to, the training hyperparameters, how to evaluate the model, and when to save the checkpoints:
```py
training_args = TrainingArguments(
output_dir="your-name/roberta-large-peft-p-tuning",
learning_rate=1e-3,
per_device_train_batch_size=32,
per_device_eval_batch_size=32,
num_train_epochs=2,
weight_decay=0.01,
evaluation_strategy="epoch",
save_strategy="epoch",
load_best_model_at_end=True,
)
```
Then pass the model, `TrainingArguments`, datasets, tokenizer, data collator, and evaluation function to the [`~transformers.Trainer`] class, which'll handle the entire training loop for you. Once you're ready, call [`~transformers.Trainer.train`] to start training!
```py
trainer = Trainer(
model=model,
args=training_args,
train_dataset=tokenized_datasets["train"],
eval_dataset=tokenized_datasets["test"],
tokenizer=tokenizer,
data_collator=data_collator,
compute_metrics=compute_metrics,
)
trainer.train()
```
## Share model
You can store and share your model on the Hub if you'd like. Log in to your Hugging Face account and enter your token when prompted:
```py
from huggingface_hub import notebook_login
notebook_login()
```
Upload the model to a specific model repository on the Hub with the [`~transformers.PreTrainedModel.push_to_hub`] function:
```py
model.push_to_hub("your-name/roberta-large-peft-p-tuning", use_auth_token=True)
```
## Inference
Once the model has been uploaded to the Hub, anyone can easily use it for inference. Load the configuration and model:
```py
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForSequenceClassification, AutoTokenizer
peft_model_id = "smangrul/roberta-large-peft-p-tuning"
config = PeftConfig.from_pretrained(peft_model_id)
inference_model = AutoModelForSequenceClassification.from_pretrained(config.base_model_name_or_path)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(inference_model, peft_model_id)
```
Get some text and tokenize it:
```py
classes = ["not equivalent", "equivalent"]
sentence1 = "Coast redwood trees are the tallest trees on the planet and can grow over 300 feet tall."
sentence2 = "The coast redwood trees, which can attain a height of over 300 feet, are the tallest trees on earth."
inputs = tokenizer(sentence1, sentence2, truncation=True, padding="longest", return_tensors="pt")
```
Pass the inputs to the model to classify the sentences:
```py
with torch.no_grad():
outputs = model(**inputs).logits
print(outputs)
paraphrased_text = torch.softmax(outputs, dim=1).tolist()[0]
for i in range(len(classes)):
print(f"{classes[i]}: {int(round(paraphrased_text[i] * 100))}%")
"not equivalent: 4%"
"equivalent: 96%"
```


@ -0,0 +1,297 @@
<!--⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# LoRA for semantic similarity tasks
Low-Rank Adaptation (LoRA) is a reparametrization method that aims to reduce the number of trainable parameters with low-rank representations. The weight matrix is broken down into low-rank matrices that are trained and updated. All the pretrained model parameters remain frozen. After training, the low-rank matrices are added back to the original weights. This makes it more efficient to store and train a LoRA model because there are significantly fewer parameters.
<Tip>
💡 Read [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685) to learn more about LoRA.
</Tip>
In this guide, we'll be using a LoRA [script](https://github.com/huggingface/peft/tree/main/examples/feature_extraction) to fine-tune an [`intfloat/e5-large-v2`](https://huggingface.co/intfloat/e5-large-v2) model on the [`smangrul/amazon_esci`](https://huggingface.co/datasets/smangrul/amazon_esci) dataset for semantic similarity tasks. Feel free to explore the script to learn how things work in greater detail!
## Setup
Start by installing 🤗 PEFT from [source](https://github.com/huggingface/peft), and then navigate to the directory containing the training script for LoRA-based semantic search:
```bash
cd peft/examples/feature_extraction
```
Install all the required libraries with:
```bash
pip install -r requirements.txt
```
Next, import all the necessary libraries:
- 🤗 Transformers for loading the `intfloat/e5-large-v2` model and tokenizer
- 🤗 Accelerate for the training loop
- 🤗 Datasets for loading and preparing the `smangrul/amazon_esci` dataset for training and inference
- 🤗 Evaluate for evaluating the model's performance
- 🤗 PEFT for setting up the LoRA configuration and creating the PEFT model
- 🤗 huggingface_hub for uploading the trained model to HF hub
- hnswlib for creating the search index and doing fast approximate nearest neighbor search
<Tip>
It is assumed that PyTorch with CUDA support is already installed.
</Tip>
## Train
Launch the training script with `accelerate launch` and pass your hyperparameters along with the `--use_peft` argument to enable LoRA.
This guide uses the following [`LoraConfig`]:
```py
peft_config = LoraConfig(
r=8,
lora_alpha=16,
bias="none",
task_type=TaskType.FEATURE_EXTRACTION,
target_modules=["key", "query", "value"],
)
```
Here's what a full set of script arguments may look like when running in Colab on a V100 GPU with standard RAM:
```bash
accelerate launch \
--mixed_precision="fp16" \
peft_lora_embedding_semantic_search.py \
--dataset_name="smangrul/amazon_esci" \
--max_length=70 --model_name_or_path="intfloat/e5-large-v2" \
--per_device_train_batch_size=64 \
--per_device_eval_batch_size=128 \
--learning_rate=5e-4 \
--weight_decay=0.0 \
--num_train_epochs 3 \
--gradient_accumulation_steps=1 \
--output_dir="results/peft_lora_e5_ecommerce_semantic_search_colab" \
--seed=42 \
--push_to_hub \
--hub_model_id="smangrul/peft_lora_e5_ecommerce_semantic_search_colab" \
--with_tracking \
--report_to="wandb" \
--use_peft \
--checkpointing_steps "epoch"
```
## Dataset for semantic similarity
The dataset we'll be using is a small subset of the [esci-data](https://github.com/amazon-science/esci-data.git) dataset (it can be found on Hub at [smangrul/amazon_esci](https://huggingface.co/datasets/smangrul/amazon_esci)).
Each sample contains a tuple of `(query, product_title, relevance_label)` where `relevance_label` is `1` if the product matches the intent of the `query`, otherwise it is `0`.
Our task is to build an embedding model that can retrieve semantically similar products given a product query.
This is usually the first stage in building a product search engine to retrieve all the potentially relevant products of a given query.
Scoring the query against millions of products with a cross-encoder would require cross-joining the query with every product, which quickly becomes infeasible.
Instead, you can use a bi-encoder Transformer model to retrieve the top K most similar products for a given query by
embedding the query and products in the same latent embedding space.
The millions of products are embedded offline to create a search index.
At run time, only the query is embedded by the model, and products are retrieved from the search index with a
fast approximate nearest neighbor search library such as [FAISS](https://github.com/facebookresearch/faiss) or [HNSWlib](https://github.com/nmslib/hnswlib).
The next stage involves reranking the retrieved list of products to return the most relevant ones;
this stage can use cross-encoder based models because the cross-join is now limited to the query and a small set of retrieved products.
The diagram below from [awesome-semantic-search](https://github.com/rom1504/awesome-semantic-search) outlines a rough semantic search pipeline:
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/semantic_search_pipeline.png"
alt="Semantic Search Pipeline"/>
</div>
For this task guide, we will explore the first stage of training an embedding model to predict semantically similar products
given a product query.
## Training script deep dive
We finetune [e5-large-v2](https://huggingface.co/intfloat/e5-large-v2), which ranks among the top models on the [MTEB benchmark](https://huggingface.co/spaces/mteb/leaderboard), using PEFT-LoRA.
[`AutoModelForSentenceEmbedding`] returns the query and product embeddings, and the `mean_pooling` function pools them across the sequence dimension and normalizes them:
```py
import torch
import torch.nn as nn
from transformers import AutoModel


class AutoModelForSentenceEmbedding(nn.Module):
    def __init__(self, model_name, tokenizer, normalize=True):
        super(AutoModelForSentenceEmbedding, self).__init__()

        self.model = AutoModel.from_pretrained(model_name)
        self.normalize = normalize
        self.tokenizer = tokenizer

    def forward(self, **kwargs):
        model_output = self.model(**kwargs)
        embeddings = self.mean_pooling(model_output, kwargs["attention_mask"])
        if self.normalize:
            embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
        return embeddings

    def mean_pooling(self, model_output, attention_mask):
        token_embeddings = model_output[0]  # first element of model_output contains all token embeddings
        input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
        return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

    def __getattr__(self, name: str):
        """Forward missing attributes to the wrapped module."""
        try:
            return super().__getattr__(name)  # defer to nn.Module's logic
        except AttributeError:
            return getattr(self.model, name)


def get_cosine_embeddings(query_embs, product_embs):
    # the embeddings are L2-normalized, so the dot product equals the cosine similarity
    return torch.sum(query_embs * product_embs, axis=1)


def get_loss(cosine_score, labels):
    # pull relevant pairs (label=1) towards a cosine score of 1 and push irrelevant pairs (label=0) to 0 or below
    return torch.mean(torch.square(labels * (1 - cosine_score) + torch.clamp((1 - labels) * cosine_score, min=0.0)))
```
The `get_cosine_embeddings` function computes the cosine similarity and the `get_loss` function computes the loss. The loss teaches the model that a query and product pair with a cosine score of `1` is relevant, while a pair with a cosine score of `0` or below is irrelevant.
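To see how the loss behaves, here is a minimal sketch that scores a toy relevant pair and a toy irrelevant pair with the `get_cosine_embeddings` and `get_loss` functions defined above; the tensor values are made up purely for illustration:
```py
import torch

# toy 4-dimensional, L2-normalized embeddings (values are illustrative only)
query_embs = torch.nn.functional.normalize(torch.tensor([[1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]]), dim=1)
product_embs = torch.nn.functional.normalize(torch.tensor([[1.0, 0.1, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]), dim=1)
labels = torch.tensor([1.0, 0.0])  # first pair is relevant, second is not

cosine_score = get_cosine_embeddings(query_embs, product_embs)
loss = get_loss(cosine_score, labels)
print(cosine_score)  # ~0.995 for the relevant pair, 0.0 for the irrelevant pair
print(loss)          # close to 0 because both pairs are already scored correctly
```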
Define the [`PeftConfig`] with your LoRA hyperparameters, and create a [`PeftModel`]. We use 🤗 Accelerate for handling all device management, mixed precision training, gradient accumulation, WandB tracking, and saving/loading utilities.
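Putting the pieces together, a minimal sketch of the model setup might look like the following; the `tokenizer`, `optimizer`, dataloaders, and scheduler are assumed to be defined as in the training script:
```py
from accelerate import Accelerator
from peft import get_peft_model

accelerator = Accelerator(mixed_precision="fp16", gradient_accumulation_steps=1)

model = AutoModelForSentenceEmbedding("intfloat/e5-large-v2", tokenizer)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()

# Accelerate takes care of device placement, mixed precision, and gradient accumulation
model, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
    model, optimizer, train_dataloader, lr_scheduler
)
```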
## Results
The table below compares the training time, the batch size that could be fit in Colab, and the best ROC-AUC scores between a PEFT model and a fully fine-tuned model:
| Training Type | Training time per epoch (Hrs) | Batch Size that fits | ROC-AUC score (higher is better) |
| ----------------- | ------------- | ---------- | -------- |
| Pre-Trained e5-large-v2 | - | - | 0.68 |
| PEFT | 1.73 | 64 | 0.787 |
| Full Fine-Tuning | 2.33 | 32 | 0.7969 |
The PEFT-LoRA model trains **1.35X** faster and fits a **2X** larger batch size compared to the fully fine-tuned model, and the performance of PEFT-LoRA is comparable to the fully fine-tuned model with a relative drop of **-1.24%** in ROC-AUC. This gap can probably be closed with bigger models as mentioned in [The Power of Scale for Parameter-Efficient Prompt Tuning](https://huggingface.co/papers/2104.08691).
## Inference
Now that we have the trained model, we need to create a search index of all the products in our catalog.
Please refer to `peft_lora_embedding_semantic_similarity_inference.ipynb` for the complete inference code.
1. Get a mapping from product ids to product titles, which we'll call `ids_to_products_dict` (a sketch of how to build it follows the example output below):
```bash
{0: 'RamPro 10" All Purpose Utility Air Tires/Wheels with a 5/8" Diameter Hole with Double Sealed Bearings (Pack of 2)',
1: 'MaxAuto 2-Pack 13x5.00-6 2PLY Turf Mower Tractor Tire with Yellow Rim, (3" Centered Hub, 3/4" Bushings )',
2: 'NEIKO 20601A 14.5 inch Steel Tire Spoon Lever Iron Tool Kit | Professional Tire Changing Tool for Motorcycle, Dirt Bike, Lawn Mower | 3 pcs Tire Spoons | 3 Rim Protector | Valve Tool | 6 Valve Cores',
3: '2PK 13x5.00-6 13x5.00x6 13x5x6 13x5-6 2PLY Turf Mower Tractor Tire with Gray Rim',
4: '(Set of 2) 15x6.00-6 Husqvarna/Poulan Tire Wheel Assy .75" Bearing',
5: 'MaxAuto 2 Pcs 16x6.50-8 Lawn Mower Tire for Garden Tractors Ridings, 4PR, Tubeless',
6: 'Dr.Roc Tire Spoon Lever Dirt Bike Lawn Mower Motorcycle Tire Changing Tools with Durable Bag 3 Tire Irons 2 Rim Protectors 1 Valve Stems Set TR412 TR413',
7: 'MARASTAR 21446-2PK 15x6.00-6" Front Tire Assembly Replacement-Craftsman Mower, Pack of 2',
8: '15x6.00-6" Front Tire Assembly Replacement for 100 and 300 Series John Deere Riding Mowers - 2 pack',
9: 'Honda HRR Wheel Kit (2 Front 44710-VL0-L02ZB, 2 Back 42710-VE2-M02ZE)',
10: 'Honda 42710-VE2-M02ZE (Replaces 42710-VE2-M01ZE) Lawn Mower Rear Wheel Set of 2' ...
```
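One way to build this mapping, assuming the products dataset exposes a `product_title` column (the column name is taken from the tuple described above and may differ in practice), is:
```py
ids_to_products_dict = {i: title for i, title in enumerate(dataset["product_title"])}
```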
2. Use the trained [smangrul/peft_lora_e5_ecommerce_semantic_search_colab](https://huggingface.co/smangrul/peft_lora_e5_ecommerce_semantic_search_colab) model to get the product embeddings:
```py
import numpy as np
import torch
from tqdm import tqdm
from peft import PeftModel

# base model
model = AutoModelForSentenceEmbedding(model_name_or_path, tokenizer)

# load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, peft_model_id)
device = "cuda"
model.to(device)
model.eval()
model = model.merge_and_unload()  # merge the LoRA weights into the base weights for faster inference

num_products = len(dataset)
d = 1024  # embedding dimension of e5-large-v2
product_embeddings_array = np.zeros((num_products, d))
for step, batch in enumerate(tqdm(dataloader)):
    with torch.no_grad():
        with torch.amp.autocast(dtype=torch.bfloat16, device_type="cuda"):
            product_embs = model(**{k: v.to(device) for k, v in batch.items()}).detach().float().cpu()
    start_index = step * batch_size
    end_index = start_index + batch_size if (start_index + batch_size) < num_products else num_products
    product_embeddings_array[start_index:end_index] = product_embs
    del product_embs, batch
```
3. Create a search index using HNSWlib:
```py
import hnswlib

def construct_search_index(dim, num_elements, data):
    # declare the index; possible spaces are l2, cosine, or ip (inner product)
    search_index = hnswlib.Index(space="ip", dim=dim)

    # initialize the index - the maximum number of elements should be known beforehand
    search_index.init_index(max_elements=num_elements, ef_construction=200, M=100)

    # element insertion (can be called several times)
    ids = np.arange(num_elements)
    search_index.add_items(data, ids)

    return search_index

product_search_index = construct_search_index(d, num_products, product_embeddings_array)
```
4. Get the query embeddings and nearest neighbors:
```py
def get_query_embeddings(query, model, tokenizer, device):
    inputs = tokenizer(query, padding="max_length", max_length=70, truncation=True, return_tensors="pt")
    model.eval()
    with torch.no_grad():
        query_embs = model(**{k: v.to(device) for k, v in inputs.items()}).detach().cpu()
    return query_embs[0]


def get_nearest_neighbours(k, search_index, query_embeddings, ids_to_products_dict, threshold=0.7):
    # control the recall by setting ef; ef should always be greater than k
    search_index.set_ef(100)

    # retrieve the k closest elements (returns two numpy arrays: labels and distances)
    labels, distances = search_index.knn_query(query_embeddings, k=k)

    # with an inner-product index on normalized embeddings, 1 - distance is the cosine similarity
    return [
        (ids_to_products_dict[label], (1 - distance))
        for label, distance in zip(labels[0], distances[0])
        if (1 - distance) >= threshold
    ]
```
5. Let's test it out with the query `deep learning books`:
```py
query = "deep learning books"
k = 10
query_embeddings = get_query_embeddings(query, model, tokenizer, device)
search_results = get_nearest_neighbours(k, product_search_index, query_embeddings, ids_to_products_dict, threshold=0.7)
print(f"{query=}")
for product, cosine_sim_score in search_results:
    print(f"cosine_sim_score={round(cosine_sim_score, 2)} {product=}")
```
Output:
```bash
query='deep learning books'
cosine_sim_score=0.95 product='Deep Learning (The MIT Press Essential Knowledge series)'
cosine_sim_score=0.93 product='Practical Deep Learning: A Python-Based Introduction'
cosine_sim_score=0.9 product='Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems'
cosine_sim_score=0.9 product='Machine Learning: A Hands-On, Project-Based Introduction to Machine Learning for Absolute Beginners: Mastering Engineering ML Systems using Scikit-Learn and TensorFlow'
cosine_sim_score=0.9 product='Mastering Machine Learning on AWS: Advanced machine learning in Python using SageMaker, Apache Spark, and TensorFlow'
cosine_sim_score=0.9 product='The Hundred-Page Machine Learning Book'
cosine_sim_score=0.89 product='Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems'
cosine_sim_score=0.89 product='Machine Learning: A Journey from Beginner to Advanced Including Deep Learning, Scikit-learn and Tensorflow'
cosine_sim_score=0.88 product='Mastering Machine Learning with scikit-learn'
cosine_sim_score=0.88 product='Mastering Machine Learning with scikit-learn - Second Edition: Apply effective learning algorithms to real-world problems using scikit-learn'
```
Books on deep learning and machine learning are retrieved even though `machine learning` wasn't included in the query. This means the model has learned that these books are semantically relevant to the query based on the purchase behavior of customers on Amazon.
The next steps would ideally involve using ONNX/TensorRT to optimize the model and using a Triton server to host it. Check out 🤗 [Optimum](https://huggingface.co/docs/optimum/index) for related optimizations for efficient serving!


@ -8,6 +8,10 @@ http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Semantic segmentation using LoRA
@ -26,7 +30,7 @@ For more information on LoRA, please refer to the [original LoRA paper](https://
Install the libraries required for model training:
```bash
!pip install transformers accelerate evaluate datasets loralib peft -q
!pip install transformers accelerate evaluate datasets peft -q
```
## Authenticate to share your model
@ -80,14 +84,14 @@ num_labels = len(id2label)
## Prepare datasets for training and evaluation
Next, load the SegFormer image processor to prepare the images and annotations for the model. This dataset uses the
zero-index as the background class, so make sure to set `reduce_labels=True` to subtract one from all labels since the
zero-index as the background class, so make sure to set `do_reduce_labels=True` to subtract one from all labels since the
background class is not among the 150 classes.
```python
from transformers import AutoImageProcessor
checkpoint = "nvidia/mit-b0"
image_processor = AutoImageProcessor.from_pretrained(checkpoint, reduce_labels=True)
image_processor = AutoImageProcessor.from_pretrained(checkpoint, do_reduce_labels=True)
```
Add a function to apply data augmentation to the images, so that the model is more robust against overfitting. Here we use the
@ -180,7 +184,7 @@ def compute_metrics(eval_pred):
references=labels,
num_labels=len(id2label),
ignore_index=0,
reduce_labels=image_processor.reduce_labels,
reduce_labels=image_processor.do_reduce_labels,
)
per_category_accuracy = metrics.pop("per_category_accuracy").tolist()


@ -1,251 +0,0 @@
# Prefix tuning for conditional generation
[[open-in-colab]]
Prefix tuning is an additive method where only a sequence of continuous task-specific vectors is attached to the beginning of the input, or *prefix*. Only the prefix parameters are optimized and added to the hidden states in every layer of the model. The tokens of the input sequence can still attend to the prefix as *virtual tokens*. As a result, prefix tuning stores 1000x fewer parameters than a fully finetuned model, which means you can use one large language model for many tasks.
<Tip>
💡 Read [Prefix-Tuning: Optimizing Continuous Prompts for Generation](https://arxiv.org/abs/2101.00190) to learn more about prefix tuning.
</Tip>
This guide will show you how to apply prefix tuning to train a [`t5-large`](https://huggingface.co/t5-large) model on the `sentences_allagree` subset of the [financial_phrasebank](https://huggingface.co/datasets/financial_phrasebank) dataset.
Before you begin, make sure you have all the necessary libraries installed:
```bash
!pip install -q peft transformers datasets
```
## Setup
Start by defining the model and tokenizer, the text and label columns, and some hyperparameters so it'll be easier to start training later. Set the environment variable `TOKENIZERS_PARALLELISM` to `false` to disable the fast Rust-based tokenizer that processes data in parallel by default, so you can use multiprocessing in Python.
```py
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, default_data_collator, get_linear_schedule_with_warmup
from peft import get_peft_config, get_peft_model, get_peft_model_state_dict, PrefixTuningConfig, TaskType
from datasets import load_dataset
from torch.utils.data import DataLoader
from tqdm import tqdm
import torch
import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"
os.environ["CUDA_VISIBLE_DEVICES"] = "3"
device = "cuda"
model_name_or_path = "t5-large"
tokenizer_name_or_path = "t5-large"
text_column = "sentence"
label_column = "text_label"
max_length = 128
lr = 1e-2
num_epochs = 5
batch_size = 8
```
## Load dataset
For this guide, you'll train on the `sentences_allagree` subset of the [`financial_phrasebank`](https://huggingface.co/datasets/financial_phrasebank) dataset. This dataset contains financial news categorized by sentiment.
Use the 🤗 [Datasets](https://huggingface.co/docs/datasets/index) [`~datasets.Dataset.train_test_split`] function to create a training and validation split and convert the `label` value to the more readable `text_label`. All of the changes can be applied with the [`~datasets.Dataset.map`] function:
```py
from datasets import load_dataset
dataset = load_dataset("financial_phrasebank", "sentences_allagree")
dataset = dataset["train"].train_test_split(test_size=0.1)
dataset["validation"] = dataset["test"]
del dataset["test"]
classes = dataset["train"].features["label"].names
dataset = dataset.map(
lambda x: {"text_label": [classes[label] for label in x["label"]]},
batched=True,
num_proc=1,
)
dataset["train"][0]
{"sentence": "Profit before taxes was EUR 4.0 mn , down from EUR 4.9 mn .", "label": 0, "text_label": "negative"}
```
## Preprocess dataset
Initialize a tokenizer, and create a function to pad and truncate the `model_inputs` and `labels`:
```py
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
def preprocess_function(examples):
    inputs = examples[text_column]
    targets = examples[label_column]
    model_inputs = tokenizer(inputs, max_length=max_length, padding="max_length", truncation=True, return_tensors="pt")
    labels = tokenizer(targets, max_length=2, padding="max_length", truncation=True, return_tensors="pt")
    labels = labels["input_ids"]
    labels[labels == tokenizer.pad_token_id] = -100
    model_inputs["labels"] = labels
    return model_inputs
```
Use the [`~datasets.Dataset.map`] function to apply the `preprocess_function` to the dataset. You can remove the unprocessed columns since the model doesn't need them anymore:
```py
processed_datasets = dataset.map(
preprocess_function,
batched=True,
num_proc=1,
remove_columns=dataset["train"].column_names,
load_from_cache_file=False,
desc="Running tokenizer on dataset",
)
```
Create a [`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader) from the `train` and `eval` datasets. Set `pin_memory=True` to speed up the data transfer to the GPU during training if the samples in your dataset are on a CPU.
```py
train_dataset = processed_datasets["train"]
eval_dataset = processed_datasets["validation"]
train_dataloader = DataLoader(
train_dataset, shuffle=True, collate_fn=default_data_collator, batch_size=batch_size, pin_memory=True
)
eval_dataloader = DataLoader(eval_dataset, collate_fn=default_data_collator, batch_size=batch_size, pin_memory=True)
```
## Train model
Now you can set up your model and make sure it is ready for training. Specify the task in [`PrefixTuningConfig`], create the base `t5-large` model from [`~transformers.AutoModelForSeq2SeqLM`], and then wrap the model and configuration in a [`PeftModel`]. Feel free to print the [`PeftModel`]'s parameters and compare it to fully training all the model parameters to see how much more efficient it is!
```py
peft_config = PrefixTuningConfig(task_type=TaskType.SEQ_2_SEQ_LM, inference_mode=False, num_virtual_tokens=20)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name_or_path)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
"trainable params: 983040 || all params: 738651136 || trainable%: 0.13308583065659835"
```
Set up the optimizer and learning rate scheduler:
```py
optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
lr_scheduler = get_linear_schedule_with_warmup(
optimizer=optimizer,
num_warmup_steps=0,
num_training_steps=(len(train_dataloader) * num_epochs),
)
```
Move the model to the GPU, and then write a training loop to begin!
```py
model = model.to(device)
for epoch in range(num_epochs):
    model.train()
    total_loss = 0
    for step, batch in enumerate(tqdm(train_dataloader)):
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)
        loss = outputs.loss
        total_loss += loss.detach().float()
        loss.backward()
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()

    model.eval()
    eval_loss = 0
    eval_preds = []
    for step, batch in enumerate(tqdm(eval_dataloader)):
        batch = {k: v.to(device) for k, v in batch.items()}
        with torch.no_grad():
            outputs = model(**batch)
        loss = outputs.loss
        eval_loss += loss.detach().float()
        eval_preds.extend(
            tokenizer.batch_decode(torch.argmax(outputs.logits, -1).detach().cpu().numpy(), skip_special_tokens=True)
        )

    eval_epoch_loss = eval_loss / len(eval_dataloader)
    eval_ppl = torch.exp(eval_epoch_loss)
    train_epoch_loss = total_loss / len(train_dataloader)
    train_ppl = torch.exp(train_epoch_loss)
    print(f"{epoch=}: {train_ppl=} {train_epoch_loss=} {eval_ppl=} {eval_epoch_loss=}")
```
Let's see how well the model performs on the validation set:
```py
correct = 0
total = 0
for pred, true in zip(eval_preds, dataset["validation"]["text_label"]):
    if pred.strip() == true.strip():
        correct += 1
    total += 1
accuracy = correct / total * 100
print(f"{accuracy=} % on the evaluation dataset")
print(f"{eval_preds[:10]=}")
print(f"{dataset['validation']['text_label'][:10]=}")
"accuracy=97.3568281938326 % on the evaluation dataset"
"eval_preds[:10]=['neutral', 'positive', 'neutral', 'positive', 'neutral', 'negative', 'negative', 'neutral', 'neutral', 'neutral']"
"dataset['validation']['text_label'][:10]=['neutral', 'positive', 'neutral', 'positive', 'neutral', 'negative', 'negative', 'neutral', 'neutral', 'neutral']"
```
97% accuracy in just a few minutes; pretty good!
## Share model
You can store and share your model on the Hub if you'd like. Login to your Hugging Face account and enter your token when prompted:
```py
from huggingface_hub import notebook_login
notebook_login()
```
Upload the model to a specific model repository on the Hub with the [`~transformers.PreTrainedModel.push_to_hub`] function:
```py
peft_model_id = "your-name/t5-large_PREFIX_TUNING_SEQ2SEQ"
model.push_to_hub("your-name/t5-large_PREFIX_TUNING_SEQ2SEQ", use_auth_token=True)
```
If you check the model file size in the repository, you'll see that it is only 3.93MB! 🤏
## Inference
Once the model has been uploaded to the Hub, anyone can easily use it for inference. Load the configuration and model:
```py
from peft import PeftModel, PeftConfig
peft_model_id = "stevhliu/t5-large_PREFIX_TUNING_SEQ2SEQ"
config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(model, peft_model_id)
```
Get and tokenize some text about financial news:
```py
inputs = tokenizer(
"The Lithuanian beer market made up 14.41 million liters in January , a rise of 0.8 percent from the year-earlier figure , the Lithuanian Brewers ' Association reporting citing the results from its members .",
return_tensors="pt",
)
```
Put the model on a GPU and *generate* the predicted text sentiment:
```py
model.to(device)
with torch.no_grad():
    inputs = {k: v.to(device) for k, v in inputs.items()}
    outputs = model.generate(input_ids=inputs["input_ids"], max_new_tokens=10)
    print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))
["positive"]
```


@ -1,3 +1,7 @@
<!--⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# LoRA for token classification
Low-Rank Adaptation (LoRA) is a reparametrization method that aims to reduce the number of trainable parameters with low-rank representations. The weight matrix is broken down into low-rank matrices that are trained and updated. All the pretrained model parameters remain frozen. After training, the low-rank matrices are added back to the original weights. This makes it more efficient to store and train a LoRA model because there are significantly fewer parameters.


@ -0,0 +1,141 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# PEFT integrations
PEFT's practical benefits extend to other Hugging Face libraries like [Diffusers](https://hf.co/docs/diffusers) and [Transformers](https://hf.co/docs/transformers). One of the main benefits of PEFT is that an adapter file generated by a PEFT method is a lot smaller than the original model, which makes it super easy to manage and use multiple adapters. You can use one pretrained base model for multiple tasks by simply loading a new adapter finetuned for the task you're solving. Or you can combine multiple adapters with a text-to-image diffusion model to create new effects.
This tutorial will show you how PEFT can help you manage adapters in Diffusers and Transformers.
## Diffusers
Diffusers is a generative AI library for creating images and videos from text or images with diffusion models. LoRA is an especially popular training method for diffusion models because you can very quickly train and share diffusion models to generate images in new styles. To make it easier to use and try multiple LoRA models, Diffusers uses the PEFT library to help manage different adapters for inference.
For example, load a base model and then load the [artificialguybr/3DRedmond-V1](https://huggingface.co/artificialguybr/3DRedmond-V1) adapter for inference with the [`load_lora_weights`](https://huggingface.co/docs/diffusers/v0.24.0/en/api/loaders/lora#diffusers.loaders.LoraLoaderMixin.load_lora_weights) method. The `adapter_name` argument in the loading method is enabled by PEFT and allows you to set a name for the adapter so it is easier to reference.
```py
import torch
from diffusers import DiffusionPipeline
pipeline = DiffusionPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipeline.load_lora_weights(
"peft-internal-testing/artificialguybr__3DRedmond-V1",
weight_name="3DRedmond-3DRenderStyle-3DRenderAF.safetensors",
adapter_name="3d"
)
image = pipeline("sushi rolls shaped like kawaii cat faces").images[0]
image
```
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/ybelkada/documentation-images/resolve/main/test-lora-diffusers.png"/>
</div>
Now let's try another cool LoRA model, [ostris/super-cereal-sdxl-lora](https://huggingface.co/ostris/super-cereal-sdxl-lora). All you need to do is load and name this new adapter with `adapter_name`, and use the [`set_adapters`](https://huggingface.co/docs/diffusers/api/loaders/unet#diffusers.loaders.UNet2DConditionLoadersMixin.set_adapters) method to set it as the currently active adapter.
```py
pipeline.load_lora_weights(
"ostris/super-cereal-sdxl-lora",
weight_name="cereal_box_sdxl_v1.safetensors",
adapter_name="cereal"
)
pipeline.set_adapters("cereal")
image = pipeline("sushi rolls shaped like kawaii cat faces").images[0]
image
```
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/ybelkada/documentation-images/resolve/main/test-lora-diffusers-2.png"/>
</div>
Finally, you can call the [`disable_lora`](https://huggingface.co/docs/diffusers/api/loaders/unet#diffusers.loaders.UNet2DConditionLoadersMixin.disable_lora) method to restore the base model.
```py
pipeline.disable_lora()
```
Learn more about how PEFT supports Diffusers in the [Inference with PEFT](https://huggingface.co/docs/diffusers/tutorials/using_peft_for_inference) tutorial.
## Transformers
Transformers is a collection of pretrained models for all types of tasks in all modalities. You can load these models for training or inference. Many of the models are large language models (LLMs), so it makes sense to integrate PEFT with Transformers to manage and train adapters.
Load a base pretrained model to train.
```py
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
```
Next, add an adapter configuration to specify how to adapt the model parameters. Call the [`~PeftModel.add_adapter`] method to add the configuration to the base model.
```py
from peft import LoraConfig

peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",
    task_type="CAUSAL_LM",
)
model.add_adapter(peft_config)
```
Now you can train the model with the Transformers [`~transformers.Trainer`] class or whichever training framework you prefer.
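For example, a minimal training sketch with the [`~transformers.Trainer`] class might look like this; the output directory, hyperparameters, and `train_dataset` are placeholders you would replace with your own:
```py
from transformers import Trainer, TrainingArguments

# `train_dataset` is assumed to be a tokenized dataset you have already prepared
training_args = TrainingArguments(
    output_dir="opt-350m-lora",
    per_device_train_batch_size=4,
    learning_rate=2e-4,
    num_train_epochs=1,
)
trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
trainer.train()
```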
To use the newly trained model for inference, the [`~transformers.AutoModel`] class uses PEFT on the backend to load the adapter weights and configuration file into a base pretrained model.
```py
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("ybelkada/opt-350m-lora")
```
If you're interested in comparing or using more than one adapter, you can also call the [`~PeftModel.add_adapter`] method to add the adapter configuration to the base model. The only requirement is that the adapter types must be the same (you can't mix a LoRA and a LoHa adapter).
```py
from transformers import AutoModelForCausalLM
from peft import LoraConfig

model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")

# a minimal LoRA configuration for the first adapter (assumed; adjust the hyperparameters to your task)
lora_config_1 = LoraConfig(task_type="CAUSAL_LM")
model.add_adapter(lora_config_1, adapter_name="adapter_1")
```
Call [`~PeftModel.add_adapter`] again to attach a new adapter to the base model.
```py
# a second LoRA configuration (assumed; in practice this would be a differently configured or trained adapter)
lora_config_2 = LoraConfig(task_type="CAUSAL_LM")
model.add_adapter(lora_config_2, adapter_name="adapter_2")
```
Then you can use [`~PeftModel.set_adapter`] to set the currently active adapter.
```py
model.set_adapter("adapter_1")
output = model.generate(**inputs)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
To disable the adapter, call the [`~PeftModel.disable_adapter`] method.
```py
model.disable_adapter()
```
If you're curious, check out the [Load and train adapters with PEFT](https://huggingface.co/docs/transformers/main/peft) tutorial to learn more.


@ -0,0 +1,182 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# PEFT configurations and models
The sheer size of today's large pretrained models - which commonly have billions of parameters - presents a significant training challenge because they require more storage space and more computational power to crunch all those calculations. You'll need access to powerful GPUs or TPUs to train these large pretrained models, which is expensive, not widely accessible to everyone, not environmentally friendly, and not very practical. PEFT methods address many of these challenges. There are several types of PEFT methods (soft prompting, matrix decomposition, adapters), but they all focus on the same thing: reducing the number of trainable parameters. This makes it more accessible to train and store large models on consumer hardware.
The PEFT library is designed to help you quickly train large models on free or low-cost GPUs, and in this tutorial, you'll learn how to set up a configuration to apply a PEFT method to a pretrained base model for training. Once the PEFT configuration is set up, you can use any training framework you like (the Transformers [`~transformers.Trainer`] class, [Accelerate](https://hf.co/docs/accelerate), a custom PyTorch training loop).
## PEFT configurations
<Tip>
Learn more about the parameters you can configure for each PEFT method in their respective API reference page.
</Tip>
A configuration stores important parameters that specify how a particular PEFT method should be applied.
For example, take a look at the following [`LoraConfig`](https://huggingface.co/ybelkada/opt-350m-lora/blob/main/adapter_config.json) for applying LoRA and [`PromptEncoderConfig`](https://huggingface.co/smangrul/roberta-large-peft-p-tuning/blob/main/adapter_config.json) for applying p-tuning (these configuration files are already JSON-serialized). Whenever you load a PEFT adapter, it is a good idea to check whether it has an associated `adapter_config.json` file, which is required; a sketch of how to check this programmatically follows the examples below.
<hfoptions id="config">
<hfoption id="LoraConfig">
```json
{
"base_model_name_or_path": "facebook/opt-350m", #base model to apply LoRA to
"bias": "none",
"fan_in_fan_out": false,
"inference_mode": true,
"init_lora_weights": true,
"layers_pattern": null,
"layers_to_transform": null,
"lora_alpha": 32,
"lora_dropout": 0.05,
"modules_to_save": null,
"peft_type": "LORA", #PEFT method type
"r": 16,
"revision": null,
"target_modules": [
"q_proj", #model modules to apply LoRA to (query and value projection layers)
"v_proj"
],
"task_type": "CAUSAL_LM" #type of task to train model on
}
```
You can create your own configuration for training by initializing a [`LoraConfig`].
```py
from peft import LoraConfig, TaskType
lora_config = LoraConfig(
r=16,
target_modules=["q_proj", "v_proj"],
task_type=TaskType.CAUSAL_LM,
lora_alpha=32,
lora_dropout=0.05
)
```
</hfoption>
<hfoption id="PromptEncoderConfig">
```json
{
"base_model_name_or_path": "roberta-large", #base model to apply p-tuning to
"encoder_dropout": 0.0,
"encoder_hidden_size": 128,
"encoder_num_layers": 2,
"encoder_reparameterization_type": "MLP",
"inference_mode": true,
"num_attention_heads": 16,
"num_layers": 24,
"num_transformer_submodules": 1,
"num_virtual_tokens": 20,
"peft_type": "P_TUNING", #PEFT method type
"task_type": "SEQ_CLS", #type of task to train model on
"token_dim": 1024
}
```
You can create your own configuration for training by initializing a [`PromptEncoderConfig`].
```py
from peft import PromptEncoderConfig, TaskType

p_tuning_config = PromptEncoderConfig(
    encoder_reparameterization_type="MLP",
    encoder_hidden_size=128,
    num_attention_heads=16,
    num_layers=24,
    num_transformer_submodules=1,
    num_virtual_tokens=20,
    token_dim=1024,
    task_type=TaskType.SEQ_CLS,
)
```
</hfoption>
</hfoptions>
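As mentioned above, it is a good idea to verify that an adapter repository ships an `adapter_config.json` file. A minimal sketch of loading and inspecting it with [`PeftConfig`], using the LoRA example repository referenced earlier:
```py
from peft import PeftConfig

# loads and parses the adapter_config.json from the repository
config = PeftConfig.from_pretrained("ybelkada/opt-350m-lora")
print(config.peft_type, config.base_model_name_or_path, config.task_type)
```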
## PEFT models
With a PEFT configuration in hand, you can now apply it to any pretrained model to create a [`PeftModel`]. Choose from any of the state-of-the-art models from the [Transformers](https://hf.co/docs/transformers) library, a custom model, and even new and unsupported transformer architectures.
For this tutorial, load a base [facebook/opt-350m](https://huggingface.co/facebook/opt-350m) model to finetune.
```py
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
```
Use the [`get_peft_model`] function to create a [`PeftModel`] from the base facebook/opt-350m model and the `lora_config` you created earlier.
```py
from peft import get_peft_model
lora_model = get_peft_model(model, lora_config)
lora_model.print_trainable_parameters()
"trainable params: 1,572,864 || all params: 332,769,280 || trainable%: 0.472659014678278"
```
Now you can train the [`PeftModel`] with your preferred training framework! After training, you can save your model locally with [`~PeftModel.save_pretrained`] or upload it to the Hub with the [`~transformers.PreTrainedModel.push_to_hub`] method.
```py
# save locally
lora_model.save_pretrained("your-name/opt-350m-lora")
# push to Hub
lora_model.push_to_hub("your-name/opt-350m-lora")
```
To load a [`PeftModel`] for inference, you'll need to provide the [`PeftConfig`] used to create it and the base model it was trained from.
```py
from peft import PeftModel, PeftConfig
config = PeftConfig.from_pretrained("ybelkada/opt-350m-lora")
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)
lora_model = PeftModel.from_pretrained(model, "ybelkada/opt-350m-lora")
```
<Tip>
By default, the [`PeftModel`] is set for inference, but if you'd like to train the adapter some more you can set `is_trainable=True`.
```py
lora_model = PeftModel.from_pretrained(model, "ybelkada/opt-350m-lora", is_trainable=True)
```
</Tip>
The [`PeftModel.from_pretrained`] method is the most flexible way to load a [`PeftModel`] because it doesn't matter what model framework was used (Transformers, timm, a generic PyTorch model). Other classes, like [`AutoPeftModel`], are just a convenient wrapper around the base [`PeftModel`], and they make it easier to load PEFT models directly from the Hub or locally where the PEFT weights are stored.
```py
from peft import AutoPeftModelForCausalLM
lora_model = AutoPeftModelForCausalLM.from_pretrained("ybelkada/opt-350m-lora")
```
Take a look at the [AutoPeftModel](package_reference/auto_class) API reference to learn more about the [`AutoPeftModel`] classes.
## Next steps
With the appropriate [`PeftConfig`], you can apply it to any pretrained model to create a [`PeftModel`] and train large powerful models faster on freely available GPUs! To learn more about PEFT configurations and models, the following guide may be helpful:
* Learn how to configure a PEFT method for models that aren't from Transformers in the [Working with custom models](../developer_guides/custom_models) guide.


@ -124,10 +124,10 @@
" inputs = [f\"{text_column} : {x} Label : \" for x in examples[text_column]]\n",
" targets = [str(x) for x in examples[label_column]]\n",
" model_inputs = tokenizer(inputs)\n",
" labels = tokenizer(targets)\n",
" labels = tokenizer(targets, add_special_tokens=False) # don't add bos token because we concatenate with inputs\n",
" for i in range(batch_size):\n",
" sample_input_ids = model_inputs[\"input_ids\"][i]\n",
" label_input_ids = labels[\"input_ids\"][i] + [tokenizer.pad_token_id]\n",
" label_input_ids = labels[\"input_ids\"][i] + [tokenizer.eos_token_id]\n",
" # print(i, sample_input_ids, label_input_ids)\n",
" model_inputs[\"input_ids\"][i] = sample_input_ids + label_input_ids\n",
" labels[\"input_ids\"][i] = [-100] * len(sample_input_ids) + label_input_ids\n",
@ -210,6 +210,23 @@
"print(next(iter(test_dataloader)))"
]
},
{
"cell_type": "markdown",
"id": "42b14a11",
"metadata": {},
"source": [
"You can load model from hub or local\n",
"\n",
"- Load model from Hugging Face Hub, you can change to your own model id\n",
"```python\n",
"peft_model_id = \"username/twitter_complaints_bigscience_bloomz-7b1_LORA_CAUSAL_LM\"\n",
"```\n",
"- Or load model form local\n",
"```python\n",
"peft_model_id = \"twitter_complaints_bigscience_bloomz-7b1_LORA_CAUSAL_LM\"\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 5,
@ -244,7 +261,6 @@
"\n",
"max_memory = {0: \"1GIB\", 1: \"1GIB\", 2: \"2GIB\", 3: \"10GIB\", \"cpu\": \"30GB\"}\n",
"peft_model_id = \"smangrul/twitter_complaints_bigscience_bloomz-7b1_LORA_CAUSAL_LM\"\n",
"\n",
"config = PeftConfig.from_pretrained(peft_model_id)\n",
"model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, device_map=\"auto\", max_memory=max_memory)\n",
"model = PeftModel.from_pretrained(model, peft_model_id, device_map=\"auto\", max_memory=max_memory)"


@ -136,10 +136,10 @@ def main():
inputs = [f"{text_column} : {x} Label : " for x in examples[text_column]]
targets = [str(x) for x in examples[label_column]]
model_inputs = tokenizer(inputs)
labels = tokenizer(targets)
labels = tokenizer(targets, add_special_tokens=False) # don't add bos token because we concatenate with inputs
for i in range(batch_size):
sample_input_ids = model_inputs["input_ids"][i]
label_input_ids = labels["input_ids"][i] + [tokenizer.pad_token_id]
label_input_ids = labels["input_ids"][i] + [tokenizer.eos_token_id]
model_inputs["input_ids"][i] = sample_input_ids + label_input_ids
labels["input_ids"][i] = [-100] * len(sample_input_ids) + label_input_ids
model_inputs["attention_mask"][i] = [1] * len(model_inputs["input_ids"][i])
@ -349,12 +349,21 @@ def main():
pred_df.to_csv(f"data/{dataset_name}/predictions.csv", index=False)
accelerator.wait_for_everyone()
model.push_to_hub(
"smangrul/"
+ f"{dataset_name}_{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}".replace("/", "_"),
state_dict=accelerator.get_state_dict(model),
use_auth_token=True,
# Option1: Pushing the model to Hugging Face Hub
# model.push_to_hub(
# f"{dataset_name}_{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}".replace("/", "_"),
# token = "hf_..."
# )
# token (`bool` or `str`, *optional*):
# `token` is to be used for HTTP Bearer authorization when accessing remote files. If `True`, will use the token generated
# when running `huggingface-cli login` (stored in `~/.huggingface`). Will default to `True` if `repo_url`
# is not specified.
# Or you can get your token from https://huggingface.co/settings/token
# Option2: Saving the model locally
peft_model_id = f"{dataset_name}_{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}".replace(
"/", "_"
)
model.save_pretrained(peft_model_id)
accelerator.wait_for_everyone()

File diff suppressed because it is too large.

@ -173,10 +173,10 @@
" inputs = [f\"{text_column} : {x} Label : \" for x in examples[text_column]]\n",
" targets = [str(x) for x in examples[label_column]]\n",
" model_inputs = tokenizer(inputs)\n",
" labels = tokenizer(targets)\n",
" labels = tokenizer(targets, add_special_tokens=False) # don't add bos token because we concatenate with inputs\n",
" for i in range(batch_size):\n",
" sample_input_ids = model_inputs[\"input_ids\"][i]\n",
" label_input_ids = labels[\"input_ids\"][i] + [tokenizer.pad_token_id]\n",
" label_input_ids = labels[\"input_ids\"][i] + [tokenizer.eos_token_id]\n",
" # print(i, sample_input_ids, label_input_ids)\n",
" model_inputs[\"input_ids\"][i] = sample_input_ids + label_input_ids\n",
" labels[\"input_ids\"][i] = [-100] * len(sample_input_ids) + label_input_ids\n",
@ -1228,6 +1228,33 @@
" print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))"
]
},
{
"cell_type": "markdown",
"id": "0e21c49b",
"metadata": {},
"source": [
"You can push model to hub or save model locally. \n",
"\n",
"- Option1: Pushing the model to Hugging Face Hub\n",
"```python\n",
"model.push_to_hub(\n",
" f\"{dataset_name}_{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}\".replace(\"/\", \"_\"),\n",
" token = \"hf_...\"\n",
")\n",
"```\n",
"token (`bool` or `str`, *optional*):\n",
" `token` is to be used for HTTP Bearer authorization when accessing remote files. If `True`, will use the token generated\n",
" when running `huggingface-cli login` (stored in `~/.huggingface`). Will default to `True` if `repo_url`\n",
" is not specified.\n",
" Or you can get your token from https://huggingface.co/settings/token\n",
"```\n",
"- Or save model locally\n",
"```python\n",
"peft_model_id = f\"{dataset_name}_{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}\".replace(\"/\", \"_\")\n",
"model.save_pretrained(peft_model_id)\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 16,
@ -1236,7 +1263,9 @@
"outputs": [],
"source": [
"# saving model\n",
"peft_model_id = f\"{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}\"\n",
"peft_model_id = f\"{dataset_name}_{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}\".replace(\n",
" \"/\", \"_\"\n",
")\n",
"model.save_pretrained(peft_model_id)"
]
},
@ -1260,7 +1289,9 @@
"source": [
"from peft import PeftModel, PeftConfig\n",
"\n",
"peft_model_id = f\"{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}\"\n",
"peft_model_id = f\"{dataset_name}_{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}\".replace(\n",
" \"/\", \"_\"\n",
")\n",
"\n",
"config = PeftConfig.from_pretrained(peft_model_id)\n",
"model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)\n",


@ -83,10 +83,10 @@
" inputs = [f\"{text_column} : {x} Label : \" for x in examples[text_column]]\n",
" targets = [str(x) for x in examples[label_column]]\n",
" model_inputs = tokenizer(inputs)\n",
" labels = tokenizer(targets)\n",
" labels = tokenizer(targets, add_special_tokens=False) # don't add bos token because we concatenate with inputs\n",
" for i in range(batch_size):\n",
" sample_input_ids = model_inputs[\"input_ids\"][i]\n",
" label_input_ids = labels[\"input_ids\"][i] + [tokenizer.pad_token_id]\n",
" label_input_ids = labels[\"input_ids\"][i] + [tokenizer.eos_token_id]\n",
" # print(i, sample_input_ids, label_input_ids)\n",
" model_inputs[\"input_ids\"][i] = sample_input_ids + label_input_ids\n",
" labels[\"input_ids\"][i] = [-100] * len(sample_input_ids) + label_input_ids\n",
@ -1072,6 +1072,33 @@
" print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))"
]
},
{
"cell_type": "markdown",
"id": "c8f35152",
"metadata": {},
"source": [
"You can push model to hub or save model locally. \n",
"\n",
"- Option1: Pushing the model to Hugging Face Hub\n",
"```python\n",
"model.push_to_hub(\n",
" f\"{dataset_name}_{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}\".replace(\"/\", \"_\"),\n",
" token = \"hf_...\"\n",
")\n",
"```\n",
"token (`bool` or `str`, *optional*):\n",
" `token` is to be used for HTTP Bearer authorization when accessing remote files. If `True`, will use the token generated\n",
" when running `huggingface-cli login` (stored in `~/.huggingface`). Will default to `True` if `repo_url`\n",
" is not specified.\n",
" Or you can get your token from https://huggingface.co/settings/token\n",
"```\n",
"- Or save model locally\n",
"```python\n",
"peft_model_id = f\"{dataset_name}_{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}\".replace(\"/\", \"_\")\n",
"model.save_pretrained(peft_model_id)\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 12,
@ -1080,7 +1107,9 @@
"outputs": [],
"source": [
"# saving model\n",
"peft_model_id = f\"{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}\"\n",
"peft_model_id = f\"{dataset_name}_{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}\".replace(\n",
" \"/\", \"_\"\n",
")\n",
"model.save_pretrained(peft_model_id)"
]
},
@ -1116,7 +1145,9 @@
"source": [
"from peft import PeftModel, PeftConfig\n",
"\n",
"peft_model_id = f\"{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}\"\n",
"peft_model_id = f\"{dataset_name}_{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}\".replace(\n",
" \"/\", \"_\"\n",
")\n",
"\n",
"config = PeftConfig.from_pretrained(peft_model_id)\n",
"model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)\n",


@ -1,6 +1,5 @@
transformers
accelerate
loralib
evaluate
deepspeed
tqdm


@ -0,0 +1,408 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "58ff91ca-ce92-43d0-ae8b-4e9e89e193f6",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from datasets import load_dataset\n",
"from transformers import set_seed, AutoModelForSeq2SeqLM, AutoTokenizer\n",
"from peft import get_peft_model, MultitaskPromptTuningConfig, TaskType, MultitaskPromptTuningInit\n",
"\n",
"set_seed(42)\n",
"\n",
"model_name = \"google/flan-t5-base\"\n",
"\n",
"peft_config = MultitaskPromptTuningConfig(\n",
" tokenizer_name_or_path=model_name,\n",
" num_tasks=2,\n",
" task_type=TaskType.SEQ_2_SEQ_LM,\n",
" prompt_tuning_init=MultitaskPromptTuningInit.TEXT,\n",
" num_virtual_tokens=50,\n",
" num_transformer_submodules=1,\n",
" prompt_tuning_init_text=\"classify the following into either positive or negative, or entailment, neutral or contradiction:\",\n",
")\n",
"\n",
"tokenizer = AutoTokenizer.from_pretrained(model_name)\n",
"model = AutoModelForSeq2SeqLM.from_pretrained(model_name)\n",
"model = get_peft_model(model, peft_config)\n",
"\n",
"model = model.cuda()\n",
"\n",
"\n",
"def send_to_device(batch):\n",
" for i in batch:\n",
" batch[i] = batch[i].cuda()\n",
" return batch"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eb112bc1-ffaf-49fa-a216-0d601ec304ee",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"def get_sst2(split: str):\n",
" examples = load_dataset(\"sst2\")[split]\n",
" result_examples = []\n",
" for example in examples:\n",
" result_examples.append({})\n",
"\n",
" result_examples[-1][\"input\"] = example[\"sentence\"].strip() + \"</s>\"\n",
" result_examples[-1][\"output\"] = (\n",
" f\"positive{tokenizer.eos_token}\" if example[\"label\"] == 1 else f\"negative{tokenizer.eos_token}\"\n",
" )\n",
" result_examples[-1][\"task_id\"] = 0\n",
"\n",
" return result_examples\n",
"\n",
"\n",
"def get_mnli(split: str):\n",
" examples = load_dataset(\"multi_nli\")[split]\n",
" result_examples = []\n",
" for example in examples:\n",
" result_examples.append({})\n",
"\n",
" result_examples[-1][\"input\"] = example[\"premise\"].strip() + \" \" + example[\"hypothesis\"].strip() + \"</s>\"\n",
"\n",
" if example[\"label\"] == 0:\n",
" result_examples[-1][\"output\"] = f\"entailment{tokenizer.eos_token}\"\n",
" elif example[\"label\"] == 1:\n",
" result_examples[-1][\"output\"] = f\"neutral{tokenizer.eos_token}\"\n",
" else:\n",
" result_examples[-1][\"output\"] = f\"contradiction{tokenizer.eos_token}\"\n",
"\n",
" result_examples[-1][\"task_id\"] = 1\n",
"\n",
" return result_examples"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e5a16ec4-8fef-4ba9-95b6-a661eb51e50c",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from typing import Tuple\n",
"from torch.utils.data import Dataset, DataLoader\n",
"import torch\n",
"\n",
"\n",
"class MyDataset(Dataset):\n",
" def __init__(self, split: str, mode: str = \"source\") -> None:\n",
" super().__init__()\n",
"\n",
" if split == \"train\":\n",
" if mode == \"source\":\n",
" self.examples = get_sst2(split) + get_mnli(split)\n",
" elif mode == \"target\":\n",
" self.examples = get_sst2(split)\n",
" if split == \"val\":\n",
" self.examples = get_sst2(\"validation\")\n",
" if split == \"test\":\n",
" self.examples = get_sst2(\"validation\")\n",
"\n",
" def __getitem__(self, index) -> dict:\n",
" return self.examples[index]\n",
"\n",
" def __len__(self) -> int:\n",
" return len(self.examples)\n",
"\n",
" def __getitem__(self, index) -> dict:\n",
" return self.examples[index]\n",
"\n",
" def __len__(self) -> int:\n",
" return len(self.examples)\n",
"\n",
"\n",
"def collate_fn(batch: dict) -> Tuple[torch.Tensor, torch.Tensor]:\n",
" input = [i[\"input\"] for i in batch]\n",
" input = tokenizer(input, add_special_tokens=False, return_tensors=\"pt\", padding=True)\n",
"\n",
" output = [i[\"output\"] for i in batch]\n",
" output = tokenizer(output, add_special_tokens=False, return_tensors=\"pt\", padding=True).input_ids\n",
" output[output == tokenizer.pad_token_id] = -100\n",
"\n",
" task_ids = [i[\"task_id\"] for i in batch]\n",
" task_ids = torch.tensor(task_ids)\n",
"\n",
" return {\n",
" \"input_ids\": input.input_ids,\n",
" \"attention_mask\": input.attention_mask,\n",
" \"labels\": output,\n",
" \"task_ids\": task_ids,\n",
" }\n",
"\n",
"\n",
"train = DataLoader(MyDataset(\"train\"), shuffle=True, batch_size=8, collate_fn=collate_fn)\n",
"val = DataLoader(MyDataset(\"val\"), shuffle=False, batch_size=8, collate_fn=collate_fn)\n",
"test = DataLoader(MyDataset(\"test\"), shuffle=False, batch_size=8, collate_fn=collate_fn)"
]
},
{
"cell_type": "markdown",
"id": "fe0aec7b-f61e-4b00-a90e-c1201dc1f84c",
"metadata": {},
"source": [
"## source training"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cceecc94-f43a-4f62-8d45-926f2f02f36d",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from torch.optim.adamw import AdamW\n",
"from transformers import get_cosine_schedule_with_warmup\n",
"from tqdm import tqdm\n",
"from sklearn.metrics import f1_score"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eae5516b-73ab-44a8-a083-4e8de6127f30",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"POSITIVE_TOKEN_ID = tokenizer(\" positive\", add_special_tokens=False)[\"input_ids\"][0]\n",
"NEGATIVE_TOKEN_ID = tokenizer(\" negative\", add_special_tokens=False)[\"input_ids\"][0]\n",
"\n",
"\n",
"def classify(batch):\n",
" batch = send_to_device(batch)\n",
" # we pass labels here since we need to generate and peft doesn't support generation yet.\n",
" # No clue how to get around this\n",
" scores = model(**batch).logits\n",
" preds = []\n",
" for i in range(scores.shape[0]):\n",
" if scores[i, 0, POSITIVE_TOKEN_ID] > scores[i, 0, NEGATIVE_TOKEN_ID]:\n",
" preds.append(POSITIVE_TOKEN_ID)\n",
" else:\n",
" preds.append(NEGATIVE_TOKEN_ID)\n",
" return preds\n",
"\n",
"\n",
"@torch.inference_mode()\n",
"def evaluate(model, data):\n",
" loss = 0\n",
" preds = []\n",
" golds = []\n",
"\n",
" for batch in tqdm(data):\n",
" batch = send_to_device(batch)\n",
" loss += model(**batch).loss\n",
" golds.extend(batch[\"labels\"][:, 0].tolist())\n",
" preds.extend(classify(batch))\n",
"\n",
" return loss / len(val), f1_score(golds, preds, pos_label=POSITIVE_TOKEN_ID)\n",
"\n",
"\n",
"optimizer = AdamW(model.parameters(), lr=1e-4)\n",
"scheduler = get_cosine_schedule_with_warmup(optimizer, 200, len(train))\n",
"\n",
"n = 1000\n",
"step = 0\n",
"train_ = tqdm(train)\n",
"\n",
"val_loss, f1 = evaluate(model, val)\n",
"print(\n",
" f\"\"\"\n",
"before source training\n",
"val loss = {val_loss}\n",
"f1 = {f1}\"\"\"\n",
")\n",
"\n",
"for batch in train_:\n",
" if step % n == 0:\n",
" val_loss, f1 = evaluate(model, val)\n",
" print(\n",
" f\"\"\"\n",
"step = {step}\n",
"val loss = {val_loss}\n",
"f1 = {f1}\"\"\"\n",
" )\n",
" model.save_pretrained(f\"checkpoints_source/{step}\")\n",
"\n",
" step += 1\n",
" batch = send_to_device(batch)\n",
" loss = model(**batch).loss\n",
" loss.backward()\n",
" optimizer.step()\n",
" scheduler.step()\n",
" train_.set_postfix(train_loss=loss)"
]
},
{
"cell_type": "markdown",
"id": "74168ef3-66f3-41a7-a40b-7840b103fbf9",
"metadata": {},
"source": [
"## target training"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b09fd456-163e-4dc1-b24d-f2d0d349036c",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"train = DataLoader(MyDataset(\"train\", \"target\"), shuffle=True, batch_size=8, collate_fn=collate_fn)\n",
"val = DataLoader(MyDataset(\"val\", \"target\"), shuffle=False, batch_size=8, collate_fn=collate_fn)\n",
"test = DataLoader(MyDataset(\"test\", \"target\"), shuffle=False, batch_size=8, collate_fn=collate_fn)"
]
},
{
"cell_type": "markdown",
"id": "4a539944-f16c-4c3f-bb4a-7b5d9a6042e2",
"metadata": {},
"source": [
"#### create a fresh model"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5520d904-aa6c-4654-9335-ed4e7d76cba2",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"peft_config = MultitaskPromptTuningConfig(\n",
" tokenizer_name_or_path=model_name,\n",
" num_tasks=1,\n",
" task_type=TaskType.SEQ_2_SEQ_LM,\n",
" prompt_tuning_init=MultitaskPromptTuningInit.EXACT_SOURCE_TASK,\n",
" prompt_tuning_init_state_dict_path=\"checkpoints_source/50000/adapter_model.bin\",\n",
" num_virtual_tokens=50,\n",
" num_transformer_submodules=1,\n",
")\n",
"\n",
"tokenizer = AutoTokenizer.from_pretrained(model_name)\n",
"model = AutoModelForSeq2SeqLM.from_pretrained(model_name)\n",
"model = get_peft_model(model, peft_config)\n",
"\n",
"model = model.cuda()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dfa39c2d-d1c5-4ed4-90f8-26e8e324371c",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"optimizer = AdamW(model.parameters(), lr=1e-4)\n",
"scheduler = get_cosine_schedule_with_warmup(optimizer, 200, len(train))\n",
"\n",
"n = 1000\n",
"step = 0\n",
"train_ = tqdm(train)\n",
"\n",
"val_loss, f1 = evaluate(model, val)\n",
"print(\n",
" f\"\"\"\n",
"before target training\n",
"val loss = {val_loss}\n",
"f1 = {f1}\"\"\"\n",
")\n",
"\n",
"for batch in train_:\n",
" if step % n == 0:\n",
" val_loss, f1 = evaluate(model, val)\n",
" print(\n",
" f\"\"\"\n",
"step = {step}\n",
"val loss = {val_loss}\n",
"f1 = {f1}\"\"\"\n",
" )\n",
" model.save_pretrained(f\"checkpoints_target/{step}\")\n",
"\n",
" step += 1\n",
" batch = send_to_device(batch)\n",
" loss = model(**batch).loss\n",
" loss.backward()\n",
" optimizer.step()\n",
" scheduler.step()\n",
" train_.set_postfix(train_loss=loss)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b6a6eeda-1e09-49a6-8845-cd96c8573145",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# load last checkpoint for now\n",
"from peft import set_peft_model_state_dict\n",
"\n",
"sd_6000 = torch.load(\"checkpoints_target/6000/adapter_model.bin\")\n",
"set_peft_model_state_dict(model, sd_6000)\n",
"\n",
"# evaluate val\n",
"val_loss, f1 = evaluate(model, val)\n",
"print(\n",
" f\"\"\"\n",
"final\n",
"val loss = {val_loss}\n",
"f1 = {f1}\"\"\"\n",
")\n",
"\n",
"# evaluate test\n",
"test_loss, f1 = evaluate(model, test)\n",
"print(\n",
" f\"\"\"\n",
"final\n",
"test loss = {test_loss}\n",
"f1 = {f1}\"\"\"\n",
")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

File diff suppressed because it is too large.

@ -298,12 +298,22 @@ def main():
pred_df.to_csv(f"data/{dataset_name}/predictions.csv", index=False)
accelerator.wait_for_everyone()
model.push_to_hub(
"smangrul/"
+ f"{dataset_name}_{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}".replace("/", "_"),
state_dict=accelerator.get_state_dict(model),
use_auth_token=True,
# Option 1: Pushing the model to the Hugging Face Hub
# model.push_to_hub(
# f"{dataset_name}_{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}".replace("/", "_"),
# token = "hf_..."
# )
# token (`bool` or `str`, *optional*):
# `token` is to be used for HTTP Bearer authorization when accessing remote files. If `True`, will use the token generated
# when running `huggingface-cli login` (stored in `~/.huggingface`). Will default to `True` if `repo_url`
# is not specified.
# Or you can get your token from https://huggingface.co/settings/token
# Option 2: Saving the model locally
peft_model_id = f"{dataset_name}_{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}".replace(
"/", "_"
)
model.save_pretrained(peft_model_id)
accelerator.wait_for_everyone()
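A minimal sketch of what Option 1 looks like in practice, assuming the variables from this script and that a Hugging Face token is available (the token string below is a placeholder):

# Sketch of Option 1; the token value is a placeholder.
from huggingface_hub import login

login()  # prompts for a token, or pass token="hf_..." explicitly
repo_id = f"{dataset_name}_{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}".replace("/", "_")
model.push_to_hub(repo_id)
# ... or skip login() and pass the token directly:
# model.push_to_hub(repo_id, token="hf_...")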


@ -125,11 +125,19 @@ def main():
accelerator.print(f"{eval_preds[:10]=}")
accelerator.print(f"{dataset['validation'][label_column][:10]=}")
accelerator.wait_for_everyone()
model.push_to_hub(
"smangrul/" + f"{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}".replace("/", "_"),
state_dict=accelerator.get_state_dict(model),
use_auth_token=True,
)
# Option 1: Pushing the model to the Hugging Face Hub
# model.push_to_hub(
# f"{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}".replace("/", "_"),
# token = "hf_..."
# )
# token (`bool` or `str`, *optional*):
# `token` is to be used for HTTP Bearer authorization when accessing remote files. If `True`, will use the token generated
# when running `huggingface-cli login` (stored in `~/.huggingface`). Will default to `True` if `repo_url`
# is not specified.
# Or you can get your token from https://huggingface.co/settings/token
# Option 2: Saving the model locally
peft_model_id = f"{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}".replace("/", "_")
model.save_pretrained(peft_model_id)
accelerator.wait_for_everyone()


@ -0,0 +1,804 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"id": "5f93b7d1",
"metadata": {
"ExecuteTime": {
"end_time": "2023-05-30T08:37:58.711225Z",
"start_time": "2023-05-30T08:37:56.881307Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"===================================BUG REPORT===================================\n",
"Welcome to bitsandbytes. For bug reports, please run\n",
"\n",
"python -m bitsandbytes\n",
"\n",
" and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues\n",
"================================================================================\n",
"bin /udir/tschilla/anaconda3/envs/peft/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda117.so\n",
"CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...\n",
"CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so.11.0\n",
"CUDA SETUP: Highest compute capability among GPUs detected: 8.0\n",
"CUDA SETUP: Detected CUDA version 117\n",
"CUDA SETUP: Loading binary /udir/tschilla/anaconda3/envs/peft/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/udir/tschilla/anaconda3/envs/peft/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /udir/tschilla/anaconda3 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...\n",
" warn(msg)\n",
"/udir/tschilla/anaconda3/envs/peft/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('Europe/Paris')}\n",
" warn(msg)\n",
"/udir/tschilla/anaconda3/envs/peft/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/udir/tschilla/.cache/dotnet_bundle_extract')}\n",
" warn(msg)\n",
"/udir/tschilla/anaconda3/envs/peft/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('5002'), PosixPath('http'), PosixPath('//127.0.0.1')}\n",
" warn(msg)\n",
"/udir/tschilla/anaconda3/envs/peft/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('() { ( alias;\\n eval ${which_declare} ) | /usr/bin/which --tty-only --read-alias --read-functions --show-tilde --show-dot $@\\n}')}\n",
" warn(msg)\n",
"/udir/tschilla/anaconda3/envs/peft/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('module'), PosixPath('//matplotlib_inline.backend_inline')}\n",
" warn(msg)\n",
"/udir/tschilla/anaconda3/envs/peft/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0'), PosixPath('/usr/local/cuda/lib64/libcudart.so')}.. We'll flip a coin and try one of these, in order to fail forward.\n",
"Either way, this might cause trouble in the future:\n",
"If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.\n",
" warn(msg)\n"
]
}
],
"source": [
"import os\n",
"\n",
"import torch\n",
"from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, default_data_collator, get_linear_schedule_with_warmup\n",
"from peft import get_peft_model, PromptTuningConfig, TaskType, PromptTuningInit\n",
"from torch.utils.data import DataLoader\n",
"from tqdm import tqdm\n",
"from datasets import load_dataset\n",
"\n",
"os.environ[\"TOKENIZERS_PARALLELISM\"] = \"false\"\n",
"\n",
"device = \"cuda\"\n",
"model_name_or_path = \"t5-large\"\n",
"tokenizer_name_or_path = \"t5-large\"\n",
"\n",
"checkpoint_name = \"financial_sentiment_analysis_prompt_tuning_v1.pt\"\n",
"text_column = \"sentence\"\n",
"label_column = \"text_label\"\n",
"max_length = 128\n",
"lr = 1\n",
"num_epochs = 5\n",
"batch_size = 8"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "8d0850ac",
"metadata": {
"ExecuteTime": {
"end_time": "2023-05-30T08:38:12.413984Z",
"start_time": "2023-05-30T08:38:04.601042Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"trainable params: 40960 || all params: 737709056 || trainable%: 0.005552324411210698\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/udir/tschilla/anaconda3/envs/peft/lib/python3.9/site-packages/transformers/models/t5/tokenization_t5_fast.py:155: FutureWarning: This tokenizer was incorrectly instantiated with a model max length of 512 which will be corrected in Transformers v5.\n",
"For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.\n",
"- Be aware that you SHOULD NOT rely on t5-large automatically truncating your input to 512 when padding/encoding.\n",
"- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.\n",
"- To avoid this warning, please instantiate this tokenizer with `model_max_length` set to your preferred value.\n",
" warnings.warn(\n"
]
},
{
"data": {
"text/plain": [
"PeftModelForSeq2SeqLM(\n",
" (base_model): T5ForConditionalGeneration(\n",
" (shared): Embedding(32128, 1024)\n",
" (encoder): T5Stack(\n",
" (embed_tokens): Embedding(32128, 1024)\n",
" (block): ModuleList(\n",
" (0): T5Block(\n",
" (layer): ModuleList(\n",
" (0): T5LayerSelfAttention(\n",
" (SelfAttention): T5Attention(\n",
" (q): Linear(in_features=1024, out_features=1024, bias=False)\n",
" (k): Linear(in_features=1024, out_features=1024, bias=False)\n",
" (v): Linear(in_features=1024, out_features=1024, bias=False)\n",
" (o): Linear(in_features=1024, out_features=1024, bias=False)\n",
" (relative_attention_bias): Embedding(32, 16)\n",
" )\n",
" (layer_norm): T5LayerNorm()\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" (1): T5LayerFF(\n",
" (DenseReluDense): T5DenseActDense(\n",
" (wi): Linear(in_features=1024, out_features=4096, bias=False)\n",
" (wo): Linear(in_features=4096, out_features=1024, bias=False)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" (act): ReLU()\n",
" )\n",
" (layer_norm): T5LayerNorm()\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" )\n",
" )\n",
" (1-23): 23 x T5Block(\n",
" (layer): ModuleList(\n",
" (0): T5LayerSelfAttention(\n",
" (SelfAttention): T5Attention(\n",
" (q): Linear(in_features=1024, out_features=1024, bias=False)\n",
" (k): Linear(in_features=1024, out_features=1024, bias=False)\n",
" (v): Linear(in_features=1024, out_features=1024, bias=False)\n",
" (o): Linear(in_features=1024, out_features=1024, bias=False)\n",
" )\n",
" (layer_norm): T5LayerNorm()\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" (1): T5LayerFF(\n",
" (DenseReluDense): T5DenseActDense(\n",
" (wi): Linear(in_features=1024, out_features=4096, bias=False)\n",
" (wo): Linear(in_features=4096, out_features=1024, bias=False)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" (act): ReLU()\n",
" )\n",
" (layer_norm): T5LayerNorm()\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" )\n",
" )\n",
" )\n",
" (final_layer_norm): T5LayerNorm()\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" (decoder): T5Stack(\n",
" (embed_tokens): Embedding(32128, 1024)\n",
" (block): ModuleList(\n",
" (0): T5Block(\n",
" (layer): ModuleList(\n",
" (0): T5LayerSelfAttention(\n",
" (SelfAttention): T5Attention(\n",
" (q): Linear(in_features=1024, out_features=1024, bias=False)\n",
" (k): Linear(in_features=1024, out_features=1024, bias=False)\n",
" (v): Linear(in_features=1024, out_features=1024, bias=False)\n",
" (o): Linear(in_features=1024, out_features=1024, bias=False)\n",
" (relative_attention_bias): Embedding(32, 16)\n",
" )\n",
" (layer_norm): T5LayerNorm()\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" (1): T5LayerCrossAttention(\n",
" (EncDecAttention): T5Attention(\n",
" (q): Linear(in_features=1024, out_features=1024, bias=False)\n",
" (k): Linear(in_features=1024, out_features=1024, bias=False)\n",
" (v): Linear(in_features=1024, out_features=1024, bias=False)\n",
" (o): Linear(in_features=1024, out_features=1024, bias=False)\n",
" )\n",
" (layer_norm): T5LayerNorm()\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" (2): T5LayerFF(\n",
" (DenseReluDense): T5DenseActDense(\n",
" (wi): Linear(in_features=1024, out_features=4096, bias=False)\n",
" (wo): Linear(in_features=4096, out_features=1024, bias=False)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" (act): ReLU()\n",
" )\n",
" (layer_norm): T5LayerNorm()\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" )\n",
" )\n",
" (1-23): 23 x T5Block(\n",
" (layer): ModuleList(\n",
" (0): T5LayerSelfAttention(\n",
" (SelfAttention): T5Attention(\n",
" (q): Linear(in_features=1024, out_features=1024, bias=False)\n",
" (k): Linear(in_features=1024, out_features=1024, bias=False)\n",
" (v): Linear(in_features=1024, out_features=1024, bias=False)\n",
" (o): Linear(in_features=1024, out_features=1024, bias=False)\n",
" )\n",
" (layer_norm): T5LayerNorm()\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" (1): T5LayerCrossAttention(\n",
" (EncDecAttention): T5Attention(\n",
" (q): Linear(in_features=1024, out_features=1024, bias=False)\n",
" (k): Linear(in_features=1024, out_features=1024, bias=False)\n",
" (v): Linear(in_features=1024, out_features=1024, bias=False)\n",
" (o): Linear(in_features=1024, out_features=1024, bias=False)\n",
" )\n",
" (layer_norm): T5LayerNorm()\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" (2): T5LayerFF(\n",
" (DenseReluDense): T5DenseActDense(\n",
" (wi): Linear(in_features=1024, out_features=4096, bias=False)\n",
" (wo): Linear(in_features=4096, out_features=1024, bias=False)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" (act): ReLU()\n",
" )\n",
" (layer_norm): T5LayerNorm()\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" )\n",
" )\n",
" )\n",
" (final_layer_norm): T5LayerNorm()\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" (lm_head): Linear(in_features=1024, out_features=32128, bias=False)\n",
" )\n",
" (prompt_encoder): ModuleDict(\n",
" (default): PromptEmbedding(\n",
" (embedding): Embedding(40, 1024)\n",
" )\n",
" )\n",
" (word_embeddings): Embedding(32128, 1024)\n",
")"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# creating model\n",
"peft_config = PromptTuningConfig(\n",
" task_type=TaskType.SEQ_2_SEQ_LM,\n",
" prompt_tuning_init=PromptTuningInit.TEXT,\n",
" num_virtual_tokens=20,\n",
" prompt_tuning_init_text=\"What is the sentiment of this article?\\n\",\n",
" inference_mode=False,\n",
" tokenizer_name_or_path=model_name_or_path,\n",
")\n",
"\n",
"model = AutoModelForSeq2SeqLM.from_pretrained(model_name_or_path)\n",
"model = get_peft_model(model, peft_config)\n",
"model.print_trainable_parameters()\n",
"model"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "4ee2babf",
"metadata": {
"ExecuteTime": {
"end_time": "2023-05-30T08:38:18.759143Z",
"start_time": "2023-05-30T08:38:17.881621Z"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Found cached dataset financial_phrasebank (/data/proxem/huggingface/datasets/financial_phrasebank/sentences_allagree/1.0.0/550bde12e6c30e2674da973a55f57edde5181d53f5a5a34c1531c53f93b7e141)\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "fb63f50cb7cb4f5aae10648ba74d6c4e",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/1 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Map: 0%| | 0/2037 [00:00<?, ? examples/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Map: 0%| | 0/227 [00:00<?, ? examples/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"{'sentence': '`` Lining stone sales were also good in the early autumn , and order books are strong to the end of the year .',\n",
" 'label': 2,\n",
" 'text_label': 'positive'}"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# loading dataset\n",
"dataset = load_dataset(\"financial_phrasebank\", \"sentences_allagree\")\n",
"dataset = dataset[\"train\"].train_test_split(test_size=0.1)\n",
"dataset[\"validation\"] = dataset[\"test\"]\n",
"del dataset[\"test\"]\n",
"\n",
"classes = dataset[\"train\"].features[\"label\"].names\n",
"dataset = dataset.map(\n",
" lambda x: {\"text_label\": [classes[label] for label in x[\"label\"]]},\n",
" batched=True,\n",
" num_proc=1,\n",
")\n",
"\n",
"dataset[\"train\"][0]"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "adf9608c",
"metadata": {
"ExecuteTime": {
"end_time": "2023-05-30T08:38:21.132266Z",
"start_time": "2023-05-30T08:38:20.340722Z"
}
},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Running tokenizer on dataset: 0%| | 0/2037 [00:00<?, ? examples/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Running tokenizer on dataset: 0%| | 0/227 [00:00<?, ? examples/s]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# data preprocessing\n",
"tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)\n",
"target_max_length = max([len(tokenizer(class_label)[\"input_ids\"]) for class_label in classes])\n",
"\n",
"\n",
"def preprocess_function(examples):\n",
" inputs = examples[text_column]\n",
" targets = examples[label_column]\n",
" model_inputs = tokenizer(inputs, max_length=max_length, padding=\"max_length\", truncation=True, return_tensors=\"pt\")\n",
" labels = tokenizer(\n",
" targets, max_length=target_max_length, padding=\"max_length\", truncation=True, return_tensors=\"pt\"\n",
" )\n",
" labels = labels[\"input_ids\"]\n",
" labels[labels == tokenizer.pad_token_id] = -100\n",
" model_inputs[\"labels\"] = labels\n",
" return model_inputs\n",
"\n",
"\n",
"processed_datasets = dataset.map(\n",
" preprocess_function,\n",
" batched=True,\n",
" num_proc=1,\n",
" remove_columns=dataset[\"train\"].column_names,\n",
" load_from_cache_file=False,\n",
" desc=\"Running tokenizer on dataset\",\n",
")\n",
"\n",
"train_dataset = processed_datasets[\"train\"]\n",
"eval_dataset = processed_datasets[\"validation\"]\n",
"\n",
"train_dataloader = DataLoader(\n",
" train_dataset, shuffle=True, collate_fn=default_data_collator, batch_size=batch_size, pin_memory=True\n",
")\n",
"eval_dataloader = DataLoader(eval_dataset, collate_fn=default_data_collator, batch_size=batch_size, pin_memory=True)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "f733a3c6",
"metadata": {
"ExecuteTime": {
"end_time": "2023-05-30T08:38:22.907922Z",
"start_time": "2023-05-30T08:38:22.901057Z"
}
},
"outputs": [],
"source": [
"# optimizer and lr scheduler\n",
"optimizer = torch.optim.AdamW(model.parameters(), lr=lr)\n",
"lr_scheduler = get_linear_schedule_with_warmup(\n",
" optimizer=optimizer,\n",
" num_warmup_steps=0,\n",
" num_training_steps=(len(train_dataloader) * num_epochs),\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "6b3a4090",
"metadata": {
"ExecuteTime": {
"end_time": "2023-05-30T08:42:29.409070Z",
"start_time": "2023-05-30T08:38:50.102263Z"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 255/255 [00:42<00:00, 6.05it/s]\n",
"100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:02<00:00, 14.40it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch=0: train_ppl=tensor(8.0846, device='cuda:0') train_epoch_loss=tensor(2.0900, device='cuda:0') eval_ppl=tensor(1.3542, device='cuda:0') eval_epoch_loss=tensor(0.3032, device='cuda:0')\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 255/255 [00:41<00:00, 6.15it/s]\n",
"100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:02<00:00, 14.42it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch=1: train_ppl=tensor(1.5088, device='cuda:0') train_epoch_loss=tensor(0.4113, device='cuda:0') eval_ppl=tensor(1.2692, device='cuda:0') eval_epoch_loss=tensor(0.2384, device='cuda:0')\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 255/255 [00:41<00:00, 6.18it/s]\n",
"100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:02<00:00, 14.45it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch=2: train_ppl=tensor(1.5322, device='cuda:0') train_epoch_loss=tensor(0.4267, device='cuda:0') eval_ppl=tensor(1.2065, device='cuda:0') eval_epoch_loss=tensor(0.1877, device='cuda:0')\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 255/255 [00:41<00:00, 6.17it/s]\n",
"100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:02<00:00, 14.38it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch=3: train_ppl=tensor(1.4475, device='cuda:0') train_epoch_loss=tensor(0.3699, device='cuda:0') eval_ppl=tensor(1.2346, device='cuda:0') eval_epoch_loss=tensor(0.2107, device='cuda:0')\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 255/255 [00:42<00:00, 5.94it/s]\n",
"100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:02<00:00, 14.42it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch=4: train_ppl=tensor(1.3428, device='cuda:0') train_epoch_loss=tensor(0.2948, device='cuda:0') eval_ppl=tensor(1.2041, device='cuda:0') eval_epoch_loss=tensor(0.1857, device='cuda:0')\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n"
]
}
],
"source": [
"# training and evaluation\n",
"model = model.to(device)\n",
"\n",
"for epoch in range(num_epochs):\n",
" model.train()\n",
" total_loss = 0\n",
" for step, batch in enumerate(tqdm(train_dataloader)):\n",
" batch = {k: v.to(device) for k, v in batch.items()}\n",
" outputs = model(**batch)\n",
" loss = outputs.loss\n",
" total_loss += loss.detach().float()\n",
" loss.backward()\n",
" optimizer.step()\n",
" lr_scheduler.step()\n",
" optimizer.zero_grad()\n",
"\n",
" model.eval()\n",
" eval_loss = 0\n",
" eval_preds = []\n",
" for step, batch in enumerate(tqdm(eval_dataloader)):\n",
" batch = {k: v.to(device) for k, v in batch.items()}\n",
" with torch.no_grad():\n",
" outputs = model(**batch)\n",
" loss = outputs.loss\n",
" eval_loss += loss.detach().float()\n",
" eval_preds.extend(\n",
" tokenizer.batch_decode(torch.argmax(outputs.logits, -1).detach().cpu().numpy(), skip_special_tokens=True)\n",
" )\n",
"\n",
" eval_epoch_loss = eval_loss / len(eval_dataloader)\n",
" eval_ppl = torch.exp(eval_epoch_loss)\n",
" train_epoch_loss = total_loss / len(train_dataloader)\n",
" train_ppl = torch.exp(train_epoch_loss)\n",
" print(f\"{epoch=}: {train_ppl=} {train_epoch_loss=} {eval_ppl=} {eval_epoch_loss=}\")"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "6cafa67b",
"metadata": {
"ExecuteTime": {
"end_time": "2023-05-30T08:42:42.844671Z",
"start_time": "2023-05-30T08:42:42.840447Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"accuracy=85.46255506607929 % on the evaluation dataset\n",
"eval_preds[:10]=['neutral', 'neutral', 'neutral', 'neutral', 'neutral', 'positive', 'neutral', 'negative', 'neutral', 'positive']\n",
"dataset['validation']['text_label'][:10]=['neutral', 'neutral', 'neutral', 'neutral', 'neutral', 'positive', 'neutral', 'negative', 'positive', 'neutral']\n"
]
}
],
"source": [
"# print accuracy\n",
"correct = 0\n",
"total = 0\n",
"for pred, true in zip(eval_preds, dataset[\"validation\"][\"text_label\"]):\n",
" if pred.strip() == true.strip():\n",
" correct += 1\n",
" total += 1\n",
"accuracy = correct / total * 100\n",
"print(f\"{accuracy=} % on the evaluation dataset\")\n",
"print(f\"{eval_preds[:10]=}\")\n",
"print(f\"{dataset['validation']['text_label'][:10]=}\")"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "a8de6005",
"metadata": {
"ExecuteTime": {
"end_time": "2023-05-30T08:42:45.752765Z",
"start_time": "2023-05-30T08:42:45.742397Z"
}
},
"outputs": [],
"source": [
"# saving model\n",
"peft_model_id = f\"{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}\"\n",
"model.save_pretrained(peft_model_id)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "bd20cd4c",
"metadata": {
"ExecuteTime": {
"end_time": "2023-05-30T08:42:47.660873Z",
"start_time": "2023-05-30T08:42:47.488293Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"164K\tt5-large_PROMPT_TUNING_SEQ_2_SEQ_LM/adapter_model.bin\r\n"
]
}
],
"source": [
"ckpt = f\"{peft_model_id}/adapter_model.bin\"\n",
"!du -h $ckpt"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "76c2fc29",
"metadata": {
"ExecuteTime": {
"end_time": "2023-05-30T08:42:56.721990Z",
"start_time": "2023-05-30T08:42:49.060700Z"
}
},
"outputs": [],
"source": [
"from peft import PeftModel, PeftConfig\n",
"\n",
"peft_model_id = f\"{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}\"\n",
"\n",
"config = PeftConfig.from_pretrained(peft_model_id)\n",
"model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path)\n",
"model = PeftModel.from_pretrained(model, peft_model_id)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "d997f1cc",
"metadata": {
"ExecuteTime": {
"end_time": "2023-05-30T08:42:59.600916Z",
"start_time": "2023-05-30T08:42:58.961468Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Danske Bank is Denmark 's largest bank with 3.5 million customers .\n",
"tensor([[ 3039, 1050, 1925, 19, 18001, 3, 31, 7, 2015, 2137,\n",
" 28, 3, 9285, 770, 722, 3, 5, 1]])\n",
"tensor([[ 0, 7163, 1]])\n",
"['neutral']\n"
]
}
],
"source": [
"model.eval()\n",
"i = 107\n",
"input_ids = tokenizer(dataset[\"validation\"][text_column][i], return_tensors=\"pt\").input_ids\n",
"print(dataset[\"validation\"][text_column][i])\n",
"print(input_ids)\n",
"\n",
"with torch.no_grad():\n",
" outputs = model.generate(input_ids=input_ids, max_new_tokens=10)\n",
" print(outputs)\n",
" print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "peft",
"language": "python",
"name": "peft"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
},
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
},
"vscode": {
"interpreter": {
"hash": "aee8b7b246df8f9039afb4144a1f6fd8d2ca17a180786b69acc140d282b71a49"
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}
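A quick back-of-the-envelope check of the ~164K checkpoint size above: the only trainable weights are the prompt embeddings, shown as `Embedding(40, 1024)` in the model summary (20 virtual tokens times 2 transformer submodules for the seq2seq model), i.e. the 40,960 trainable parameters reported by `print_trainable_parameters`:

# 40 virtual-token embeddings of size 1024, stored in fp32
adapter_bytes = 40 * 1024 * 4
print(adapter_bytes)  # 163840 bytes ~ 160 KiB; `du -h` reports 164K once serialization overhead and block rounding are included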

File diff suppressed because one or more lines are too long


@ -1,6 +1,5 @@
transformers
accelerate
loralib
evaluate
deepspeed
tqdm


@ -0,0 +1,495 @@
# coding=utf-8
# Copyright 2023-present the HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import logging
import math
import os
import random
from pathlib import Path
import datasets
import evaluate
import torch
import transformers
from accelerate import Accelerator
from accelerate.logging import get_logger
from accelerate.utils import set_seed
from datasets import DatasetDict, load_dataset
from huggingface_hub import Repository, create_repo
from torch import nn
from torch.utils.data import DataLoader
from tqdm import tqdm
from transformers import AutoModel, AutoTokenizer, SchedulerType, default_data_collator, get_scheduler
from transformers.utils import get_full_repo_name
from peft import LoraConfig, TaskType, get_peft_model
logger = get_logger(__name__)
def parse_args():
parser = argparse.ArgumentParser(description="Training a PEFT model for Semantic Search task")
parser.add_argument("--dataset_name", type=str, default=None, help="dataset name on HF hub")
parser.add_argument(
"--max_length",
type=int,
default=128,
help=(
"The maximum total input sequence length after tokenization. Sequences longer than this will be truncated,"
" sequences shorter will be padded if `--pad_to_max_length` is passed."
),
)
parser.add_argument(
"--model_name_or_path",
type=str,
help="Path to pretrained model or model identifier from huggingface.co/models.",
required=True,
)
parser.add_argument(
"--per_device_train_batch_size",
type=int,
default=8,
help="Batch size (per device) for the training dataloader.",
)
parser.add_argument(
"--per_device_eval_batch_size",
type=int,
default=8,
help="Batch size (per device) for the evaluation dataloader.",
)
parser.add_argument(
"--learning_rate",
type=float,
default=5e-5,
help="Initial learning rate (after the potential warmup period) to use.",
)
parser.add_argument("--weight_decay", type=float, default=0.0, help="Weight decay to use.")
parser.add_argument("--num_train_epochs", type=int, default=3, help="Total number of training epochs to perform.")
parser.add_argument(
"--max_train_steps",
type=int,
default=None,
help="Total number of training steps to perform. If provided, overrides num_train_epochs.",
)
parser.add_argument(
"--gradient_accumulation_steps",
type=int,
default=1,
help="Number of updates steps to accumulate before performing a backward/update pass.",
)
parser.add_argument(
"--lr_scheduler_type",
type=SchedulerType,
default="linear",
help="The scheduler type to use.",
choices=["linear", "cosine", "cosine_with_restarts", "polynomial", "constant", "constant_with_warmup"],
)
parser.add_argument(
"--num_warmup_steps", type=int, default=0, help="Number of steps for the warmup in the lr scheduler."
)
parser.add_argument("--output_dir", type=str, default=None, help="Where to store the final model.")
parser.add_argument("--seed", type=int, default=None, help="A seed for reproducible training.")
parser.add_argument("--push_to_hub", action="store_true", help="Whether or not to push the model to the Hub.")
parser.add_argument(
"--hub_model_id", type=str, help="The name of the repository to keep in sync with the local `output_dir`."
)
parser.add_argument("--hub_token", type=str, help="The token to use to push to the Model Hub.")
parser.add_argument(
"--checkpointing_steps",
type=str,
default=None,
help="Whether the various states should be saved at the end of every n steps, or 'epoch' for each epoch.",
)
parser.add_argument(
"--resume_from_checkpoint",
type=str,
default=None,
help="If the training should continue from a checkpoint folder.",
)
parser.add_argument(
"--with_tracking",
action="store_true",
help="Whether to enable experiment trackers for logging.",
)
parser.add_argument(
"--report_to",
type=str,
default="all",
help=(
'The integration to report the results and logs to. Supported platforms are `"tensorboard"`,'
' `"wandb"`, `"comet_ml"` and `"clearml"`. Use `"all"` (default) to report to all integrations.'
"Only applicable when `--with_tracking` is passed."
),
)
parser.add_argument(
"--sanity_test",
action="store_true",
help="Whether to enable sanity test.",
)
parser.add_argument(
"--use_peft",
action="store_true",
help="Whether to use PEFT.",
)
args = parser.parse_args()
if args.push_to_hub:
assert args.output_dir is not None, "Need an `output_dir` to create a repo when `--push_to_hub` is passed."
return args
def save_model_hook(models, weights, output_dir):
for i, model in enumerate(models):
model.save_pretrained(output_dir, state_dict=weights[i])
# make sure to pop weight so that corresponding model is not saved again
weights.pop()
def load_model_hook(models, input_dir):
while len(models) > 0:
model = models.pop()
# pop models so that they are not loaded again
if hasattr(model, "active_adapter") and hasattr(model, "load_adapter"):
model.load_adapter(input_dir, model.active_adapter, is_trainable=True)
class AutoModelForSentenceEmbedding(nn.Module):
def __init__(self, model_name, tokenizer, normalize=True):
super(AutoModelForSentenceEmbedding, self).__init__()
self.model = AutoModel.from_pretrained(model_name) # , load_in_8bit=True, device_map={"":0})
self.normalize = normalize
self.tokenizer = tokenizer
def forward(self, **kwargs):
model_output = self.model(**kwargs)
embeddings = self.mean_pooling(model_output, kwargs["attention_mask"])
if self.normalize:
embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
return embeddings
def mean_pooling(self, model_output, attention_mask):
token_embeddings = model_output[0] # First element of model_output contains all token embeddings
input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
def __getattr__(self, name: str):
"""Forward missing attributes to the wrapped module."""
try:
return super().__getattr__(name) # defer to nn.Module's logic
except AttributeError:
return getattr(self.model, name)
def get_cosing_embeddings(query_embs, product_embs):
return torch.sum(query_embs * product_embs, axis=1)
def get_loss(cosine_score, labels):
return torch.mean(torch.square(labels * (1 - cosine_score) + torch.clamp((1 - labels) * cosine_score, min=0.0)))
def main():
args = parse_args()
accelerator_kwargs = {"gradient_accumulation_steps": args.gradient_accumulation_steps}
if args.with_tracking:
accelerator_kwargs["log_with"] = args.report_to
accelerator_kwargs["project_dir"] = args.output_dir
accelerator = Accelerator(**accelerator_kwargs)
# Make one log on every process with the configuration for debugging.
logging.basicConfig(
format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
datefmt="%m/%d/%Y %H:%M:%S",
level=logging.INFO,
)
logger.info(accelerator.state, main_process_only=False)
if accelerator.is_local_main_process:
datasets.utils.logging.set_verbosity_warning()
transformers.utils.logging.set_verbosity_info()
else:
datasets.utils.logging.set_verbosity_error()
transformers.utils.logging.set_verbosity_error()
# If passed along, set the training seed now.
if args.seed is not None:
set_seed(args.seed)
# Handle the repository creation
if accelerator.is_main_process:
if args.push_to_hub:
if args.hub_model_id is None:
repo_name = get_full_repo_name(Path(args.output_dir).name, token=args.hub_token)
else:
repo_name = args.hub_model_id
create_repo(repo_name, exist_ok=True, token=args.hub_token)
repo = Repository(args.output_dir, clone_from=repo_name, token=args.hub_token)
with open(os.path.join(args.output_dir, ".gitignore"), "w+") as gitignore:
if "step_*" not in gitignore:
gitignore.write("step_*\n")
if "epoch_*" not in gitignore:
gitignore.write("epoch_*\n")
elif args.output_dir is not None:
os.makedirs(args.output_dir, exist_ok=True)
accelerator.wait_for_everyone()
# get the tokenizer
tokenizer = AutoTokenizer.from_pretrained(args.model_name_or_path)
# dataset download and preprocessing
if args.sanity_test:
train_dataset = load_dataset("smangrul/amazon_esci", split="train[:1024]")
val_dataset = load_dataset("smangrul/amazon_esci", split="validation[:1024]")
dataset = DatasetDict({"train": train_dataset, "validation": val_dataset})
else:
dataset = load_dataset(args.dataset_name)
def preprocess_function(examples):
queries = examples["query"]
result = tokenizer(queries, padding="max_length", max_length=70, truncation=True)
result = {f"query_{k}": v for k, v in result.items()}
products = examples["product_title"]
result_products = tokenizer(products, padding="max_length", max_length=70, truncation=True)
for k, v in result_products.items():
result[f"product_{k}"] = v
result["labels"] = examples["relevance_label"]
return result
processed_datasets = dataset.map(
preprocess_function,
batched=True,
remove_columns=dataset["train"].column_names,
desc="Running tokenizer on dataset",
)
# Log a few random samples from the training set:
for index in random.sample(range(len(processed_datasets["train"])), 3):
logger.info(f"Sample {index} of the training set: {processed_datasets['train'][index]}.")
# base model
model = AutoModelForSentenceEmbedding(args.model_name_or_path, tokenizer)
if args.use_peft:
# peft config and wrapping
peft_config = LoraConfig(
r=8,
lora_alpha=16,
bias="none",
task_type=TaskType.FEATURE_EXTRACTION,
target_modules=["key", "query", "value"],
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
accelerator.print(model)
# get dataloaders
train_dataloader = DataLoader(
processed_datasets["train"],
shuffle=True,
collate_fn=default_data_collator,
batch_size=args.per_device_train_batch_size,
pin_memory=True,
)
eval_dataloader = DataLoader(
processed_datasets["validation"],
shuffle=False,
collate_fn=default_data_collator,
batch_size=args.per_device_eval_batch_size,
pin_memory=True,
)
optimizer = torch.optim.Adam(model.parameters(), lr=args.learning_rate)
# Scheduler and math around the number of training steps.
overrode_max_train_steps = False
num_update_steps_per_epoch = math.ceil(len(train_dataloader) / args.gradient_accumulation_steps)
if args.max_train_steps is None:
args.max_train_steps = args.num_train_epochs * num_update_steps_per_epoch
overrode_max_train_steps = True
lr_scheduler = get_scheduler(
name=args.lr_scheduler_type,
optimizer=optimizer,
num_warmup_steps=args.num_warmup_steps,
num_training_steps=args.max_train_steps,
)
# Prepare everything with our `accelerator`.
model, optimizer, train_dataloader, eval_dataloader, lr_scheduler = accelerator.prepare(
model, optimizer, train_dataloader, eval_dataloader, lr_scheduler
)
# We need to recalculate our total training steps as the size of the training dataloader may have changed
num_update_steps_per_epoch = math.ceil(len(train_dataloader) / args.gradient_accumulation_steps)
if overrode_max_train_steps:
args.max_train_steps = args.num_train_epochs * num_update_steps_per_epoch
# Afterwards we recalculate our number of training epochs
args.num_train_epochs = math.ceil(args.max_train_steps / num_update_steps_per_epoch)
# Figure out how many steps we should save the Accelerator states
checkpointing_steps = args.checkpointing_steps
if checkpointing_steps is not None and checkpointing_steps.isdigit():
checkpointing_steps = int(checkpointing_steps)
# We need to initialize the trackers we use, and also store our configuration.
# The trackers initializes automatically on the main process.
if args.with_tracking:
experiment_config = vars(args)
# TensorBoard cannot log Enums, need the raw value
experiment_config["lr_scheduler_type"] = experiment_config["lr_scheduler_type"].value
accelerator.init_trackers("peft_semantic_search", experiment_config)
metric = evaluate.load("roc_auc")
total_batch_size = args.per_device_train_batch_size * accelerator.num_processes * args.gradient_accumulation_steps
if args.use_peft:
# saving and loading checkpoints for resuming training
accelerator.register_save_state_pre_hook(save_model_hook)
accelerator.register_load_state_pre_hook(load_model_hook)
logger.info("***** Running training *****")
logger.info(f" Num examples = {len(processed_datasets['train'])}")
logger.info(f" Num Epochs = {args.num_train_epochs}")
logger.info(f" Instantaneous batch size per device = {args.per_device_train_batch_size}")
logger.info(f" Total train batch size (w. parallel, distributed & accumulation) = {total_batch_size}")
logger.info(f" Gradient Accumulation steps = {args.gradient_accumulation_steps}")
logger.info(f" Total optimization steps = {args.max_train_steps}")
# Only show the progress bar once on each machine.
progress_bar = tqdm(range(args.max_train_steps), disable=not accelerator.is_local_main_process)
completed_steps = 0
starting_epoch = 0
# Potentially load in the weights and states from a previous save
if args.resume_from_checkpoint:
if args.resume_from_checkpoint is not None or args.resume_from_checkpoint != "":
accelerator.print(f"Resumed from checkpoint: {args.resume_from_checkpoint}")
accelerator.load_state(args.resume_from_checkpoint)
path = os.path.basename(args.resume_from_checkpoint)
else:
# Get the most recent checkpoint
dirs = [f.name for f in os.scandir(os.getcwd()) if f.is_dir()]
dirs.sort(key=os.path.getctime)
path = dirs[-1] # Sorts folders by date modified, most recent checkpoint is the last
# Extract `epoch_{i}` or `step_{i}`
training_difference = os.path.splitext(path)[0]
if "epoch" in training_difference:
starting_epoch = int(training_difference.replace("epoch_", "")) + 1
resume_step = None
completed_steps = starting_epoch * num_update_steps_per_epoch
else:
# need to multiply `gradient_accumulation_steps` to reflect real steps
resume_step = int(training_difference.replace("step_", "")) * args.gradient_accumulation_steps
starting_epoch = resume_step // len(train_dataloader)
resume_step -= starting_epoch * len(train_dataloader)
completed_steps = resume_step // args.gradient_accumulation_steps
# update the progress_bar if load from checkpoint
progress_bar.update(completed_steps)
for epoch in range(starting_epoch, args.num_train_epochs):
model.train()
if args.with_tracking:
total_loss = 0
if args.resume_from_checkpoint and epoch == starting_epoch and resume_step is not None:
# We skip the first `n` batches in the dataloader when resuming from a checkpoint
active_dataloader = accelerator.skip_first_batches(train_dataloader, resume_step)
else:
active_dataloader = train_dataloader
for step, batch in enumerate(active_dataloader):
with accelerator.accumulate(model):
query_embs = model(**{k.replace("query_", ""): v for k, v in batch.items() if "query" in k})
product_embs = model(**{k.replace("product_", ""): v for k, v in batch.items() if "product" in k})
loss = get_loss(get_cosing_embeddings(query_embs, product_embs), batch["labels"])
total_loss += accelerator.reduce(loss.detach().float(), reduction="sum")
accelerator.backward(loss)
optimizer.step()
lr_scheduler.step()
model.zero_grad()
# Checks if the accelerator has performed an optimization step behind the scenes
if accelerator.sync_gradients:
progress_bar.update(1)
completed_steps += 1
if (step + 1) % 100 == 0:
logger.info(f"Step: {step+1}, Loss: {total_loss/(step+1)}")
if args.with_tracking:
accelerator.log({"train/loss": total_loss / (step + 1)}, step=completed_steps)
if isinstance(checkpointing_steps, int):
if completed_steps % checkpointing_steps == 0:
output_dir = f"step_{completed_steps }"
if args.output_dir is not None:
output_dir = os.path.join(args.output_dir, output_dir)
accelerator.save_state(output_dir)
if completed_steps >= args.max_train_steps:
break
model.eval()
for step, batch in enumerate(eval_dataloader):
with torch.no_grad():
query_embs = model(**{k.replace("query_", ""): v for k, v in batch.items() if "query" in k})
product_embs = model(**{k.replace("product_", ""): v for k, v in batch.items() if "product" in k})
prediction_scores = get_cosing_embeddings(query_embs, product_embs)
prediction_scores, references = accelerator.gather_for_metrics((prediction_scores, batch["labels"]))
metric.add_batch(
prediction_scores=prediction_scores,
references=references,
)
result = metric.compute()
result = {f"eval/{k}": v for k, v in result.items()}
# Use accelerator.print to print only on the main process.
accelerator.print(f"epoch {epoch}:", result)
if args.with_tracking:
result["train/epoch_loss"] = total_loss.item() / len(train_dataloader)
accelerator.log(result, step=completed_steps)
if args.output_dir is not None:
accelerator.wait_for_everyone()
if accelerator.is_main_process:
if isinstance(checkpointing_steps, str):
accelerator.save_state(os.path.join(args.output_dir, f"epoch_{epoch}"))
accelerator.unwrap_model(model).save_pretrained(
args.output_dir, state_dict=accelerator.get_state_dict(accelerator.unwrap_model(model))
)
tokenizer.save_pretrained(args.output_dir)
if args.push_to_hub:
commit_message = (
f"Training in progress epoch {epoch}"
if epoch < args.num_train_epochs - 1
else "End of training"
)
repo.push_to_hub(commit_message=commit_message, blocking=False, auto_lfs_prune=True)
accelerator.wait_for_everyone()
accelerator.end_training()
if __name__ == "__main__":
main()
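A minimal inference sketch using the pieces defined in this script (`AutoModelForSentenceEmbedding`, `get_cosing_embeddings`), assuming a finished `--use_peft` run; the paths and example strings below are placeholders:

# Load the trained LoRA adapter back on top of the sentence-embedding wrapper and score one pair.
import torch
from transformers import AutoTokenizer
from peft import PeftModel

model_name_or_path = "..."  # same value as --model_name_or_path during training
adapter_dir = "..."         # the --output_dir the adapter and tokenizer were saved to

tokenizer = AutoTokenizer.from_pretrained(adapter_dir)
base = AutoModelForSentenceEmbedding(model_name_or_path, tokenizer)
model = PeftModel.from_pretrained(base, adapter_dir).eval()

query = tokenizer("running shoes", padding="max_length", max_length=70, truncation=True, return_tensors="pt")
product = tokenizer("lightweight trail running shoe", padding="max_length", max_length=70, truncation=True, return_tensors="pt")
with torch.no_grad():
    # embeddings are L2-normalized in the wrapper, so their dot product is the cosine similarity
    score = get_cosing_embeddings(model(**query), model(**product))
print(score)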


@ -0,0 +1,10 @@
git+https://github.com/huggingface/peft
git+https://github.com/huggingface/accelerate
git+https://github.com/huggingface/transformers
datasets
evaluate
hnswlib
pandas
tqdm
huggingface_hub
wandb


@ -0,0 +1,193 @@
import os
import torch
import torch.nn as nn
import transformers
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
# -*- coding: utf-8 -*-
"""Finetune-opt-bnb-peft.ipynb
Automatically generated by Colaboratory.
Original file is located at
https://colab.research.google.com/drive/1jCkpikz0J2o20FBQmYmAGdiKmJGOMo-o
## Fine-tune large models using 🤗 `peft` adapters, `transformers` & `bitsandbytes`
In this tutorial we will cover how to fine-tune large language models using the recent `peft` library and `bitsandbytes` for loading large models in 8-bit.
The fine-tuning method relies on a recent technique called "Low Rank Adapters" (LoRA): instead of fine-tuning the entire model, you only have to fine-tune these adapters and load them properly inside the model.
After fine-tuning the model you can also share your adapters on the 🤗 Hub and load them very easily. Let's get started!
### Install requirements
First, run the cells below to install the requirements:
"""
"""### Model loading
Here let's load the `opt-6.7b` model; its weights in half-precision (float16) are about 13GB on the Hub! If we load them in 8-bit, we would only require around 7GB of memory instead.
"""
free_in_GB = int(torch.cuda.mem_get_info()[0] / 1024**3)
max_memory = f"{free_in_GB-2}GB"
n_gpus = torch.cuda.device_count()
max_memory = {i: max_memory for i in range(n_gpus)}
model = AutoModelForCausalLM.from_pretrained(
"facebook/opt-350m",
max_memory=max_memory,
quantization_config=BitsAndBytesConfig(
load_in_4bit=True,
llm_int8_threshold=6.0,
llm_int8_has_fp16_weight=False,
bnb_4bit_compute_dtype=torch.float16,
bnb_4bit_use_double_quant=True,
bnb_4bit_quant_type="nf4",
),
torch_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
"""### Post-processing on the model
Finally, we need to apply some post-processing on the 8-bit model to enable training: we freeze all the layers and cast the layer norm in `float32` for stability. We also cast the output of the last layer in `float32` for the same reasons.
"""
print(model)
for param in model.parameters():
param.requires_grad = False # freeze the model - train adapters later
if param.ndim == 1:
# cast the small parameters (e.g. layernorm) to fp32 for stability
param.data = param.data.to(torch.float32)
# model.gradient_checkpointing_enable() # reduce number of stored activations
# model.model.decoder.project_in = lambda x: x.requires_grad_(True)
class CastOutputToFloat(nn.Sequential):
def forward(self, x):
return super().forward(x).to(torch.float32)
model.lm_head = CastOutputToFloat(model.lm_head)
"""### Apply LoRA
Here comes the magic with `peft`! Let's load a `PeftModel` and specify that we are going to use low-rank adapters (LoRA) with the `get_peft_model` utility function from `peft`.
"""
def print_trainable_parameters(model):
"""
Prints the number of trainable parameters in the model.
"""
trainable_params = 0
all_param = 0
for _, param in model.named_parameters():
all_param += param.numel()
if param.requires_grad:
trainable_params += param.numel()
print(
f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
)
config = LoraConfig(
r=64,
lora_alpha=32,
target_modules=["q_proj", "v_proj", "out_proj", "fc1", "fc2"],
lora_dropout=0.01,
bias="none",
task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
print_trainable_parameters(model)
# Verifying the datatypes.
dtypes = {}
for _, p in model.named_parameters():
dtype = p.dtype
if dtype not in dtypes:
dtypes[dtype] = 0
dtypes[dtype] += p.numel()
total = 0
for k, v in dtypes.items():
total += v
for k, v in dtypes.items():
print(k, v, v / total)
"""### Training"""
data = load_dataset("Abirate/english_quotes")
data = data.map(lambda samples: tokenizer(samples["quote"]), batched=True)
trainer = transformers.Trainer(
model=model,
train_dataset=data["train"],
args=transformers.TrainingArguments(
per_device_train_batch_size=4,
gradient_accumulation_steps=4,
warmup_steps=10,
max_steps=20,
learning_rate=3e-4,
fp16=True,
logging_steps=1,
output_dir="outputs",
),
data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
model.config.use_cache = False # silence the warnings. Please re-enable for inference!
trainer.train()
# from huggingface_hub import notebook_login
# notebook_login()
# model.push_to_hub("ybelkada/opt-6.7b-lora", use_auth_token=True)
"""## Load adapters from the Hub
You can also directly load adapters from the Hub using the commands below:
"""
# import torch
# from peft import PeftModel, PeftConfig
# from transformers import AutoModelForCausalLM, AutoTokenizer
#
# peft_model_id = "ybelkada/opt-6.7b-lora"
# config = PeftConfig.from_pretrained(peft_model_id)
# model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, return_dict=True, load_in_8bit=True, device_map='auto')
# tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
#
## Load the Lora model
# model = PeftModel.from_pretrained(model, peft_model_id)
#
# """## Inference
#
# You can then directly use the trained model or the model that you have loaded from the 🤗 Hub for inference as you would do it usually in `transformers`.
# """
#
batch = tokenizer("Two things are infinite: ", return_tensors="pt")
model.config.use_cache = False # silence the warnings. Please re-enable for inference!
model.eval()
with torch.cuda.amp.autocast():
output_tokens = model.generate(**batch, max_new_tokens=50)
print("\n\n", tokenizer.decode(output_tokens[0], skip_special_tokens=True))
# model.save('./test.pt')
# """As you can see by fine-tuning for few steps we have almost recovered the quote from Albert Einstein that is present in the [training data](https://huggingface.co/datasets/Abirate/english_quotes)."""


@ -1,7 +1,15 @@
# Fine-tuning for image classification using LoRA and 🤗 PEFT
## Vision Transformer model from transformers
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/peft/blob/main/examples/image_classification/image_classification_peft_lora.ipynb)
We provide a notebook (`image_classification_peft_lora.ipynb`) where we learn how to use [LoRA](https://arxiv.org/abs/2106.09685) from 🤗 PEFT to fine-tune an image classification model by ONLY using **0.7%** of the original trainable parameters of the model.
LoRA adds low-rank "update matrices" to certain blocks in the underlying model (in this case the attention blocks) and ONLY trains those matrices during fine-tuning. During inference, these update matrices are _merged_ with the original model parameters. For more details, check out the [original LoRA paper](https://arxiv.org/abs/2106.09685).
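For a quick, hedged sketch of what that looks like with 🤗 PEFT (the checkpoint, label count, and target modules below are illustrative; the notebook pins down the exact values used):

```python
from transformers import AutoModelForImageClassification
from peft import LoraConfig, get_peft_model

model = AutoModelForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k",  # illustrative checkpoint
    num_labels=10,                        # illustrative number of classes
)

# Add low-rank update matrices to the attention projections and train only those
# (plus the freshly initialized classification head).
config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["query", "value"],
    lora_dropout=0.1,
    bias="none",
    modules_to_save=["classifier"],
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only a small fraction of the parameters is trainable
```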
## PoolFormer model from timm
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/peft/blob/main/examples/image_classification/image_classification_timm_peft_lora.ipynb)
The notebook `image_classification_timm_peft_lora.ipynb` showcases fine-tuning an image classification model from the [timm](https://huggingface.co/docs/timm/index) library. Again, LoRA is used to reduce the number of trainable parameters to a fraction of the total.


@ -61,7 +61,7 @@
}
],
"source": [
"!pip install transformers accelerate evaluate datasets loralib git+https://github.com/huggingface/peft -q"
"!pip install transformers accelerate evaluate datasets git+https://github.com/huggingface/peft -q"
]
},
{

File diff suppressed because one or more lines are too long


@ -71,7 +71,7 @@
}
],
"source": [
"!pip install -q bitsandbytes datasets accelerate loralib\n",
"!pip install -q bitsandbytes datasets accelerate\n",
"!pip install -q git+https://github.com/huggingface/transformers.git@main git+https://github.com/huggingface/peft.git@main"
]
},
@ -305,7 +305,7 @@
"\n",
"model_name = \"google/flan-t5-large\"\n",
"\n",
"model = AutoModelForSeq2SeqLM.from_pretrained(model_name, load_in_8bit=True, device_map=\"auto\")\n",
"model = AutoModelForSeq2SeqLM.from_pretrained(model_name, load_in_8bit=True)\n",
"tokenizer = AutoTokenizer.from_pretrained(model_name)"
]
},
@ -1186,7 +1186,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 1,
"id": "j097aaPWJ-9u",
"metadata": {
"id": "j097aaPWJ-9u"
@ -1209,7 +1209,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 2,
"id": "jmjwWYt0KI_I",
"metadata": {
"colab": {
@ -1264,7 +1264,7 @@
},
"gpuClass": "standard",
"kernelspec": {
"display_name": "fix-test",
"display_name": "Python 3.10.11 ('accelerate': conda)",
"language": "python",
"name": "python3"
},
@ -1278,11 +1278,11 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
"version": "3.10.11"
},
"vscode": {
"interpreter": {
"hash": "6c4e21ff5edce2fb2cfe7eb854551da92c6ec05cac2504057bb7aba62f43a5ec"
"hash": "1219a10c7def3e2ad4f431cfa6f49d569fcc5949850132f23800e792129eefbb"
}
},
"widgets": {


@ -59,7 +59,7 @@
}
],
"source": [
"!pip install -q bitsandbytes datasets accelerate loralib\n",
"!pip install -q bitsandbytes datasets accelerate\n",
"!pip install -q git+https://github.com/huggingface/transformers.git@main git+https://github.com/huggingface/peft.git"
]
},
@ -76,7 +76,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 4,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
@ -198,76 +198,10 @@
"outputId": "135a7675-6a4d-4786-b5dc-34cb867f40c7"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"===================================BUG REPORT===================================\n",
"Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues\n",
"================================================================================\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "d4de260ffd8a440eb87eb900fc1bb1d3",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Downloading (…)lve/main/config.json: 0%| | 0.00/651 [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "fc2d5ffe254d425b939252ec46ec27cc",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Downloading (…)model.bin.index.json: 0%| | 0.00/41.9k [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "c6f712eadc4d49019b2bd355968cc155",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Downloading (…)00001-of-00002.bin\";: 0%| | 0.00/9.96G [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "5aa74b9b30614172b07f88873cf89471",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Downloading (…)00002-of-00002.bin\";: 0%| | 0.00/3.36G [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "e73e5388182040a8937ccf1748171a87",
"model_id": "bee2f575b3e64c30b2f3afa137802406",
"version_major": 2,
"version_minor": 0
},
@ -277,92 +211,17 @@
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "a994beafbf3f4c20880a7bbe3898db36",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Downloading (…)neration_config.json: 0%| | 0.00/137 [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "1e9391f6c89c4d08859ef3413edb19be",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Downloading (…)okenizer_config.json: 0%| | 0.00/685 [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "4e6d5943bc374b388b93ed115e44b6a5",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Downloading (…)olve/main/vocab.json: 0%| | 0.00/899k [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "1ca7684b79c5438fa06b047bd2b3283f",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Downloading (…)olve/main/merges.txt: 0%| | 0.00/456k [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "d46b5725c35142a89617e46c0e8d3679",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Downloading (…)cial_tokens_map.json: 0%| | 0.00/441 [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import os\n",
"\n",
"os.environ[\"CUDA_VISIBLE_DEVICES\"] = \"0\"\n",
"import torch\n",
"import torch.nn as nn\n",
"import bitsandbytes as bnb\n",
"from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM\n",
"\n",
"model = AutoModelForCausalLM.from_pretrained(\n",
" \"facebook/opt-6.7b\",\n",
" load_in_8bit=True,\n",
" device_map=\"auto\",\n",
")\n",
"model = AutoModelForCausalLM.from_pretrained(\"facebook/opt-6.7b\", load_in_8bit=True)\n",
"\n",
"tokenizer = AutoTokenizer.from_pretrained(\"facebook/opt-6.7b\")"
]
@ -384,7 +243,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 5,
"metadata": {
"id": "T-gy-LxM0yAi"
},
@ -408,7 +267,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 6,
"metadata": {
"id": "4W1j6lxaNnxC"
},
@ -431,7 +290,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 7,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
@ -477,7 +336,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 9,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
@ -1520,7 +1379,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 10,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
@ -1555,45 +1414,23 @@
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"===================================BUG REPORT===================================\n",
"Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues\n",
"For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link\n",
"================================================================================\n",
"CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...\n",
"CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so\n",
"CUDA SETUP: Highest compute capability among GPUs detected: 7.5\n",
"CUDA SETUP: Detected CUDA version 112\n",
"CUDA SETUP: Loading binary /usr/local/lib/python3.8/dist-packages/bitsandbytes/libbitsandbytes_cuda112.so...\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/usr/local/lib/python3.8/dist-packages/bitsandbytes/cuda_setup/main.py:134: UserWarning: /usr/lib64-nvidia did not contain libcudart.so as expected! Searching further paths...\n",
" warn(msg)\n",
"/usr/local/lib/python3.8/dist-packages/bitsandbytes/cuda_setup/main.py:134: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/sys/fs/cgroup/memory.events /var/colab/cgroup/jupyter-children/memory.events')}\n",
" warn(msg)\n",
"/usr/local/lib/python3.8/dist-packages/bitsandbytes/cuda_setup/main.py:134: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/share/tcltk/tcllib1.19')}\n",
" warn(msg)\n",
"/usr/local/lib/python3.8/dist-packages/bitsandbytes/cuda_setup/main.py:134: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('--listen_host=172.28.0.12 --target_host=172.28.0.12 --tunnel_background_save_url=https'), PosixPath('//colab.research.google.com/tun/m/cc48301118ce562b961b3c22d803539adc1e0c19/gpu-t4-s-38j9a9wfgbvb0 --tunnel_background_save_delay=10s --tunnel_periodic_background_save_frequency=30m0s --enable_output_coalescing=true --output_coalescing_required=true')}\n",
" warn(msg)\n",
"/usr/local/lib/python3.8/dist-packages/bitsandbytes/cuda_setup/main.py:134: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/env/python')}\n",
" warn(msg)\n",
"/usr/local/lib/python3.8/dist-packages/bitsandbytes/cuda_setup/main.py:134: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('6000,\"kernelManagerProxyHost\"'), PosixPath('true}'), PosixPath('[\"--ip=172.28.0.12\",\"--transport=ipc\"],\"debugAdapterMultiplexerPath\"'), PosixPath('\"172.28.0.12\",\"jupyterArgs\"'), PosixPath('\"/usr/local/bin/dap_multiplexer\",\"enableLsp\"'), PosixPath('{\"kernelManagerProxyPort\"')}\n",
" warn(msg)\n",
"/usr/local/lib/python3.8/dist-packages/bitsandbytes/cuda_setup/main.py:134: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//ipykernel.pylab.backend_inline'), PosixPath('module')}\n",
" warn(msg)\n"
]
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "621d427f78fb458e8ae25262f2ab7ca8",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Downloading (…)/adapter_config.json: 0%| | 0.00/332 [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "ff2454cf69b346fea70070522cf93689",
"model_id": "4a2107423a164efd89002e031126c8b5",
"version_major": 2,
"version_minor": 0
},
@ -1607,12 +1444,12 @@
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "75913676a5df43fbbfe744b8882188df",
"model_id": "43f2a9b0f37e4caab35d7dda43f051f9",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Downloading (…)\"adapter_model.bin\";: 0%| | 0.00/33.6M [00:00<?, ?B/s]"
"Downloading adapter_model.bin: 0%| | 0.00/33.6M [00:00<?, ?B/s]"
]
},
"metadata": {},
@ -1648,7 +1485,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 11,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
@ -1661,10 +1498,8 @@
"name": "stderr",
"output_type": "stream",
"text": [
"/usr/local/lib/python3.8/dist-packages/transformers/generation/utils.py:1359: UserWarning: You are calling .generate() with the `input_ids` being on a device type different than your model's device. `input_ids` is on cpu, whereas the model is on cuda. You may experience unexpected behaviors or slower generation. Please make sure that you have put `input_ids` to the correct device by calling for example input_ids = input_ids.to('cuda') before running `.generate()`.\n",
" warnings.warn(\n",
"/usr/local/lib/python3.8/dist-packages/bitsandbytes/autograd/_functions.py:233: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization\n",
" warnings.warn(f\"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization\")\n"
"/home/marc/anaconda3/envs/accelerate/lib/python3.10/site-packages/transformers/generation/utils.py:1448: UserWarning: You are calling .generate() with the `input_ids` being on a device type different than your model's device. `input_ids` is on cpu, whereas the model is on cuda. You may experience unexpected behaviors or slower generation. Please make sure that you have put `input_ids` to the correct device by calling for example input_ids = input_ids.to('cuda') before running `.generate()`.\n",
" warnings.warn(\n"
]
},
{
@ -1705,7 +1540,7 @@
},
"gpuClass": "standard",
"kernelspec": {
"display_name": "fix-test",
"display_name": "Python 3.10.11 ('accelerate': conda)",
"language": "python",
"name": "python3"
},
@ -1719,11 +1554,11 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
"version": "3.10.11"
},
"vscode": {
"interpreter": {
"hash": "6c4e21ff5edce2fb2cfe7eb854551da92c6ec05cac2504057bb7aba62f43a5ec"
"hash": "1219a10c7def3e2ad4f431cfa6f49d569fcc5949850132f23800e792129eefbb"
}
},
"widgets": {

View File

@ -29,7 +29,7 @@ config = LoraConfig(
)
# We load our model and processor using `transformers`
model = AutoModelForVision2Seq.from_pretrained("Salesforce/blip2-opt-2.7b", load_in_8bit=True, device_map={"": 0})
model = AutoModelForVision2Seq.from_pretrained("Salesforce/blip2-opt-2.7b", load_in_8bit=True)
processor = AutoProcessor.from_pretrained("Salesforce/blip2-opt-2.7b")
# Get our peft model and print the number of trainable parameters

View File

@ -422,16 +422,11 @@ def evaluation_loop(model, eval_dataloader, processor, normalizer, metric, force
def main():
args = parse_args()
# initialize accelerator
accelerator = (
Accelerator(
log_with=args.report_to,
project_dir=args.output_dir,
gradient_accumulation_steps=args.gradient_accumulation_steps,
)
if args.with_tracking
else Accelerator(gradient_accumulation_steps=args.gradient_accumulation_steps)
)
accelerator_kwargs = {"gradient_accumulation_steps": args.gradient_accumulation_steps}
if args.with_tracking:
accelerator_kwargs["log_with"] = args.report_to
accelerator_kwargs["project_dir"] = args.output_dir
accelerator = Accelerator(**accelerator_kwargs)
# Make one log on every process with the configuration for debugging.
logging.basicConfig(
@ -538,9 +533,7 @@ def main():
metric = evaluate.load("wer")
# model
model = WhisperForConditionalGeneration.from_pretrained(
args.model_name_or_path, load_in_8bit=True, device_map="auto"
)
model = WhisperForConditionalGeneration.from_pretrained(args.model_name_or_path, load_in_8bit=True)
model.config.forced_decoder_ids = None
model.config.suppress_tokens = []
if len(set(model.hf_device_map.values()).intersection({"cpu", "disk"})) > 0:

View File

@ -64,7 +64,7 @@
"!pip install evaluate>=0.30\n",
"!pip install jiwer\n",
"!pip install gradio\n",
"!pip install -q bitsandbytes datasets accelerate loralib\n",
"!pip install -q bitsandbytes datasets accelerate\n",
"!pip install -q git+https://github.com/huggingface/transformers.git@main git+https://github.com/huggingface/peft.git@main"
]
},
@ -1104,7 +1104,7 @@
"source": [
"from transformers import WhisperForConditionalGeneration\n",
"\n",
"model = WhisperForConditionalGeneration.from_pretrained(model_name_or_path, load_in_8bit=True, device_map=\"auto\")\n",
"model = WhisperForConditionalGeneration.from_pretrained(model_name_or_path, load_in_8bit=True)\n",
"\n",
"# model.hf_device_map - this should be {\" \": 0}"
]
@ -1930,7 +1930,7 @@
},
"gpuClass": "standard",
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "Python 3.10.11 ('accelerate': conda)",
"language": "python",
"name": "python3"
},
@ -1944,7 +1944,12 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
"version": "3.10.11"
},
"vscode": {
"interpreter": {
"hash": "1219a10c7def3e2ad4f431cfa6f49d569fcc5949850132f23800e792129eefbb"
}
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {

View File

@ -0,0 +1,140 @@
# LoftQ: LoRA-fine-tuning-aware Quantization
## Introduction
LoftQ finds a quantized LoRA initialization: given a pre-trained weight W, it computes a quantized backbone Q and LoRA adapters A and B such that Q plus the low-rank update approximates W, giving fine-tuning a better starting point than quantizing W alone.
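For intuition, the sketch below mirrors the alternating scheme described in the LoftQ paper (Algorithm 1): alternately quantize the residual backbone and refit rank-r factors of the remaining error. The `quantize`/`dequantize` callables are placeholder assumptions (e.g. an NF4 quantizer and its inverse), not the functions PEFT uses internally.
```python
# Minimal sketch of a LoftQ-style alternating initialization (illustrative only; not the PEFT internals).
# `quantize` / `dequantize` are assumed callables, e.g. an NF4 quantizer and its inverse.
import torch

def loftq_init(W: torch.Tensor, rank: int, num_iter: int, quantize, dequantize):
    A = torch.zeros(W.shape[0], rank, dtype=W.dtype)
    B = torch.zeros(W.shape[1], rank, dtype=W.dtype)
    Q = None
    for _ in range(num_iter):
        Q = quantize(W - A @ B.T)                    # quantize the current residual backbone
        residual = (W - dequantize(Q)).float()       # error the adapters should absorb
        U, S, Vh = torch.linalg.svd(residual, full_matrices=False)
        A = (U[:, :rank] * S[:rank]).to(W.dtype)     # best rank-r approximation of the residual
        B = Vh[:rank, :].T.to(W.dtype)
    return Q, A, B                                   # quantized backbone + LoRA initialization
```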
## Quick Start
Steps:
1. Apply LoftQ to a full-precision pre-trained weight and save.
2. Load LoftQ initialization and train.
For step 1, we have provided off-the-shelf LoftQ initializations (see [supported model list](#appendix-off-the-shelf-model-table))
in [Huggingface Hub LoftQ](https://huggingface.co/LoftQ).
If you want to do it yourself, jump to [LoftQ DIY](#loftq-diy).
For step 2, below is an example of loading a 4-bit Mistral-7B with 64-rank LoRA adapters from the Huggingface Hub.
```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
MODEL_ID = "LoftQ/Mistral-7B-v0.1-4bit-64rank"
base_model = AutoModelForCausalLM.from_pretrained(
MODEL_ID,
torch_dtype=torch.bfloat16, # you may change it with different models
quantization_config=BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_compute_dtype=torch.bfloat16, # bfloat16 is recommended
bnb_4bit_use_double_quant=False,
bnb_4bit_quant_type='nf4',
),
)
peft_model = PeftModel.from_pretrained(
base_model,
MODEL_ID,
subfolder="loftq_init",
is_trainable=True,
)
# Do training with peft_model ...
```
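To make the "Do training" comment above concrete, here is a minimal, hedged sketch of a plain PyTorch loop over `peft_model`; the `train_dataloader` of tokenized batches is an assumption for illustration and not part of this example.
```python
# Illustrative sketch only: a bare-bones training loop over the `peft_model` loaded above.
# A `train_dataloader` yielding dicts with input_ids, attention_mask and labels is assumed.
import torch

optimizer = torch.optim.AdamW(
    (p for p in peft_model.parameters() if p.requires_grad), lr=1e-4
)
peft_model.train()
for batch in train_dataloader:
    batch = {k: v.to(peft_model.device) for k, v in batch.items()}
    outputs = peft_model(**batch)
    outputs.loss.backward()   # gradients flow only into the trainable LoRA parameters
    optimizer.step()
    optimizer.zero_grad()
```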
## LoftQ DIY
### Apply LoftQ and save
We provide [quantize_save_load.py](quantize_save_load.py) as an example of applying LoftQ with
different bits (`--bits`), ranks (`--rank`), and alternating steps (`--iter`, a hyper-parameter of LoftQ; see Algorithm 1 in the [LoftQ paper](https://arxiv.org/abs/2310.08659)). Currently, this example supports
`llama-2`, `falcon`, `mistral`, `bart`, `t5`, `deberta`, `bert`, and `roberta`.
Below is an example of obtaining a 4-bit LLAMA-2-7b with 16-rank LoRA adapters using 5 alternating steps.
```sh
SAVE_DIR="model_zoo/loftq/"
# --model_name_or_path: high-precision model id on the HF Hub
# --token: your HF token, needed if the model is gated/private, e.g., llama-2
python quantize_save_load.py \
--model_name_or_path meta-llama/Llama-2-7b-hf \
--token HF_TOKEN \
--bits 4 \
--iter 5 \
--rank 16 \
--save_dir $SAVE_DIR
```
The above command creates the model directory under `$SAVE_DIR`.
Specifically, the model directory is named
`MODEL_DIR = SAVE_DIR + f"{args.model_name_or_path.split('/')[-1]}-{args.bits}bit-{args.rank}rank"`
In this example, `MODEL_DIR="model_zoo/loftq/Llama-2-7b-hf-4bit-16rank"`: the backbone is stored in `$MODEL_DIR`
and the LoRA adapters are saved in the sub-folder `$MODEL_DIR/loftq_init`.
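For concreteness, here is a small illustrative snippet (values taken from the command above; not part of the example scripts) that recomputes where the backbone and the adapters end up:
```python
# Illustrative only: reconstruct the output layout of the command above.
import os

save_dir = "model_zoo/loftq/"
model_name_or_path, bits, rank = "meta-llama/Llama-2-7b-hf", 4, 16

model_dir = os.path.join(save_dir, f"{model_name_or_path.split('/')[-1]}-{bits}bit-{rank}rank")
adapter_dir = os.path.join(model_dir, "loftq_init")   # LoRA adapters live in this sub-folder

print(model_dir)    # model_zoo/loftq/Llama-2-7b-hf-4bit-16rank
print(adapter_dir)  # model_zoo/loftq/Llama-2-7b-hf-4bit-16rank/loftq_init
```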
### Load and train
Loading works the same way as from the Huggingface Hub; we only need to replace `MODEL_ID` with `MODEL_DIR`.
```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
MODEL_DIR = "model_zoo/loftq/Llama-2-7b-hf-4bit-16rank"
base_model = AutoModelForCausalLM.from_pretrained(
MODEL_DIR,
torch_dtype=torch.bfloat16,
quantization_config=BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_compute_dtype=torch.bfloat16,
bnb_4bit_use_double_quant=False,
bnb_4bit_quant_type='nf4',
),
)
peft_model = PeftModel.from_pretrained(
base_model,
MODEL_DIR,
subfolder="loftq_init",
is_trainable=True,
)
# Do training with peft_model ...
```
## LoftQ Fine-tuning
We also provide an example of fine-tuning a LoftQ-initialized model on GSM8K.
It loads the quantized backbone and LoRA adapters from the [LoftQ Huggingface hub](https://huggingface.co/LoftQ).
```sh
python train_gsm8k_llama.py \
--model_name_or_path LoftQ/Llama-2-13b-hf-4bit-64rank \
--output_dir exp_results/gsm8k/llama-2-13b/bit4-rank64/lr1e-4 \
--learning_rate 1e-4 \
--weight_decay 0.1 \
--lr_scheduler_type cosine \
--num_warmup_steps 100 \
--seed 202 \
--dataset_name gsm8k \
--dataset_config main \
--pad_to_max_length \
--max_source_length 128 \
--max_target_length 256 \
--num_train_epochs 5 \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 4 \
--with_tracking \
--report_to tensorboard
```
## Appendix: Off-the-shelf Model List
| Model Name | Bits | Ranks |
| ----------- | ---- | ----- |
| LLAMA-2-7b | 4 | 64 |
| LLAMA-2-13b | 4 | 64 |
| LLAMA-2-70b | 4 | 64 |
| Mistral | 4 | 64 |
| Mistral | 4 | 32 |
| BART-large | 4 | 8 |
| BART-large | 4 | 16 |
| BART-large | 4 | 32 |
| BART-large | 2 | 8 |

View File

@ -0,0 +1,194 @@
# coding=utf-8
# Copyright 2023-present the HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import os
import torch
import torch.nn as nn
from transformers import (
AutoModelForCausalLM,
AutoModelForSeq2SeqLM,
AutoModelForSequenceClassification,
AutoTokenizer,
)
from peft import LoftQConfig, LoraConfig, TaskType, get_peft_model
class Shell(nn.Module):
def __init__(self, weight, bias=None):
super().__init__()
self.weight = nn.Parameter(weight, requires_grad=False)
if bias is not None:
self.bias = nn.Parameter(bias, requires_grad=False)
def unwrap_model(model, sub_module_name=".base_layer"):
sub_module_name_list = [k.split(sub_module_name)[0] for k in model.state_dict().keys() if sub_module_name in k]
sub_module_name_set = set(sub_module_name_list)
for name in sub_module_name_set:
# get the parent of the submodule
name_parent = ".".join(name.split(".")[:-1])
name_child = name.split(".")[-1]
sub_module = model.get_submodule(name_parent)
print(sub_module)
# replace with shell
child = getattr(sub_module, name_child)
weight = getattr(child.base_layer, "weight", None)
bias = getattr(child.base_layer, "bias", None)
shell = Shell(weight, bias)
setattr(sub_module, name_child, shell)
print("You have unwrapped the model. Use it on your own risk.")
def print_model(model, name):
print("=" * 10 + name + "=" * 10)
print(model)
for name, param in model.named_parameters():
if torch.is_tensor(param):
if param.dtype in [torch.float32, torch.float16]:
print(
name,
param.shape,
param.device,
param.dtype,
param.requires_grad,
param.mean().item(),
param.max().item(),
)
else:
print(name, param.shape, param.device, param.dtype, param.requires_grad)
def arg_parse():
parser = argparse.ArgumentParser(description="Quantize a model with LoftQ.")
parser.add_argument(
"--model_name_or_path",
type=str,
default=None,
required=True,
help="The name or path of the fp32/16 model.",
)
parser.add_argument(
"--token",
type=str,
default=None,
help="The access token to download model from HuggingFace Hub.",
)
parser.add_argument(
"--bits",
type=int,
default=4,
help="The quantized bits",
)
parser.add_argument(
"--iter",
type=int,
default=1,
help="The alternating steps in LoftQ",
)
parser.add_argument(
"--rank",
type=int,
default=16,
help="The rank of the LoRA adapter",
)
parser.add_argument(
"--save_dir",
type=str,
default="./model_zoo/loftq/",
help="The rank of the LoRA adapter",
)
args = parser.parse_args()
return args
def quantize_and_save():
args = arg_parse()
# Download weights and configure LoRA
tokenizer = AutoTokenizer.from_pretrained(args.model_name_or_path, token=args.token, trust_remote_code=True)
if any(name in args.model_name_or_path.lower() for name in ["llama", "mistral", "falcon"]):
model = AutoModelForCausalLM.from_pretrained(args.model_name_or_path, token=args.token, trust_remote_code=True)
task_type = TaskType.CAUSAL_LM
target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "up_proj", "down_proj", "gate_proj"]
elif any(name in args.model_name_or_path.lower() for name in ["bart", "t5"]):
model = AutoModelForSeq2SeqLM.from_pretrained(args.model_name_or_path, token=args.token)
task_type = TaskType.SEQ_2_SEQ_LM
target_modules = ["q_proj", "k_proj", "v_proj", "fc1", "fc2", "out_proj"]
elif any(name in args.model_name_or_path.lower() for name in ["deberta", "roberta", "bert"]):
model = AutoModelForSequenceClassification.from_pretrained(args.model_name_or_path, token=args.token)
task_type = TaskType.SEQ_CLS
target_modules = ["query_proj", "key_proj", "value_proj", "dense"] # embeddings not supported by peft
else:
raise NotImplementedError("Other models not supported yet.")
# Config of LoftQ
loftq_config = LoftQConfig(loftq_bits=args.bits, loftq_iter=args.iter)
lora_config = LoraConfig(
task_type=task_type,
inference_mode=True,
r=args.rank,
lora_alpha=16 if task_type is TaskType.CAUSAL_LM else args.rank,
lora_dropout=0.1,
target_modules=target_modules,
init_lora_weights="loftq",
loftq_config=loftq_config,
)
# Obtain LoftQ model
lora_model = get_peft_model(model, lora_config)
base_model = lora_model.get_base_model()
# Save LoftQ model
model_name = args.model_name_or_path.split("/")[-1] + f"-{args.bits}bit" + f"-{args.rank}rank"
base_model_dir = os.path.join(args.save_dir, model_name)
lora_model_dir = os.path.join(args.save_dir, model_name, "loftq_init")
# save lora adapters first
lora_model.base_model.peft_config[
"default"
].base_model_name_or_path = base_model_dir # This can be a local path or Hub model id
lora_model.base_model.peft_config["default"].init_lora_weights = True # Don't apply LoftQ when loading again
lora_model.save_pretrained(lora_model_dir)
print_model(lora_model, "lora_model")
# remove lora adapters and save the backbone
unwrap_model(base_model)
base_model.save_pretrained(base_model_dir)
tokenizer.save_pretrained(base_model_dir)
print_model(base_model, "base_model")
return base_model_dir, lora_model_dir
if __name__ == "__main__":
base_dir, lora_dir = quantize_and_save()
# example command:
# python quantize_save_load.py \
# --model_name_or_path meta-llama/Llama-2-7b-hf \
# --token XXX \
# --bits 4 --iter 5 --rank 16 \
# --save_dir ./model_zoo/loftq/

View File

@ -0,0 +1,846 @@
# coding=utf-8
# Copyright 2023-present the HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import copy
import logging
import math
import os
import random
import re
from pathlib import Path
import datasets
import torch
import transformers
from accelerate import Accelerator, DistributedType
from accelerate.logging import get_logger
from accelerate.utils import set_seed
from datasets import load_dataset
from huggingface_hub import Repository, create_repo
from torch.utils.data import DataLoader
from tqdm.auto import tqdm
from transformers import (
CONFIG_MAPPING,
MODEL_MAPPING,
AutoConfig,
AutoModelForCausalLM,
AutoTokenizer,
BitsAndBytesConfig,
SchedulerType,
default_data_collator,
get_scheduler,
)
from transformers.utils import send_example_telemetry
from transformers.utils.versions import require_version
from peft import PeftModel
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
# check_min_version("4.32.0.dev0")
logger = get_logger(__name__)
require_version("datasets>=1.8.0", "To fix: pip install -r examples/pytorch/language-modeling/requirements.txt")
MODEL_CONFIG_CLASSES = list(MODEL_MAPPING.keys())
MODEL_TYPES = tuple(conf.model_type for conf in MODEL_CONFIG_CLASSES)
def parse_args():
parser = argparse.ArgumentParser(description="Finetune a transformers model on a causal language modeling task")
parser.add_argument(
"--dataset_name",
type=str,
default=None,
help="The name of the dataset to use (via the datasets library).",
)
parser.add_argument(
"--dataset_config_name",
type=str,
default=None,
help="The configuration name of the dataset to use (via the datasets library).",
)
parser.add_argument(
"--train_file", type=str, default=None, help="A csv, txt or a json file containing the training data."
)
parser.add_argument(
"--validation_file", type=str, default=None, help="A csv, txt or a json file containing the validation data."
)
parser.add_argument(
"--validation_split_percentage",
default=5,
help="The percentage of the train set used as validation set in case there's no validation split",
)
parser.add_argument(
"--model_name_or_path",
type=str,
help="Path to pretrained model or model identifier from huggingface.co/models.",
required=False,
)
parser.add_argument(
"--config_name",
type=str,
default=None,
help="Pretrained config name or path if not the same as model_name",
)
parser.add_argument(
"--tokenizer_name",
type=str,
default=None,
help="Pretrained tokenizer name or path if not the same as model_name",
)
parser.add_argument(
"--use_slow_tokenizer",
action="store_true",
help="If passed, will use a slow tokenizer (not backed by the 🤗 Tokenizers library).",
)
parser.add_argument(
"--per_device_train_batch_size",
type=int,
default=8,
help="Batch size (per device) for the training dataloader.",
)
parser.add_argument(
"--per_device_eval_batch_size",
type=int,
default=8,
help="Batch size (per device) for the evaluation dataloader.",
)
parser.add_argument(
"--learning_rate",
type=float,
default=5e-5,
help="Initial learning rate (after the potential warmup period) to use.",
)
parser.add_argument("--weight_decay", type=float, default=0.0, help="Weight decay to use.")
parser.add_argument("--num_train_epochs", type=int, default=3, help="Total number of training epochs to perform.")
parser.add_argument(
"--max_train_steps",
type=int,
default=None,
help="Total number of training steps to perform. If provided, overrides num_train_epochs.",
)
parser.add_argument(
"--gradient_accumulation_steps",
type=int,
default=1,
help="Number of updates steps to accumulate before performing a backward/update pass.",
)
parser.add_argument(
"--lr_scheduler_type",
type=SchedulerType,
default="linear",
help="The scheduler type to use.",
choices=["linear", "cosine", "cosine_with_restarts", "polynomial", "constant", "constant_with_warmup"],
)
parser.add_argument(
"--num_warmup_steps", type=int, default=0, help="Number of steps for the warmup in the lr scheduler."
)
parser.add_argument("--output_dir", type=str, default=None, help="Where to store the final model.")
parser.add_argument("--seed", type=int, default=None, help="A seed for reproducible training.")
parser.add_argument(
"--model_type",
type=str,
default=None,
help="Model type to use if training from scratch.",
choices=MODEL_TYPES,
)
parser.add_argument(
"--ignore_pad_token_for_loss",
type=bool,
default=True,
help="Whether to ignore the tokens corresponding to padded labels in the loss computation or not.",
)
parser.add_argument(
"--max_source_length",
type=int,
default=128,
help=(
"The maximum total input sequence length after "
"tokenization.Sequences longer than this will be truncated, sequences shorter will be padded."
),
)
parser.add_argument(
"--max_target_length",
type=int,
default=128,
help=(
"The maximum total sequence length for target text after "
"tokenization. Sequences longer than this will be truncated, sequences shorter will be padded."
"during ``evaluate`` and ``predict``."
),
)
parser.add_argument(
"--pad_to_max_length",
action="store_true",
help="If passed, pad all samples to `max_length`. Otherwise, dynamic padding is used.",
)
parser.add_argument(
"--preprocessing_num_workers",
type=int,
default=None,
help="The number of processes to use for the preprocessing.",
)
parser.add_argument(
"--overwrite_cache", action="store_true", help="Overwrite the cached training and evaluation sets"
)
parser.add_argument(
"--no_keep_linebreaks", action="store_true", help="Do not keep line breaks when using TXT files."
)
parser.add_argument("--push_to_hub", action="store_true", help="Whether or not to push the model to the Hub.")
parser.add_argument(
"--hub_model_id", type=str, help="The name of the repository to keep in sync with the local `output_dir`."
)
parser.add_argument("--hub_token", type=str, help="The token to use to push to the Model Hub.")
parser.add_argument(
"--trust_remote_code",
type=bool,
default=False,
help=(
"Whether or not to allow for custom models defined on the Hub in their own modeling files. This option"
"should only be set to `True` for repositories you trust and in which you have read the code, as it will"
"execute code present on the Hub on your local machine."
),
)
parser.add_argument(
"--checkpointing_steps",
type=str,
default=None,
help="Whether the various states should be saved at the end of every n steps, or 'epoch' for each epoch.",
)
parser.add_argument(
"--resume_from_checkpoint",
type=str,
default=None,
help="If the training should continue from a checkpoint folder.",
)
parser.add_argument(
"--with_tracking",
action="store_true",
help="Whether to enable experiment trackers for logging.",
)
parser.add_argument(
"--report_to",
type=str,
default="tensorboard",
help=(
'The integration to report the results and logs to. Supported platforms are `"tensorboard"`,'
' `"wandb"`, `"comet_ml"` and `"clearml"`. Use `"all"` to report to all integrations. '
"Only applicable when `--with_tracking` is passed."
),
)
parser.add_argument(
"--low_cpu_mem_usage",
action="store_true",
help=(
"It is an option to create the model as an empty shell, then only materialize its parameters when the pretrained weights are loaded."
"If passed, LLM loading time and RAM consumption will be benefited."
),
)
##########################
# Generation Config #
##########################
parser.add_argument(
"--temperature",
type=float,
default=0.8,
help="temperature of 1.0 has no effect, lower tend toward greedy sampling",
)
parser.add_argument("--k", type=int, default=40, help="Choose k candidate words")
parser.add_argument("--p", type=float, default=0.95, help="The sum of probability of candidate words is 0.9 ")
##########################
# Exp Args #
##########################
parser.add_argument(
"--adapter_name_or_path",
type=str,
default=None,
help=(
"The LoRA adapter checkpoint. Set None if you want to fine-tune from LoftQ."
"Specify a path if you want to evaluate."
),
)
args = parser.parse_args()
# Sanity checks
if args.dataset_name is None and args.train_file is None and args.validation_file is None:
raise ValueError("Need either a dataset name or a training/validation file.")
else:
if args.train_file is not None:
extension = args.train_file.split(".")[-1]
assert extension in ["csv", "json", "txt"], "`train_file` should be a csv, json or txt file."
if args.validation_file is not None:
extension = args.validation_file.split(".")[-1]
assert extension in ["csv", "json", "txt"], "`validation_file` should be a csv, json or txt file."
if args.push_to_hub:
assert args.output_dir is not None, "Need an `output_dir` to create a repo when `--push_to_hub` is passed."
return args
def main():
args = parse_args()
# Sending telemetry. Tracking the example usage helps us better allocate resources to maintain them. The
# information sent is the one passed as arguments along with your Python/PyTorch versions.
send_example_telemetry("run_clm_no_trainer", args)
# Initialize the accelerator. We will let the accelerator handle device placement for us in this example.
# If we're using tracking, we also need to initialize it here and it will by default pick up all supported trackers
# in the environment
accelerator_log_kwargs = {}
if args.with_tracking:
accelerator_log_kwargs["log_with"] = args.report_to
accelerator_log_kwargs["project_dir"] = args.output_dir
accelerator = Accelerator(gradient_accumulation_steps=args.gradient_accumulation_steps, **accelerator_log_kwargs)
# Make one log on every process with the configuration for debugging.
logging.basicConfig(
format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
datefmt="%m/%d/%Y %H:%M:%S",
level=logging.INFO,
)
logger.info(accelerator.state, main_process_only=False)
if accelerator.is_local_main_process:
datasets.utils.logging.set_verbosity_warning()
transformers.utils.logging.set_verbosity_info()
else:
datasets.utils.logging.set_verbosity_error()
transformers.utils.logging.set_verbosity_error()
# If passed along, set the training seed now.
if args.seed is not None:
set_seed(args.seed)
# Handle the repository creation
if accelerator.is_main_process:
if args.push_to_hub:
# Retrieve or infer repo_name
repo_name = args.hub_model_id
if repo_name is None:
repo_name = Path(args.output_dir).absolute().name
# Create repo and retrieve repo_id
repo_id = create_repo(repo_name, exist_ok=True, token=args.hub_token).repo_id
# Clone repo locally
repo = Repository(args.output_dir, clone_from=repo_id, token=args.hub_token)
with open(os.path.join(args.output_dir, ".gitignore"), "w+") as gitignore:
if "step_*" not in gitignore:
gitignore.write("step_*\n")
if "epoch_*" not in gitignore:
gitignore.write("epoch_*\n")
elif args.output_dir is not None:
os.makedirs(args.output_dir, exist_ok=True)
accelerator.wait_for_everyone()
# Get the datasets: you can either provide your own CSV/JSON/TXT training and evaluation files (see below)
# or just provide the name of one of the public datasets available on the hub at https://huggingface.co/datasets/
# (the dataset will be downloaded automatically from the datasets Hub).
#
# For CSV/JSON files, this script will use the column called 'text' or the first column if no column called
# 'text' is found. You can easily tweak this behavior (see below).
#
# In distributed training, the load_dataset function guarantee that only one local process can concurrently
# download the dataset.
if args.dataset_name is not None:
# Downloading and loading a dataset from the hub.
raw_datasets = load_dataset(args.dataset_name, args.dataset_config_name)
if "validation" not in raw_datasets.keys():
raw_datasets["validation"] = load_dataset(
args.dataset_name,
args.dataset_config_name,
split=f"train[:{args.validation_split_percentage}%]",
)
raw_datasets["train"] = load_dataset(
args.dataset_name,
args.dataset_config_name,
split=f"train[{args.validation_split_percentage}%:]",
)
else:
data_files = {}
dataset_args = {}
if args.train_file is not None:
data_files["train"] = args.train_file
if args.validation_file is not None:
data_files["validation"] = args.validation_file
extension = args.train_file.split(".")[-1]
if extension == "txt":
extension = "text"
dataset_args["keep_linebreaks"] = not args.no_keep_linebreaks
raw_datasets = load_dataset(extension, data_files=data_files, **dataset_args)
# If no validation data is there, validation_split_percentage will be used to divide the dataset.
if "validation" not in raw_datasets.keys():
raw_datasets["validation"] = load_dataset(
extension,
data_files=data_files,
split=f"train[:{args.validation_split_percentage}%]",
**dataset_args,
)
raw_datasets["train"] = load_dataset(
extension,
data_files=data_files,
split=f"train[{args.validation_split_percentage}%:]",
**dataset_args,
)
# See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at
# https://huggingface.co/docs/datasets/loading_datasets.html.
# Load pretrained model and tokenizer
#
# In distributed training, the .from_pretrained methods guarantee that only one local process can concurrently
# download model & vocab.
if args.config_name:
config = AutoConfig.from_pretrained(
args.config_name,
trust_remote_code=args.trust_remote_code,
)
elif args.model_name_or_path:
config = AutoConfig.from_pretrained(
args.model_name_or_path,
trust_remote_code=args.trust_remote_code,
)
else:
config = CONFIG_MAPPING[args.model_type]()
logger.warning("You are instantiating a new config instance from scratch.")
if args.tokenizer_name:
tokenizer = AutoTokenizer.from_pretrained(
args.tokenizer_name, use_fast=not args.use_slow_tokenizer, trust_remote_code=args.trust_remote_code
)
elif args.model_name_or_path:
tokenizer = AutoTokenizer.from_pretrained(
args.model_name_or_path,
use_fast=not args.use_slow_tokenizer,
trust_remote_code=args.trust_remote_code,
)
else:
raise ValueError(
"You are instantiating a new tokenizer from scratch. This is not supported by this script."
"You can do it from another script, save it, and load it from here, using --tokenizer_name."
)
##########################
# Tokenizer #
##########################
tokenizer.pad_token_id = 0 # unk. we want this to be different from the eos token
tokenizer.padding_side = "left" # Allow batched inference
tokenizer.truncation_side = "left"
if args.model_name_or_path:
model = AutoModelForCausalLM.from_pretrained(
args.model_name_or_path,
from_tf=bool(".ckpt" in args.model_name_or_path),
config=config,
low_cpu_mem_usage=True,
quantization_config=BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_use_double_quant=False,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=config.torch_dtype,
),
)
else:
logger.info("Training new model from scratch")
model = AutoModelForCausalLM.from_config(config, trust_remote_code=args.trust_remote_code)
##########################
# Peft Model #
##########################
if args.adapter_name_or_path is None:
model = PeftModel.from_pretrained(model, args.model_name_or_path, subfolder="loftq_init", is_trainable=True)
else:
model = PeftModel.from_pretrained(model, args.adapter_name_or_path, is_trainable=True)
model.print_trainable_parameters()
# We resize the embeddings only when necessary to avoid index errors. If you are creating a model from scratch
# on a small vocab and want a smaller embedding size, remove this test.
embedding_size = model.get_input_embeddings().weight.shape[0]
if len(tokenizer) > embedding_size:
model.resize_token_embeddings(len(tokenizer))
# Preprocessing the datasets.
# First we tokenize all the texts.
##########################
# GSM8K dataset #
##########################
# Preprocessing the datasets.
# First we tokenize all the texts.
column_names = raw_datasets["train"].column_names
# Get the column names for source/target.
source_column, target_column = "question", "answer"
# Temporarily set max_target_length for training.
padding = "max_length" if args.pad_to_max_length else False
task_prompt = "\nAnswer the above question. First think step by step and then answer the final number.\n"
def prompt_process(sent_1, sent_2, prompt_1="", prompt_2="", prompt_3=""):
sent_2 = sent_2.replace("####", "The final answer is")
return prompt_1 + sent_1 + prompt_2 + sent_2 + prompt_3
def preprocess_function_train(examples):
sources = examples[source_column]
targets = examples[target_column]
inputs = [prompt_process(source, target, prompt_2=task_prompt) for (source, target) in zip(sources, targets)]
model_inputs = tokenizer(
inputs,
max_length=args.max_source_length + args.max_target_length,
padding=padding,
truncation=True,
return_tensors="pt",
)
labels = copy.deepcopy(model_inputs)
# If we are padding here, replace all tokenizer.pad_token_id in the labels by -100 when we want to ignore
# padding in the loss.
if padding == "max_length" and args.ignore_pad_token_for_loss:
# get the length of the target tokens. -1 to kick out the <BOS> token
target_tokens = tokenizer(targets, padding=False)
target_len = [len(label) - 1 for label in target_tokens["input_ids"]]
# don't calculate the loss from source and padding (left padding)
for i in range(len(labels["input_ids"])):
labels["input_ids"][i, : -target_len[i]] = -100
model_inputs["labels"] = labels["input_ids"]
return model_inputs
def preprocess_function_test(examples):
sources = examples[source_column]
labels = examples[target_column]
inputs = [source + task_prompt for source in sources]
model_inputs = tokenizer(inputs, max_length=args.max_source_length, padding=padding, truncation=True)
labels = tokenizer(labels, max_length=args.max_target_length, padding=padding, truncation=True)
model_inputs["labels"] = labels["input_ids"]
return model_inputs
with accelerator.main_process_first():
train_dataset = raw_datasets["train"].map(
preprocess_function_train,
batched=True,
num_proc=args.preprocessing_num_workers,
remove_columns=column_names,
load_from_cache_file=not args.overwrite_cache,
desc="Running tokenizer on training dataset",
)
eval_dataset = raw_datasets["test"].map(
preprocess_function_test,
batched=True,
num_proc=args.preprocessing_num_workers,
remove_columns=column_names,
load_from_cache_file=not args.overwrite_cache,
desc="Running tokenizer on test dataset",
)
# Log a few random samples from the set:
for index in random.sample(range(len(train_dataset)), 2):
logger.info(f"Sample {index} of the training set: {train_dataset[index]}.")
for index in random.sample(range(len(eval_dataset)), 2):
logger.info(f"Sample {index} of the validation set: {eval_dataset[index]}.")
# DataLoaders creation:
train_dataloader = DataLoader(
train_dataset, shuffle=True, collate_fn=default_data_collator, batch_size=args.per_device_train_batch_size
)
eval_dataloader = DataLoader(
eval_dataset, collate_fn=default_data_collator, batch_size=args.per_device_eval_batch_size
)
# Optimizer
# Split weights in two groups, one with weight decay and the other not.
no_decay = ["bias", "layer_norm.weight"]
optimizer_grouped_parameters = [
{
"params": [p for n, p in model.named_parameters() if not any(nd in n for nd in no_decay) and "lora" in n],
"weight_decay": args.weight_decay,
},
{
"params": [p for n, p in model.named_parameters() if any(nd in n for nd in no_decay)],
"weight_decay": 0.0,
},
]
optimizer = torch.optim.AdamW(optimizer_grouped_parameters, lr=args.learning_rate)
# Scheduler and math around the number of training steps.
overrode_max_train_steps = False
num_update_steps_per_epoch = math.ceil(len(train_dataloader) / args.gradient_accumulation_steps)
if args.max_train_steps is None:
args.max_train_steps = args.num_train_epochs * num_update_steps_per_epoch
overrode_max_train_steps = True
lr_scheduler = get_scheduler(
name=args.lr_scheduler_type,
optimizer=optimizer,
num_warmup_steps=args.num_warmup_steps * args.gradient_accumulation_steps,
num_training_steps=args.max_train_steps * args.gradient_accumulation_steps,
)
# Prepare everything with our `accelerator`.
model, optimizer, train_dataloader, eval_dataloader, lr_scheduler = accelerator.prepare(
model, optimizer, train_dataloader, eval_dataloader, lr_scheduler
)
# On TPU, the tie weights in our model have been disconnected, so we need to restore the ties.
if accelerator.distributed_type == DistributedType.TPU:
model.tie_weights()
# We need to recalculate our total training steps as the size of the training dataloader may have changed.
num_update_steps_per_epoch = math.ceil(len(train_dataloader) / args.gradient_accumulation_steps)
if overrode_max_train_steps:
args.max_train_steps = args.num_train_epochs * num_update_steps_per_epoch
# Afterwards we recalculate our number of training epochs
args.num_train_epochs = math.ceil(args.max_train_steps / num_update_steps_per_epoch)
# Figure out how many steps we should save the Accelerator states
checkpointing_steps = args.checkpointing_steps
if checkpointing_steps is not None and checkpointing_steps.isdigit():
checkpointing_steps = int(checkpointing_steps)
# We need to initialize the trackers we use, and also store our configuration.
# The trackers initializes automatically on the main process.
if args.with_tracking:
experiment_config = vars(args)
# TensorBoard cannot log Enums, need the raw value
experiment_config["lr_scheduler_type"] = experiment_config["lr_scheduler_type"].value
accelerator.init_trackers("clm_no_trainer", experiment_config)
# Train!
total_batch_size = args.per_device_train_batch_size * accelerator.num_processes * args.gradient_accumulation_steps
logger.info("***** Running training *****")
logger.info(f" Num examples = {len(train_dataset)}")
logger.info(f" Num Epochs = {args.num_train_epochs}")
logger.info(f" Instantaneous batch size per device = {args.per_device_train_batch_size}")
logger.info(f" Total train batch size (w. parallel, distributed & accumulation) = {total_batch_size}")
logger.info(f" Gradient Accumulation steps = {args.gradient_accumulation_steps}")
logger.info(f" Total optimization steps = {args.max_train_steps}")
# Only show the progress bar once on each machine.
progress_bar = tqdm(range(args.max_train_steps), disable=not accelerator.is_local_main_process)
completed_steps = 0
starting_epoch = 0
# Potentially load in the weights and states from a previous save
if args.resume_from_checkpoint:
if args.resume_from_checkpoint is not None or args.resume_from_checkpoint != "":
checkpoint_path = args.resume_from_checkpoint
path = os.path.basename(args.resume_from_checkpoint)
else:
# Get the most recent checkpoint
dirs = [f.name for f in os.scandir(os.getcwd()) if f.is_dir()]
dirs.sort(key=os.path.getctime)
path = dirs[-1] # Sorts folders by date modified, most recent checkpoint is the last
checkpoint_path = path
path = os.path.basename(checkpoint_path)
accelerator.print(f"Resumed from checkpoint: {checkpoint_path}")
accelerator.load_state(path)
# Extract `epoch_{i}` or `step_{i}`
training_difference = os.path.splitext(path)[0]
if "epoch" in training_difference:
starting_epoch = int(training_difference.replace("epoch_", "")) + 1
resume_step = None
completed_steps = starting_epoch * num_update_steps_per_epoch
else:
# need to multiply `gradient_accumulation_steps` to reflect real steps
resume_step = int(training_difference.replace("step_", "")) * args.gradient_accumulation_steps
starting_epoch = resume_step // len(train_dataloader)
resume_step -= starting_epoch * len(train_dataloader)
completed_steps = resume_step // args.gradient_accumulation_steps
# update the progress_bar if load from checkpoint
progress_bar.update(completed_steps)
for epoch in range(starting_epoch, args.num_train_epochs):
model.train()
if args.with_tracking:
total_loss = 0
if args.resume_from_checkpoint and epoch == starting_epoch and resume_step is not None:
# We skip the first `n` batches in the dataloader when resuming from a checkpoint
active_dataloader = accelerator.skip_first_batches(train_dataloader, resume_step)
else:
active_dataloader = train_dataloader
for step, batch in enumerate(active_dataloader):
with accelerator.accumulate(model):
outputs = model(**batch)
loss = outputs.loss
# We keep track of the loss at each epoch
if args.with_tracking:
total_loss += loss.detach().float()
accelerator.backward(loss)
if completed_steps % 50 == 0:  # log roughly every 50 optimizer steps
accelerator.print(f"Epoch: {epoch} | Step: {completed_steps} | Loss: {loss}")
optimizer.step()
lr_scheduler.step()
optimizer.zero_grad()
# Checks if the accelerator has performed an optimization step behind the scenes
if accelerator.sync_gradients:
progress_bar.update(1)
completed_steps += 1
if isinstance(checkpointing_steps, int):
if completed_steps % checkpointing_steps == 0:
output_dir = f"step_{completed_steps}"
if args.output_dir is not None:
output_dir = os.path.join(args.output_dir, output_dir)
accelerator.save_state(output_dir)
if completed_steps >= args.max_train_steps:
break
model.eval()
gen_kwargs = {
"max_new_tokens": args.max_target_length,
"temperature": args.temperature,
"top_k": args.k,
"top_p": args.p,
"do_sample": True,
}
ans_pred_list = []
ans_gold_list = []
for step, batch in enumerate(eval_dataloader):
with torch.no_grad():
gen_kwargs["input_ids"] = batch["input_ids"]
gen_kwargs["attention_mask"] = batch["attention_mask"]
generated_tokens = accelerator.unwrap_model(model).generate(**gen_kwargs)
pred_tokens = generated_tokens[:, args.max_source_length :]
pred_tokens = accelerator.pad_across_processes(pred_tokens, dim=1, pad_index=tokenizer.pad_token_id)
gold_tokens = batch["labels"]
if not args.pad_to_max_length:
# If we did not pad to max length, we need to pad the labels too
gold_tokens = accelerator.pad_across_processes(
batch["labels"], dim=1, pad_index=tokenizer.pad_token_id
)
pred_tokens, gold_tokens = accelerator.gather_for_metrics((pred_tokens, gold_tokens))
pred_tokens, gold_tokens = pred_tokens.cpu().numpy(), gold_tokens.cpu().numpy()
if isinstance(pred_tokens, tuple):
pred_tokens = pred_tokens[0]
decoded_pred = tokenizer.batch_decode(pred_tokens, skip_special_tokens=True)
decoded_gold = tokenizer.batch_decode(gold_tokens, skip_special_tokens=True)
# Extract the numbers in sentences
accelerator.print(decoded_pred)
ans_pred_list += [extract_answer_number(sentence_pred) for sentence_pred in decoded_pred]
ans_gold_list += [extract_answer_number(sentence_gold) for sentence_gold in decoded_gold]
accelerator.print(ans_pred_list)
accelerator.print(ans_gold_list)
accuracy = compute_accuracy(ans_gold_list, ans_pred_list)
logger.info(f"epoch {epoch}: accuracy: {accuracy}")
if args.with_tracking:
accelerator.log(
{
"accuracy": accuracy,
"train_loss": total_loss.item() / len(train_dataloader),
"epoch": epoch,
"step": completed_steps,
},
step=completed_steps,
)
if args.push_to_hub and epoch < args.num_train_epochs - 1:
accelerator.wait_for_everyone()
unwrapped_model = accelerator.unwrap_model(model)
unwrapped_model.save_pretrained(
args.output_dir, is_main_process=accelerator.is_main_process, save_function=accelerator.save
)
if accelerator.is_main_process:
tokenizer.save_pretrained(args.output_dir)
repo.push_to_hub(
commit_message=f"Training in progress epoch {epoch}", blocking=False, auto_lfs_prune=True
)
if args.checkpointing_steps == "epoch":
output_dir = f"epoch_{epoch}"
if args.output_dir is not None:
output_dir = os.path.join(args.output_dir, output_dir)
accelerator.save_state(output_dir)
if args.with_tracking:
accelerator.end_training()
if args.output_dir is not None:
accelerator.wait_for_everyone()
unwrapped_model = accelerator.unwrap_model(model)
unwrapped_model.save_pretrained(
args.output_dir, is_main_process=accelerator.is_main_process, save_function=accelerator.save
)
if accelerator.is_main_process:
tokenizer.save_pretrained(args.output_dir)
if args.push_to_hub:
repo.push_to_hub(commit_message="End of training", auto_lfs_prune=True)
PATTERN_NUMBER = re.compile(r"-?\d+\.?\d*")
def extract_answer_number(sentence: str) -> float:
sentence = sentence.replace(",", "")
pred = PATTERN_NUMBER.findall(sentence)
if not pred:
return float("inf")
segment = sentence.split("The final answer is ")
if len(segment) > 1:
pred_answer = segment[1]
pred_answer = PATTERN_NUMBER.findall(pred_answer)
if len(pred_answer) > 0:
pred_answer = pred_answer[0]
else:
pred_answer = float(pred[-1])
else:
pred_answer = float(pred[-1])
if isinstance(pred_answer, str):
try:
pred_answer = float(pred_answer)
except ValueError:
pred_answer = float("inf")
return pred_answer
def compute_accuracy(pred: list, gold: list):
acc = 0.0
for p, g in zip(pred, gold):
if p == g:
acc += 1
return acc / len(pred)
if __name__ == "__main__":
main()

View File

@ -0,0 +1,175 @@
import argparse
import os
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Optional
import safetensors
import torch
from diffusers import UNet2DConditionModel
from transformers import CLIPTextModel
from peft import LoraConfig, get_peft_model, get_peft_model_state_dict, set_peft_model_state_dict
# Default kohya_ss LoRA replacement modules
# https://github.com/kohya-ss/sd-scripts/blob/c924c47f374ac1b6e33e71f82948eb1853e2243f/networks/lora.py#L661
UNET_TARGET_REPLACE_MODULE = ["Transformer2DModel", "Attention"]
UNET_TARGET_REPLACE_MODULE_CONV2D_3X3 = ["ResnetBlock2D", "Downsample2D", "Upsample2D"]
TEXT_ENCODER_TARGET_REPLACE_MODULE = ["CLIPAttention", "CLIPMLP"]
LORA_PREFIX_UNET = "lora_unet"
LORA_PREFIX_TEXT_ENCODER = "lora_te"
@dataclass
class LoRAInfo:
kohya_key: str
peft_key: str
alpha: Optional[float] = None
rank: Optional[int] = None
lora_A: Optional[torch.Tensor] = None
lora_B: Optional[torch.Tensor] = None
def peft_state_dict(self) -> Dict[str, torch.Tensor]:
if self.lora_A is None or self.lora_B is None:
raise ValueError("At least one of lora_A or lora_B is None, they must both be provided")
return {f"{peft_key}.lora_A.weight": self.lora_A, f"{peft_key}.lora_B.weight": self.lora_A}
def construct_peft_loraconfig(info: Dict[str, LoRAInfo]) -> LoraConfig:
"""Constructs LoraConfig from data extracted from kohya checkpoint
Args:
info (Dict[str, LoRAInfo]): Information extracted from kohya checkpoint
Returns:
LoraConfig: config for constructing LoRA
"""
# Unpack all ranks and alphas
ranks = {x[0]: x[1].rank for x in info.items()}
alphas = {x[0]: x[1].alpha or x[1].rank for x in info.items()}
# Determine which modules needs to be transformed
target_modules = list(info.keys())
# Determine most common rank and alpha
r = Counter(ranks.values()).most_common(1)[0][0]
lora_alpha = Counter(alphas.values()).most_common(1)[0][0]
# Determine which modules have different rank and alpha
rank_pattern = dict(filter(lambda x: x[1] != r, ranks.items()))
alpha_pattern = dict(filter(lambda x: x[1] != lora_alpha, alphas.items()))
config = LoraConfig(
r=r,
lora_alpha=lora_alpha,
target_modules=target_modules,
lora_dropout=0.0,
bias="none",
init_lora_weights=False,
rank_pattern=rank_pattern,
alpha_pattern=alpha_pattern,
)
return config
def combine_peft_state_dict(info: Dict[str, LoRAInfo]) -> Dict[str, torch.Tensor]:
result = {}
for key_name, key_info in info.items():
result[f"base_model.model.{key_name}.lora_A.weight"] = key_info.lora_A
result[f"base_model.model.{key_name}.lora_B.weight"] = key_info.lora_B
return result
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--sd_checkpoint", default=None, type=str, required=True, help="SD checkpoint to use")
parser.add_argument(
"--kohya_lora_path", default=None, type=str, required=True, help="Path to kohya_ss trained LoRA"
)
parser.add_argument("--dump_path", default=None, type=str, required=True, help="Path to the output model.")
parser.add_argument("--half", action="store_true", help="Save weights in half precision.")
args = parser.parse_args()
# Load all models that we need to add adapter to
text_encoder = CLIPTextModel.from_pretrained(args.sd_checkpoint, subfolder="text_encoder")
unet = UNet2DConditionModel.from_pretrained(args.sd_checkpoint, subfolder="unet")
# Construct possible mapping from kohya keys to peft keys
models_keys = {}
for model, model_key, model_name in [
(text_encoder, LORA_PREFIX_TEXT_ENCODER, "text_encoder"),
(unet, LORA_PREFIX_UNET, "unet"),
]:
models_keys.update(
{
f"{model_key}.{peft_key}".replace(".", "_"): peft_key
for peft_key in (x[0] for x in model.named_modules())
}
)
# Store conversion info (model_type -> peft_key -> LoRAInfo)
lora_info: Dict[str, Dict[str, LoRAInfo]] = {
"text_encoder": {},
"unet": {},
}
# Open kohya_ss checkpoint
with safetensors.safe_open(args.kohya_lora_path, framework="pt", device="cpu") as f:
# Extract information about LoRA structure
metadata = f.metadata()
# Iterate through available info and unpack all the values
for key in f.keys():
kohya_key, kohya_type = key.split(".")[:2]
# Find which model this key belongs to
if kohya_key.startswith(LORA_PREFIX_TEXT_ENCODER):
model_type = "text_encoder"
elif kohya_key.startswith(LORA_PREFIX_UNET):
model_type = "unet"
else:
raise ValueError(f"Cannot determine model for key: {key}")
# Find corresponding peft key
if kohya_key not in models_keys:
raise ValueError(f"Cannot find corresponding key for diffusers/transformers model: {kohya_key}")
peft_key = models_keys[kohya_key]
if peft_key not in lora_info[model_type]:
lora_info[model_type][peft_key] = LoRAInfo(kohya_key=kohya_key, peft_key=peft_key)
if kohya_type == "alpha":
lora_info[model_type][peft_key].alpha = f.get_tensor(key).item()
elif kohya_type == "lora_down":
tensor = f.get_tensor(key)
lora_info[model_type][peft_key].lora_A = tensor
lora_info[model_type][peft_key].rank = tensor.shape[0]
elif kohya_type == "lora_up":
tensor = f.get_tensor(key)
lora_info[model_type][peft_key].lora_B = f.get_tensor(key)
lora_info[model_type][peft_key].rank = tensor.shape[1]
else:
raise ValueError(f"Unknown weight name in key: {key} - {kohya_type}")
# Process each model
for model, model_name in [(text_encoder, "text_encoder"), (unet, "unet")]:
config = construct_peft_loraconfig(lora_info[model_name])
model = get_peft_model(model, config)
keys_peft = list(get_peft_model_state_dict(model).keys())
keys_new = list(combine_peft_state_dict(lora_info[model_name]).keys())
set_peft_model_state_dict(model, combine_peft_state_dict(lora_info[model_name]))
if args.half:
model.to(torch.float16)
# Save model to disk
model.save_pretrained(os.path.join(args.dump_path, model_name))
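As a point of reference, here is a minimal inference sketch for the converted adapters; the base checkpoint id, the "converted" dump path, and the prompt are placeholders, not values taken from this diff. The converter above writes standard PEFT LoRA folders named text_encoder and unet under --dump_path, so they can be wrapped around the corresponding modules of a diffusers pipeline:

import torch
from diffusers import StableDiffusionPipeline
from peft import PeftModel

# Placeholder base checkpoint; use the same one passed via --sd_checkpoint
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
# "converted" stands in for whatever --dump_path was used above
pipe.text_encoder = PeftModel.from_pretrained(pipe.text_encoder, "converted/text_encoder")
pipe.unet = PeftModel.from_pretrained(pipe.unet, "converted/unet")
pipe = pipe.to("cuda")
image = pipe("a photo of a sks dog", num_inference_steps=30).images[0]
image.save("out.png")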


@@ -0,0 +1,101 @@
import argparse
import os
from typing import Dict
import torch
from diffusers import UNet2DConditionModel
from safetensors.torch import save_file
from transformers import CLIPTextModel
from peft import PeftModel, get_peft_model_state_dict
# Default kohya_ss LoRA replacement modules
# https://github.com/kohya-ss/sd-scripts/blob/c924c47f374ac1b6e33e71f82948eb1853e2243f/networks/lora.py#L664
LORA_PREFIX_UNET = "lora_unet"
LORA_PREFIX_TEXT_ENCODER = "lora_te"
LORA_ADAPTER_NAME = "default"
def get_module_kohya_state_dict(
module: PeftModel, prefix: str, dtype: torch.dtype, adapter_name: str = LORA_ADAPTER_NAME
) -> Dict[str, torch.Tensor]:
kohya_ss_state_dict = {}
for peft_key, weight in get_peft_model_state_dict(module, adapter_name=adapter_name).items():
kohya_key = peft_key.replace("base_model.model", prefix)
kohya_key = kohya_key.replace("lora_A", "lora_down")
kohya_key = kohya_key.replace("lora_B", "lora_up")
kohya_key = kohya_key.replace(".", "_", kohya_key.count(".") - 2)
kohya_ss_state_dict[kohya_key] = weight.to(dtype)
# Set alpha parameter
if "lora_down" in kohya_key:
alpha_key = f'{kohya_key.split(".")[0]}.alpha'
kohya_ss_state_dict[alpha_key] = torch.tensor(module.peft_config[adapter_name].lora_alpha).to(dtype)
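# Illustrative example of the mapping above (the module path is hypothetical):
#   peft key:  base_model.model.text_model.encoder.layers.0.self_attn.q_proj.lora_A.weight
#   kohya key: lora_te_text_model_encoder_layers_0_self_attn_q_proj.lora_down.weight
#   alpha key: lora_te_text_model_encoder_layers_0_self_attn_q_proj.alpha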
return kohya_ss_state_dict
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
"--sd_checkpoint",
default=None,
type=str,
required=True,
help="Path to pretrained model or model identifier from huggingface.co/models.",
)
parser.add_argument(
"--sd_checkpoint_revision",
type=str,
default=None,
required=False,
help="Revision of pretrained model identifier from huggingface.co/models.",
)
parser.add_argument("--peft_lora_path", default=None, type=str, required=True, help="Path to peft trained LoRA")
parser.add_argument(
"--dump_path",
default=None,
type=str,
required=True,
help="Path to the output safetensors file for use with webui.",
)
parser.add_argument("--half", action="store_true", help="Save weights in half precision.")
args = parser.parse_args()
# Store kohya_ss state dict
kohya_ss_state_dict = {}
dtype = torch.float16 if args.half else torch.float32
# Load Text Encoder LoRA model
text_encoder_peft_lora_path = os.path.join(args.peft_lora_path, "text_encoder")
if os.path.exists(text_encoder_peft_lora_path):
text_encoder = CLIPTextModel.from_pretrained(
args.sd_checkpoint, subfolder="text_encoder", revision=args.sd_checkpoint_revision
)
text_encoder = PeftModel.from_pretrained(
text_encoder, text_encoder_peft_lora_path, adapter_name=LORA_ADAPTER_NAME
)
kohya_ss_state_dict.update(
get_module_kohya_state_dict(text_encoder, LORA_PREFIX_TEXT_ENCODER, dtype, LORA_ADAPTER_NAME)
)
# Load UNet LoRA model
unet_peft_lora_path = os.path.join(args.peft_lora_path, "unet")
if os.path.exists(unet_peft_lora_path):
unet = UNet2DConditionModel.from_pretrained(
args.sd_checkpoint, subfolder="unet", revision=args.sd_checkpoint_revision
)
unet = PeftModel.from_pretrained(unet, unet_peft_lora_path, adapter_name=LORA_ADAPTER_NAME)
kohya_ss_state_dict.update(get_module_kohya_state_dict(unet, LORA_PREFIX_UNET, dtype, LORA_ADAPTER_NAME))
# Save state dict
save_file(
kohya_ss_state_dict,
args.dump_path,
)
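An optional sanity check of the exported file; the path below is a placeholder for whatever --dump_path was passed to the script above:

from safetensors import safe_open

# Print a handful of keys and shapes to confirm the kohya_ss naming and tensors look right
with safe_open("kohya_lora.safetensors", framework="pt", device="cpu") as f:
    for key in sorted(f.keys())[:5]:
        print(key, tuple(f.get_tensor(key).shape))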


@@ -1,10 +1,11 @@
transformers
accelerate
loralib
evaluate
tqdm
datasets
diffusers
Pillow
torchvision
huggingface_hub
safetensors
wandb


@@ -7,6 +7,7 @@ import math
import os
import threading
import warnings
from contextlib import nullcontext
from pathlib import Path
from typing import Optional
@@ -213,6 +214,17 @@ def parse_args(input_args=None):
help="Bias type for Lora. Can be 'none', 'all' or 'lora_only', only used if use_lora and `train_text_encoder` are True",
)
parser.add_argument(
"--num_dataloader_workers", type=int, default=1, help="Num of workers for the training dataloader."
)
parser.add_argument(
"--no_tracemalloc",
default=False,
action="store_true",
help="Flag to stop memory allocation tracing during training. This could speed up training on Windows.",
)
parser.add_argument(
"--train_batch_size", type=int, default=4, help="Batch size (per device) for the training dataloader."
)
@@ -329,6 +341,18 @@
' (default), `"wandb"` and `"comet_ml"`. Use `"all"` to report to all integrations.'
),
)
parser.add_argument(
"--wandb_key",
type=str,
default=None,
help=("If report to option is set to wandb, api-key for wandb used for login to wandb "),
)
parser.add_argument(
"--wandb_project_name",
type=str,
default=None,
help=("If report to option is set to wandb, project name in wandb for log tracking "),
)
parser.add_argument(
"--mixed_precision",
type=str,
@@ -569,9 +593,13 @@ def main(args):
gradient_accumulation_steps=args.gradient_accumulation_steps,
mixed_precision=args.mixed_precision,
log_with=args.report_to,
logging_dir=logging_dir,
project_dir=logging_dir,
)
if args.report_to == "wandb":
import wandb
wandb.login(key=args.wandb_key)
wandb.init(project=args.wandb_project_name)
# Currently, it's not possible to do gradient accumulation when training two models with accelerate.accumulate
# This will be enabled soon in accelerate. For now, we don't allow gradient accumulation when training two models.
# TODO (patil-suraj): Remove this check when gradient accumulation with two models is enabled in accelerate.
@@ -783,7 +811,7 @@
batch_size=args.train_batch_size,
shuffle=True,
collate_fn=lambda examples: collate_fn(examples, args.with_prior_preservation),
num_workers=1,
num_workers=args.num_dataloader_workers,
)
# Scheduler and math around the number of training steps.
@@ -877,12 +905,14 @@
unet.train()
if args.train_text_encoder:
text_encoder.train()
with TorchTracemalloc() as tracemalloc:
with TorchTracemalloc() if not args.no_tracemalloc else nullcontext() as tracemalloc:
for step, batch in enumerate(train_dataloader):
# Skip steps until we reach the resumed step
if args.resume_from_checkpoint and epoch == first_epoch and step < resume_step:
if step % args.gradient_accumulation_steps == 0:
progress_bar.update(1)
if args.report_to == "wandb":
accelerator.print(progress_bar)
continue
with accelerator.accumulate(unet):
@@ -948,6 +978,8 @@
# Checks if the accelerator has performed an optimization step behind the scenes
if accelerator.sync_gradients:
progress_bar.update(1)
if args.report_to == "wandb":
accelerator.print(progress_bar)
global_step += 1
# if global_step % args.checkpointing_steps == 0:
@@ -1014,23 +1046,29 @@
if global_step >= args.max_train_steps:
break
# Printing the GPU memory usage details such as allocated memory, peak memory, and total memory usage
accelerator.print("GPU Memory before entering the train : {}".format(b2mb(tracemalloc.begin)))
accelerator.print("GPU Memory consumed at the end of the train (end-begin): {}".format(tracemalloc.used))
accelerator.print("GPU Peak Memory consumed during the train (max-begin): {}".format(tracemalloc.peaked))
accelerator.print(
"GPU Total Peak Memory consumed during the train (max): {}".format(
tracemalloc.peaked + b2mb(tracemalloc.begin)
)
)
accelerator.print("CPU Memory before entering the train : {}".format(b2mb(tracemalloc.cpu_begin)))
accelerator.print("CPU Memory consumed at the end of the train (end-begin): {}".format(tracemalloc.cpu_used))
accelerator.print("CPU Peak Memory consumed during the train (max-begin): {}".format(tracemalloc.cpu_peaked))
accelerator.print(
"CPU Total Peak Memory consumed during the train (max): {}".format(
tracemalloc.cpu_peaked + b2mb(tracemalloc.cpu_begin)
)
)
if not args.no_tracemalloc:
accelerator.print("GPU Memory before entering the train : {}".format(b2mb(tracemalloc.begin)))
accelerator.print("GPU Memory consumed at the end of the train (end-begin): {}".format(tracemalloc.used))
accelerator.print("GPU Peak Memory consumed during the train (max-begin): {}".format(tracemalloc.peaked))
accelerator.print(
"GPU Total Peak Memory consumed during the train (max): {}".format(
tracemalloc.peaked + b2mb(tracemalloc.begin)
)
)
accelerator.print("CPU Memory before entering the train : {}".format(b2mb(tracemalloc.cpu_begin)))
accelerator.print(
"CPU Memory consumed at the end of the train (end-begin): {}".format(tracemalloc.cpu_used)
)
accelerator.print(
"CPU Peak Memory consumed during the train (max-begin): {}".format(tracemalloc.cpu_peaked)
)
accelerator.print(
"CPU Total Peak Memory consumed during the train (max): {}".format(
tracemalloc.cpu_peaked + b2mb(tracemalloc.cpu_begin)
)
)
# Create the pipeline using using the trained modules and save it.
accelerator.wait_for_everyone()
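For reference, a hypothetical launch command exercising the flags added in this diff; the script name, project name, and values are assumptions, and the other required training arguments are omitted:

accelerate launch train_dreambooth.py \
  --report_to wandb \
  --wandb_key $WANDB_API_KEY \
  --wandb_project_name peft-dreambooth-lora \
  --num_dataloader_workers 4 \
  --no_tracemalloc \
  --train_batch_size 4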

Some files were not shown because too many files have changed in this diff.