Compare commits

...

148 Commits

Author SHA1 Message Date
b8da272660 Release 0.13.1 (patch release for #2113) 2024-10-08 14:17:55 +02:00
61c57f4f65 FIX low_cpu_mem_usage consolidates devices (#2113)
See: https://github.com/huggingface/diffusers/pull/9510#issuecomment-2378316687

Right now, the low_cpu_mem_usage=True option does not consolidate the
devices. E.g. when the model is on GPU and the state_dict on CPU, the
adapter weights will be on CPU after loading, when they should be on the
GPU. This fix ensures that the devices are consolidated.
2024-10-08 14:16:53 +02:00
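As a rough illustration of the fixed behavior (a sketch only; the model name and adapter path below are placeholders), adapter weights loaded with low_cpu_mem_usage=True should now end up on the base model's device:

    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    # Placeholders: any causal LM and any saved LoRA adapter directory work here.
    base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m").to("cuda")
    model = PeftModel.from_pretrained(base, "path/to/my-adapter", low_cpu_mem_usage=True)

    # After the fix, the adapter parameters follow the base model's device
    # instead of remaining on CPU (where the state_dict lives).
    for name, param in model.named_parameters():
        if "lora_" in name:
            assert param.device.type == "cuda", f"{name} is on {param.device}"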
f0b066eae8 Release v0.13.0 (#2093) 2024-09-25 13:09:08 +02:00
8f39708650 ENH: Better DoRA check in mixed adapter batch inference (#2089)
This is a bit of an edge case, but I noticed this while working on
something else.

PEFT allows mixed batch adapter inference, i.e. when predicting, the
same batch can use different adapters by passing the adapter_names
argument. However, this is not supported for DoRA (yet), so there is a
check that raises an error if DoRA is used.

Previously, this check inspected all adapters for DoRA, even those not
referenced in adapter_names. This was unnecessarily strict; with this PR,
we only check the adapters that are actually being used.
2024-09-24 10:16:31 +02:00
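For context, a minimal sketch of mixed-batch adapter inference (assuming a PeftModel with two loaded LoRA adapters named "adapter_a" and "adapter_b", plus a tokenizer; all names are placeholders):

    import torch

    prompts = ["first sample", "second sample", "third sample"]
    inputs = tokenizer(prompts, return_tensors="pt", padding=True)

    # One adapter name per sample; "__base__" selects the plain base model.
    adapter_names = ["adapter_a", "adapter_b", "__base__"]

    with torch.no_grad():
        outputs = peft_model(**inputs, adapter_names=adapter_names)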
f4cf170a9c DOC Docstring of load_adapter, type annotation (#2087) 2024-09-23 11:18:24 +02:00
b67c9b64fd FIX: Bug in find_minimal_target_modules (#2083)
This bug was reported by Sayak and would occur if a required suffix itself
ended with a string that was already determined to be required, in which
case the new required suffix would not be added.

The fix consists of prefixing a "." to the suffix before checking if it is
required or not.

On top of this, the algorithm has been changed to be deterministic.
Previously, it was not deterministic because a dictionary that was
looped over was built from a set, and sets don't guarantee order. This
would result in the loop being in arbitrary order.

As long as the algorithm is 100% correct, the order should not matter.
But in case we find bugs like this, the order does matter. We don't want
bugs to be flaky, therefore it is best to sort the dict and remove
randomness from the function.

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-09-23 11:16:29 +02:00
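An illustrative sketch of the suffix check (a hypothetical helper, not the actual PEFT implementation): prefixing "." ensures that only whole name components match, so "up_proj" is not wrongly treated as covered by an already-required "proj".

    def is_covered(candidate: str, required: str) -> bool:
        # Compare whole dotted components, not raw string suffixes.
        return ("." + candidate).endswith("." + required)

    assert is_covered("mlp.up_proj", "up_proj")   # genuinely covered
    assert not is_covered("up_proj", "proj")      # naive "up_proj".endswith("proj") says True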
5efeba1856 ENH: Add default target layers for gemma2 architecture (#2078)
Google's gemma 2 models have a slightly different architecture than
gemma 1 and thus a different model_type attribute. This PR adds default
target layers for gemma 2 that correspond to the default target layers of
gemma 1.

LayerNorm tuning adds one more LN layer.
2024-09-23 11:15:08 +02:00
af275d2d42 ENH: Allow empty initialization of adapter weight (#1961)
This PR allows initializing the adapter weights as empty, i.e. on the meta
device, by passing low_cpu_mem_usage=True.

Why would this be useful? For PEFT training, it is indeed not useful, as
we need the real weights in order to train the model. However, when
loading a trained PEFT adapter, it is unnecessary to initialize the
adapters for real, as we override them with the loaded weights later.

In the grand scheme of things, loading the base model will typically be
much slower, but if the user loads, say, dozens of adapters, the
overhead could add up. Of course, besides loading the model, this has no
performance impact and is thus not a high priority feature.

For the time being, this is completely opt in. However, it should be safe to
make this default for loading adapters. Therefore, in the future we may change
the default there.

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-09-23 11:13:51 +02:00
9bc670eafb MNT Update author email in setup.py (#2086) 2024-09-23 10:43:57 +02:00
5d944589d2 ENH Expose bias of ModulesToSaveWrapper (#2081) 2024-09-20 19:35:24 +02:00
152ed70b00 ENH PiSSA/OLoRA: Preserve original config on save (#2077)
Resolves #2075

When saving PiSSA or OLoRA with the option to convert to normal LoRA,
the LoRA weight shapes change, which means that some values like r and
alpha need to be adjusted in the saved PEFT config. However, these
modifications should be limited to the saved config, while the loaded
config should stay the same.

This PR implements this change by creating a copy of the config before
modifying it.
2024-09-20 12:11:24 +02:00
f5dd2acfed TST Skip some quantization tests on XPU (#2074)
Eetq/hqq/aqlm don't support XPU yet.
2024-09-18 11:27:19 +02:00
3b2ebf1ba1 FIX Bug that prevents BOFT from loading 2 adapters (#2068)
There was a bug in BOFT that made it impossible in some circumstances to
load more than one adapter (creating more than 1 adapter was possible
though). This was because a code path that adjusts
boft_n_butterfly_factor was only visited when creating a fresh adapter,
but not when updating with the 2nd adapter. This was fixed by moving
this code path from the BOFT layer's __init__ method to update_layer.

A test for loading multiple adapters was added. Since this was a gap in
our test suite, this test will be applied to all appropriate PEFT
methods, not only BOFT, but the other methods are all passing without
needing further changes.

For good measure, I also added BOFT to the test suite that checks
multiple active adapters. These tests would have also passed without the
fix in this PR, since these tests do not load multiple adapters but
instead create them, which always worked. Still it's better to have
these tests as well.
2024-09-18 11:19:16 +02:00
adf0a1dc96 ENH Multi adapters in same batch: modules_to_save (#1990)
Extend the functionality of having different adapters in the same batch to also
work with `modules_to_save`.
2024-09-17 13:50:47 +02:00
18f3efe5c0 MNT Update deprecated evaluation_strategy (#1664)
In docs and examples, use eval_strategy instead of evaluation_strategy, which is
deprecated.
2024-09-13 18:01:26 +02:00
4a8dedb2a7 FIX Command line args in PiSSA preprocess (#2053)
Fix bug in parsing command line arguments in the PiSSA preprocess.py script from
the PiSSA example.
2024-09-13 13:59:27 +02:00
25202271bc ENH BOFT don't save boft_P buffer (#2050)
The buffer does not need to be part of the checkpoint; by making it
non-persistent, the file size can be greatly reduced.
2024-09-13 13:56:47 +02:00
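The mechanism is the standard PyTorch one, shown here on a toy module (not the BOFT code itself): buffers registered with persistent=False are excluded from the state_dict and hence from the saved file.

    import torch
    from torch import nn

    class Toy(nn.Module):
        def __init__(self):
            super().__init__()
            # Recomputable at load time, so keep it out of the checkpoint.
            self.register_buffer("perm", torch.randperm(8), persistent=False)

    assert "perm" not in Toy().state_dict()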
214f891cd2 MAINT: Give stale bot permissions for PRs too (#2064) 2024-09-12 12:18:20 -04:00
7868d0372b MNT Permission for GH token in stale.yml (#2061) 2024-09-11 12:36:25 +02:00
734ea9a014 TST Make X-LoRA tests faster (#2059)
After some recent optimizations, the X-LoRA tests are now the slowest
ones. Part of that is that the lora adapters are re-created for each
test. By changing the fixture scope, they're now only created once. I
think this should be safe, as these files are not modified in the tests.

I also enabled test_scalings_logging_methods with the latest
transformers to ensure that this test also passes.
2024-09-11 12:13:24 +02:00
54be5a3db6 TST Speed up vision model tests (#2058)
The HRA vision model test is extremely slow on CI (> 600 sec, 50% of
total time). This change speeds up the test by using a smaller ResNet
model to run the tests.

It's still not clear why HRA was so slow specifically -- LoRA is 40x
faster -- but that can be fixed separately.
2024-09-10 16:15:51 +02:00
b180ae46f8 TST Fewer inference steps for stable diffusion (#2051)
Reduce the number of inference steps for stable diffusion tests. These
tests are the slowest ones on CI, this should help (~3 min on average).
2024-09-06 09:57:56 +02:00
31fbbd2203 FIX TST Scalings logging test latest transformers (#2042)
Fix test for latest transformers, skip for earlier versions.
2024-09-05 14:50:46 +02:00
c9f7240afc FEAT Add VB-LoRA (#2039)
Implements "VB-LoRA: Extreme Parameter Efficient Fine-Tuning with Vector
Banks"

https://arxiv.org/abs/2405.15179
2024-09-04 11:02:34 +02:00
95b39642fb FIX: Small numerical discrepancy for p-tuning after loading the model (#2047)
There is a small numerical discrepancy between the outputs of a p-tuning
model before and after loading. Even though it is small, it can still
affect generations, so this PR eliminates it.

As an example, without the fix, this is the difference in logits for
opt-125m:

>       torch.testing.assert_close(output_loaded, output_peft)
E       AssertionError: Tensor-likes are not close!
E
E       Mismatched elements: 30 / 10557120 (0.0%)
E       Greatest absolute difference: 1.1086463928222656e-05 at index (0, 9, 9314) (up to 1e-05 allowed)
E       Greatest relative difference: 0.00021288332936819643 at index (0, 9, 9314) (up to 1.3e-06 allowed)

Details about how this comes about are explained here:

https://github.com/huggingface/peft/issues/2043#issuecomment-2321522577

The gist of it is that if we take a single sample, repeat it X times,
and then forward it through a model (which is the training path in
p-tuning), we would expect the same output as if we forwarded this
sample only once and repeated the output X times (the inference path for
p-tuning). However, for sufficiently large models, the two approaches
can have tiny differences.

With the fixed approach, there is no difference between training and
inference code paths when it comes to this. The new code should also be
slightly more compute efficient, but in practice will not make a
noticeable difference.
2024-09-03 16:52:06 +02:00
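A toy demonstration of the effect described above, using plain PyTorch rather than the PEFT code: repeating the input before the forward pass and repeating the output after it are equal in exact arithmetic, but batched kernels can introduce tiny floating point differences.

    import torch

    torch.manual_seed(0)
    layer = torch.nn.Linear(1024, 1024)
    x = torch.randn(1, 1024)

    repeat_then_forward = layer(x.repeat(8, 1))   # old training-path behavior
    forward_then_repeat = layer(x).repeat(8, 1)   # inference-path behavior

    # Usually zero, but may be a tiny nonzero value depending on the kernels used.
    print((repeat_then_forward - forward_then_repeat).abs().max())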
37b9c5c74b FIX: Error with OLoRA init when using bnb (#2011) 2024-09-03 14:08:25 +02:00
01275b4cb3 ENH: Faster adapter loading if there are a lot of target modules (#2045)
This is an optimization to reduce the number of entries in the
target_modules list. The reason is that in some circumstances,
target_modules can contain hundreds of entries. Since each target module
is checked against each module of the net (which can be thousands), this
can become quite expensive when many adapters are being added. Often,
the target_modules can be condensed in such a case, which speeds up the
process.

A context in which this can happen is when diffusers loads non-PEFT
LoRAs. As there is no meta info on target_modules in that case, they are
just inferred by listing all keys from the state_dict, which can be
quite a lot. See: https://github.com/huggingface/diffusers/issues/9297

As shown there the speed improvements for loading many diffusers LoRAs
can be substantial. When loading 30 adapters, the time would go up from
0.6 sec per adapter to 3 sec per adapter. With this fix, the time goes
up from 0.6 sec per adapter to 1 sec per adapter.

As there is a small chance for undiscovered bugs, we apply this
optimization only if the list of target_modules is sufficiently big.
2024-09-02 12:59:51 +02:00
679bcd8777 ENH Warn if using tied target modules (#2025)
When users are targeting tied weights (e.g. embedding and LM head),
merging the adapter will lead to errors. Now users are warned about the
possibility when they create such a PEFT model and also when they try to
merge.
2024-08-29 10:51:13 +02:00
850eeb5c3a FIX Pre-commit version in config (#2034) 2024-08-26 11:50:02 +02:00
5996d39408 TST Enable more tests in XPU (#2036) 2024-08-26 11:49:18 +02:00
900f96c40d [Add] DoRA Embedding (#2006) 2024-08-23 20:20:42 +02:00
c3b63ce2c4 ENH Test and DoRA compatibility with XPU 2024-08-23 16:01:50 +02:00
1a5d0f8151 FIX: Don't target the classification head when using target_modules="all-linear" (#2033)
Fixes #2027

When using a transformers sequence classification model,
target_modules="all-linear" should not wrap the classification head with
LoRA. This is because it is already wrapped with ModulesToSave, i.e. it
will be fully fine-tuned, which is the generally desired behavior.

Before this bug fix, the classification head would be double-wrapped.
With #2028, this now raises an error. With this PR, it is avoided
completely. Still, keeping #2028 is good because it helps prevent other
situations where double-wrapping might occur due to misconfiguration.

Note that there is no fool-proof method to detect the classification
head; we have to rely on the transformers convention.
2024-08-23 16:00:43 +02:00
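A sketch of the intended outcome (placeholder model name; the attribute name of the head differs between architectures): with task_type="SEQ_CLS" and target_modules="all-linear", the classification head should end up in a modules-to-save wrapper rather than a LoRA layer.

    from transformers import AutoModelForSequenceClassification
    from peft import LoraConfig, get_peft_model

    base = AutoModelForSequenceClassification.from_pretrained("facebook/opt-125m", num_labels=2)
    config = LoraConfig(task_type="SEQ_CLS", target_modules="all-linear")
    model = get_peft_model(base, config)

    # For OPT the head is called "score"; expect a modules-to-save wrapper, not a LoRA layer.
    for name, module in model.named_modules():
        if name.endswith("score"):
            print(name, type(module).__name__)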
f3c7c6e5c1 ENH Raise error when applying modules_to_save on tuner layer (#2028)
Relates to #2027

Normally, when selecting the layers for fine-tuning, PEFT already
ensures that the same layer is not targeted for both parameter-efficient
fine-tuning (e.g. LoRA layer) and full fine-tuning (via
modules_to_save), as that makes no sense.

However, there is a loophole when modules_to_save is applied ex post.
This happens, for instance, when using a task type like sequence
classification, where PEFT will automatically add the classification head
to modules_to_save for the user. This loophole is now closed by adding a
check to ModulesToSaveWrapper that validates that the targeted layer is
not a tuner layer.

This does not fully resolve #2027 but will raise an early error in the
future to avoid confusion.

On top of this, the error message inside of
ModulesToSaveWrapper.check_module has been slightly adjusted.
Previously, the class name would be used, which can be confusing. E.g.
for LoRA, the class name of the linear LoRA layer is just "Linear",
which looks the same as nn.Linear. Therefore, the full name is now
shown.
2024-08-22 17:10:39 +02:00
8fcb1951a5 MAINT: Update ruff version to ~0.6.1 (#1965)
Moving to ruff ~0.6.1. Changes:

- type comparisons now require `is`, e.g. `type(x) is str`
- remove overridden class attribute active_adapter
- remove secondary import of fbd_cuda

Omit jupyter notebooks for now. We can think about adding that in a
separate PR.
2024-08-22 15:23:23 +02:00
fa218e1942 TST test_mixed_adapter_batches_lora_opt_timing on XPU (#2021) 2024-08-21 15:10:19 +02:00
6c832c1dd4 TST Make TestModelAndLayerStatus device-agnostic (#2026) 2024-08-21 12:43:35 +02:00
95821e5ce4 ENH: Better error msg for replace_lora_weights_loftq when using a local model. (#2022)
Resolves #2020

If users want to use a local model, they need to pass the model_path
argument. The error message now says so.
2024-08-21 10:10:54 +02:00
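Usage sketch (paths are placeholders; the exact meaning of model_path is assumed here to be the location of the base model's safetensors checkpoint):

    from peft import replace_lora_weights_loftq

    # peft_model is an already created LoRA model on top of a locally stored base model.
    replace_lora_weights_loftq(peft_model, model_path="/path/to/local/model/model.safetensors")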
25ab6c9bb2 TST Enable regression tests on XPU (#2019) 2024-08-20 16:13:59 +02:00
b4cf1b3c46 CI Remove regression tests from BNB CI (#2024)
This is a test to see if the BNB CI for multi-backend single-GPU passes
if regression tests are disabled.
2024-08-20 14:15:37 +02:00
eb5eb6efb5 TST Enable test_vera_dtypes on XPU with bf16 (#2017) 2024-08-20 11:25:44 +02:00
f71e89f771 FIX Deprecated params/funcs in X-LoRA example (#2010) 2024-08-20 11:24:38 +02:00
e8ba7de573 CI Activate single core multi backend bnb tests (#2008)
See #1866 for context.

Let's check if this issue has resolved itself by now.
2024-08-16 17:19:20 +02:00
0222450f44 TST: Potentially Skip 8bit bnb regression test if compute capability is too low (#1998)
* TST Potentially Skip 8bit bnb regression test

The 8bit bnb LoRA regression test results are dependent on the
underlying compute capability. The logits are slightly different
depending on the version (up to 0.5 abs diff). Therefore, we now check
the compute capability for this test and skip it if it's too low. This
check may require updating if the hardware of the CI worker is updated.

Note that I have already invalidated the old regression artifacts and
created a new one.

* Fix pytest skip to work without cuda

* Instead of skipping, add a comment to explain

After internal discussion, we think this is the most practical solution
for the time being.
2024-08-16 17:18:25 +02:00
4c3a76fa68 FIX DOC Update X-LoRA docs, some bugfixes (#2002)
Bugs with dtype and loading of LoRA adapters.
2024-08-15 15:29:32 +02:00
670d0fac31 FIX CI Correctly report outcome of bnb import test (#2007) 2024-08-14 20:14:15 +02:00
22f042a107 ENH: Warn when a user-provided model name in the config is renamed (#2004)
Resolves #2001

In PEFT, users can provide a custom base_model_name_or_path argument to
the PEFT config. However, this value is overridden by the model's
name_or_path attribute. This can be surprising for users. Therefore,
there is now a warning about this.

To see why that can be relevant, check the original issue.
2024-08-14 15:42:58 +02:00
d6e772f192 TST Add LNTuningConfig and LoKrConfig to tests (#2005)
These two configs were missing in test_config.py. Also, reordered the
list of all config classes to be sorted, which makes it easier to spot
missing configs.
2024-08-14 15:42:32 +02:00
042123465c DOC Fix typos in lora.md (#2003) 2024-08-13 15:15:03 +02:00
41c274ecac FIX Import error in BOFT half precision test (#1995) 2024-08-08 15:15:47 +02:00
9988cb9d00 FIX BOFT, OFT saving merged layers (#1994)
Error occurred with safetensors when weights are not contiguous.
2024-08-07 19:26:33 +02:00
fcac30bef5 MAINT Default to loading weights_only for torch (#1993)
The torch.load function allows passing weights_only=True, which is more
secure but may break code that loads more than just weights. For PEFT,
this should not be the case, so the switch should just work.

By making the switch now, we can find out early if there are any
problems, as torch.load will default to True in the future.
2024-08-07 19:16:55 +02:00
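Generic illustration of the switch (not the PEFT internals): weights_only=True restricts torch.load to plain tensors and containers, which is safer and sufficient for adapter checkpoints.

    import torch

    torch.save({"weight": torch.zeros(2, 2)}, "adapter_weights.bin")

    # Refuses to unpickle arbitrary Python objects; fine for pure weight files.
    state_dict = torch.load("adapter_weights.bin", weights_only=True)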
2a5d3132e9 ENH Small updates to helper.rescale_adapter_scale (#1989)
Some renaming, better docs.
2024-08-07 14:51:35 +02:00
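Usage sketch of the helper (assuming a LoRA PeftModel `model` and tokenized `inputs`; the keyword name multiplier reflects the current helper and may differ across versions):

    import torch
    from peft.helpers import rescale_adapter_scale

    with rescale_adapter_scale(model, multiplier=0.5):
        with torch.no_grad():
            outputs_half_scale = model(**inputs)
    # Outside the context, the original LoRA scalings are restored.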
c869664891 FIX BOFT mixed precision (#1925) 2024-08-07 14:12:34 +02:00
4611034ff8 FIX: Adjust transformers version check for bloom (#1992)
The fix to the bloom architecture was not actually released in
transformers 4.43.3, which makes the version check invalid. Instead, we now
check an attribute on BloomPreTrainedModel.
2024-08-06 13:40:14 +02:00
b9260305e3 FIX Docker build CI (#1987)
Signed-off-by: Adrien <adrien@huggingface.co>
2024-08-02 16:51:48 +02:00
f51428313f DOC Docs and examples for X-LoRA (#1970) 2024-08-02 12:35:14 +02:00
9a087823c6 DOC Small fixes for HQQ and section title (#1986)
Changed:

- Helper section had placeholder title
- `device` is not a valid argument to `from_pretrained`
- Excess empty lines
- Move helpers section
2024-08-02 12:33:29 +02:00
46f78978f1 FEAT Context manager for scaling LoRA (#1951) 2024-08-01 17:21:55 +02:00
269aba5303 ENH AdaLoRA: warn when users use the r argument (#1981)
For AdaLoRA, init_r is the correct one to use.
2024-08-01 12:24:42 +02:00
52a4ac9c2f ENH Faster bf16 merging on CPU (#1978)
Cast to fp32, as bf16 can be very slow on some CPUs.

This is already done for fp16.
2024-07-31 17:51:46 +02:00
c874ba3f1b CHORE Update CI configuration for workflows (#1985)
Signed-off-by: Adrien <adrien@huggingface.co>
2024-07-31 16:08:58 +02:00
f13d860e9f FIX Loading adapter honors offline mode (#1976)
HF_HUB_OFFLINE=1 was not honored when trying to load an adapter. This is
now fixed.
2024-07-30 16:11:27 +02:00
f6d3e38601 FIX active_adapters for transformers models (#1975)
Fixes the error reported here:

https://github.com/huggingface/transformers/pull/30790#issuecomment-2253808249

Unfortunately, transformers models have an active_adapters method but
it's 1) not a property and 2) calling it fails because the base
model (usually) has no loaded adapter. The base model can be a
transformers model for prompt learning, where the base model is not
wrapped in a LoraModel or similar. Therefore, this special case needs to
be handled separately.
2024-07-30 15:14:28 +02:00
7e7b55880e FIX: lora+: include lr in optimizer kwargs (#1973) 2024-07-30 14:20:04 +02:00
1b16753a6a ENH Update VeRA preconfigured models (#1941)
Some pre-configured models like mistral used not to work with VeRA
because the weight shapes were not identical. However, since #1817, this
is no longer a requirement. Therefore, this commented code can now be
uncommented.

I have tested mistral and gemma and they worked. I haven't tested btlm
and mixtral but with the update, I'm pretty sure they will work too.
2024-07-30 08:15:53 +05:30
27833a2e60 FIX: New bloom changes breaking prompt learning (#1969)
Bloom had two dimensions of the attention layer transposed (compared to
all other transformers models), which was fixed by:

https://github.com/huggingface/transformers/pull/31445

Therefore, for future transformers versions, skip the special handling
in PEFT.

There is also an issue that prompt injection did not take place when
past_key_values was a Cache object that is empty. This should now
hopefully work as expected.
2024-07-29 18:25:41 +02:00
273acf059e FEAT: Add LoRA+ (#1915)
Add LoRA+: Efficient Low Rank Adaptation of Large Models

https://arxiv.org/abs/2402.12354

Call create_loraplus_optimizer to initialize an optimizer with optimizer
parameters that are especially effective for LoRA training.

Builds upon this code base:

https://github.com/nikhil-ghosh-berkeley/loraplus

---------

Co-authored-by: moghadas76 <s.m.moghadas2012@gmail.com>
Co-authored-by: Chris Hua <stillmatic@users.noreply.github.com>
2024-07-29 12:50:30 +02:00
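Usage sketch based on the description above (learning rate and ratio are placeholder values):

    import torch
    from peft.optimizers import create_loraplus_optimizer

    optimizer = create_loraplus_optimizer(
        model=peft_model,                 # a LoRA PeftModel
        optimizer_cls=torch.optim.AdamW,
        lr=5e-5,
        loraplus_lr_ratio=16,             # lora_B parameters get a higher learning rate
    )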
296fbcde3e FIX Prefix tuning if past_key_values is passed (#1942)
There was an error with prefix tuning when some models like Llava passed
past_key_values explicitly, even if it was None, because it resulted in
that argument passed twice (once explicit, once via kwargs). This is now
fixed.
2024-07-29 12:46:54 +02:00
f2b6d13f1d CI Fix Windows permission error on merge test (#1952)
For some reason, Windows CI suddenly started throwing permission
errors on test_merge_layers. These errors occur when using the
TempDirectory() context manager, which raises a PermissionError on
Windows when it tries to clean up after itself. Therefore, this context
manager is now avoided in favor of manual clean up.

More context:

I investigated this issue first in #1947. My suspicion that this could
be caused by a new pytest version was not confirmed. Maybe the reason is
that GH rolled out a new Windows worker, not sure.

Also note that this is not the first time that this workaround is
required, e.g. also here:

e6cd24c907/tests/test_custom_models.py (L1465)
2024-07-25 14:02:34 +02:00
8aacb993e7 Bump version to 0.12.1.dev0 (#1950) 2024-07-25 13:39:39 +02:00
e6cd24c907 Release v0.12.0 (#1946)
Also: Fix small error in doc: mentions wrong version
2024-07-24 13:13:40 +02:00
05f57e94ef PiSSA, OLoRA: Delete initial adapter after conversion instead of the active adapter (#1933)
Resolves #1860

As discussed in that issue, it's not user friendly to delete the default
adapter of a PiSSA/OLoRA model after calling save_pretrained with weight
conversion. It is much more intuitive to delete the initial adapter
instead, since it is loaded inside the method and not by the user, so it's
really an implementation detail.

Apart from this, I made the following related changes:

- Put everything in a try ... finally to ensure that the initial adapter
  does not hang around if there is an error (thus not hogging memory).
- Renamed initial_adapter to initial_adapter_name, to make it clear that
  this is the name and not the adapter itself.
2024-07-24 12:55:56 +02:00
2ce83e05c1 FIX Decrease memory overhead of merging (#1944) 2024-07-23 20:24:05 +02:00
ebcd0792b8 [WIP] ENH Add support for Qwen2 (#1906)
* [WIP] ENH Add support for Qwen2

Add Qwen2 to default target modules, use tiny Qwen2 in tests.

* Add target_modules for FourierFT

* Skip Qwen2 + weighted combination test

It fails when SVD is involved. See:
https://github.com/huggingface/peft/pull/1901#issuecomment-2235731685

---------

Co-authored-by: BenjaminBossan <b.bossan@gmail.com>
2024-07-23 15:04:13 +05:30
ba75bb14d1 FIX: More VeRA tests, fix tests, more checks (#1900)
* FIX More VeRA tests, fix tests, more checks

- Fixes incorrect config for VeRA in a test
- Add VeRA to multi-adapter tests
- Add more checks on the VeRA A/B shapes

The latter becomes necessary when we add more than one VeRA adapter. The
shapes for VeRA A and B are only determined once, when the first VeRA
adapter is created. After that, they are fixed. However, users may add a
second VeRA adapter. As long as that adapter targets the same layers and
has the same rank, we're good. But if it targets other, bigger layers,
or if it has increased rank, the shapes of VeRA A and/or VeRA B will be
too small, resulting in an error during the forward pass. To prevent
this, we already check the shapes during initialization of the new
adapter and raise an error right away.

* Reviewer feedback: wording, better error message

* Reviewer feedback: Clarify tests

---------

Co-authored-by: BenjaminBossan <b.bossan@gmail.com>
2024-07-22 19:12:15 +05:30
6472061a76 FIX Prefix tuning Grouped-Query Attention (#1901)
Fix prefix tuning when GQA is being used.
2024-07-22 11:46:24 +02:00
e02b938e02 FIX PiSSA & OLoRA with rank/alpha pattern, rslora (#1930)
* FIX PiSSA & OLoRA with rank/alpha pattern, rslora

See https://github.com/huggingface/peft/issues/1929#issuecomment-2230780802

At the moment, when using PiSSA or OLoRA with weight conversion to
restore the original base weights, there is an error when either of
rank_pattern, alpha_pattern, or rslora is being used. This PR fixes
this.

The issue is that we need to double the rank of the LoRA adapter. Right
now, this is done by simply doubling r and alpha. But if rank_pattern
and alpha_pattern are being used, those need to be doubled too.

Furthermore, when using rslora, the scaling is again different, namely
alpha / sqrt(r). This also needs to be adjusted.

Unfortunately, when using rslora with rank_pattern and alpha_pattern,
this gets way more complicated. Since we don't store the scaling in the
state_dict, we would have to resolve all the patterns here to determine
the correct scaling, i.e. reimplement the whole matching and init logic.
This is a lot of work for a very edgy edge case.

Therefore, I opted to raise an error instead. This is not super nice, as
the error is only raised when trying to save the model, i.e. a lot of
time may already have been spent to train the model. But we cannot know
this earlier, so not much can be done.

Overall, this fix is ugly because it further couples unrelated code. For
instance, if we add new init methods that affect the scaling, we need to
remember to change the saving logic accordingly. If anyone has a better
idea, LMK.

* Make style

* Also warn during init if there is a potential for saving not to work

* Ensure that GPU tests are run for PiSSA+OLoRA

* Use renamed argument name

* Make style

* Reviewer feedback: Better document the change

* Add clarifying comments to tests
2024-07-19 14:53:38 +05:30
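A quick back-of-the-envelope check of the scaling argument above (not code from the PR): when the rank doubles during conversion, alpha must double for the standard alpha / r scaling, but only grow by sqrt(2) for the rslora alpha / sqrt(r) scaling.

    import math

    r, alpha = 16, 32
    assert (2 * alpha) / (2 * r) == alpha / r                   # standard LoRA scaling preserved
    assert math.isclose((math.sqrt(2) * alpha) / math.sqrt(2 * r),
                        alpha / math.sqrt(r))                   # rslora scaling preserved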
5268495213 FEAT Add HRA: Householder Reflection Adaptation (#1864)
Implements method from https://arxiv.org/abs/2405.17484.
2024-07-16 14:37:32 +02:00
2aaf9cedbb ENH Sync LoRA tp_layer methods with vanilla LoRA (#1919) 2024-07-16 10:39:36 +02:00
a019f8690d FIX sft script print_trainable_parameters attr lookup (#1928) 2024-07-15 17:09:14 +02:00
2a6402f4b2 DOC Fix typo of encoder_reparameterization_type (#1926) 2024-07-15 12:06:12 +02:00
e72a96f7cf FEAT Add FourierFT Support (#1838)
Add Parameter-Efficient Fine-Tuning with Discrete Fourier Transform

https://arxiv.org/abs/2405.03003

---------

Co-authored-by: zqgao22 <zgaoat@connect.ust.hk>
Co-authored-by: Chaos96 <wangqch7@mail2.sysu.edu.cn>
Co-authored-by: DSAILatHKUST <dsailathkust@163.com>
2024-07-09 12:20:01 +02:00
48e136d9bd FIX: Flaky multitask prompt tuning test fixed by setting the seed (#1908)
Set the seed for the tests test_generate_text_with_other_init and
test_generate_text_with_random_init, because otherwise they are
flaky and fail with ~5% probability. An explanation is given in a comment.
2024-07-09 10:05:10 +02:00
58afb34ea0 FEAT Integrate X-LoRA (#1491)
Implements X-LoRA: Mixture of Low-Rank Adapter Experts
Paper: https://arxiv.org/abs/2402.07148
2024-07-05 12:38:18 +02:00
01f1b992eb Example: DNA Language Model. (#1873) 2024-07-05 11:55:26 +02:00
09358aad30 Chore: Docs markdown formatting (#1899) 2024-07-03 18:12:53 +02:00
31c0d85755 FIX DeepSpeed recursion error (#1892)
Happened when accessing attribute before init.
2024-07-03 18:07:31 +02:00
018a1f49c4 FIX TEST Even higher tolerance for AdaLoRA in test (#1898)
See #1897 for more context. The test is still flaky, increasing
tolerance further.
2024-07-02 12:36:03 +02:00
1e2258d7f7 ENH Ephemeral GPU offload support for DoRA (#1857)
Adds the concept of ephemeral GPU offloading: data used in
compute-intensive operations is copied onto the GPU before the operation
is performed, after which the result is put back into CPU memory.

This PR adds support in the DoRA initialization code, but the approach
can be applied in a number of places: whenever the time to perform the
operation in CPU memory heavily dominates the cost of moving the data,
ephemeral transfers have a fairly small VRAM overhead (depending on the
size of the model/adapter) while yielding orders-of-magnitude speed-ups
in certain operations.

For example, a Llama3-8B DoRA adapter with r=64 would put an overhead of
2 x (64 x 4096 x 2 + 4096 x 4096) bytes (assuming fp16), i.e. 33 MB or
so. A Llama3-70B adapter with r=32 would have 2 x (32 x 8192 x 2 + 8192
x 8192) bytes =130 MB.

By making use of ephemeral GPU offloading, more efficient juggling of
data between GPU and CPU may become possible: instead of always loading
as much as we can onto the GPU and then enduring the CPU slowness for
whatever happens not to fit, we intentionally leave a (modest) chunk of
VRAM free for optimizations like these, and the end result is a much
(MUCH) faster experience.
2024-07-02 12:17:45 +02:00
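The overhead figures above can be reproduced with a quick calculation (the breakdown into lora_A, lora_B, and the base weight is an interpretation of the formula in the message):

    def dora_offload_overhead_mb(r, hidden, bytes_per_param=2):  # fp16 -> 2 bytes per param
        # lora_A (r x hidden) + lora_B (hidden x r) + base weight (hidden x hidden)
        params = r * hidden * 2 + hidden * hidden
        return params * bytes_per_param / 2**20

    print(dora_offload_overhead_mb(64, 4096))  # ~33 MB  (Llama3-8B adapter, r=64)
    print(dora_offload_overhead_mb(32, 8192))  # ~130 MB (Llama3-70B adapter, r=32)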
1e5227ff90 TST Bump absolute tolerance for test (#1891)
The allclose check in test_4bit_lora_mixed_adapter_batches_lora can fail on
some systems, even though it passes on others (like CI). Increase the
tolerance slightly to get rid of this.
2024-07-02 11:37:43 +02:00
62122b5add FIX TEST Higher tolerance for AdaLoRA in test (#1897)
The test is flaky on CI, so this PR increases the tolerance to hopefully
fix the flakiness. I cannot reproduce the error locally (neither on GPU
nor CPU), so I'm not 100% sure if this tolerance is enough to make the
test reliable.
2024-07-01 15:42:10 +02:00
9dc53b8fd5 CI Don't fail fast in test matrix (#1896)
Currently, we have fail-fast enabled (the default). Although this is
generally reasonable -- if a test fails in one setting, we probably get
the same failure in other settings -- it is currently an impediment.
This is because we get occasional timeouts when loading models from the
Hub. With fail-fast enabled, if a single setting fails because of
timeouts, all other runs are cancelled, even if they would have passed.
Then we need to retrigger all of them again, creating even more pressure
on the Hub. With fail-fast disabled, we give those other runs a chance
to pass successfully.
2024-07-01 15:04:02 +02:00
db8b76fdb5 DOC DoRA example script & notebook (#1885) 2024-06-28 12:05:53 +02:00
7ffa43b16e FIX Avoid early import of torch extension by BOFT (#1879) 2024-06-26 17:25:26 +02:00
27bc3054a3 FIX sft script: only print trainable params if peft (#1888) 2024-06-26 12:02:35 +02:00
184beaf1d6 FIX Make special LoRA inits DeepSpeed compatible (#1887)
Resolves https://github.com/huggingface/accelerate/issues/2886

Possibly resolves
https://github.com/huggingface/peft/issues/896#issuecomment-2172455458

Some LoRA init methods need to access the base layer weight. Getting
this access can fail or stall in distributed settings. For DeepSpeed,
the weight is now gathered before trying to access it.

Note: Without DeepSpeed, this is a no-op and should thus not have any
disadvantage. We don't have DS in our CI, so this is not tested.

I also made some small changes to OLoRA init to use
self.get_base_layer() instead of self.base_layer.
2024-06-26 11:25:54 +02:00
c9b19bb8f3 FIX Init AdaLoRA to be identity transform (#1884)
Resolves #1836

There was an accidental change in a previous PR that initialized lora_E
with a normal distribution, when it should be zeros.
2024-06-25 13:33:28 +02:00
ef23712b13 ENH: LoRA support for dynamically dispatching to custom layers (#1875)
Description

This is an experimental feature with a private API for now. If this
feature finds adoption, I will work on adding an official API.

With this PR, we allow users to register their own LoRA layer types.
This way, they can add their own support for hitherto unsupported layer
types, say nn.Conv3d or nn.LSTM. Without this PR, they can only do that
by creating a PR on PEFT with support for this new type and getting it
merged.

The custom dispatch mechanism also allows users to override existing
layer type mapping. This way, they can, for instance, provide their own
lora.Linear layer type, instead of using the one from PEFT, to adapt
nn.Linear layers.

Implementation

The implementation required only very few changes because we already
have a mechanism for dynamic dispatching for LoRA. It is currently used,
for instance, to dynamically add quantized target layers in case the
right quantization library is installed.

This existing mechanism is now extended to include user provided LoRA
layers if those were passed. These are checked first before checking the
default PEFT supported layers.

What's missing for this to become an official API?

Right now, the main reason why this cannot be an official API is the
question of how to persist the config. In the current implementation, we
add an attribute that is a mapping from target layer type to LoRA layer
type:

config._custom_modules == {CustomBaseLayer: CustomLoraLayer}

The entries of this dict are Python classes. Therefore, they cannot be
json-serialized. We could think of possible ways to serialize and
deserialize custom Python objects, but this is not trivial and
potentially a security risk. Thus I would only really start working on
this if the demand is sufficiently high. At that point, I would also add
a public API instead of requiring the use of a private API.

As is, users can still save and load PEFT models with custom LoRA
layers; they only need to add two lines of code to their scripts, as
documented.

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-06-25 11:02:43 +02:00
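A heavily hedged sketch of the private mechanism named in the message (the custom LoRA layer body is omitted and the exact registration call may differ from what is shown here):

    import torch.nn as nn
    from peft import LoraConfig

    class MyLoraConv3d(nn.Module):
        """Hypothetical custom LoRA layer for nn.Conv3d; the real implementation is omitted."""

    config = LoraConfig(target_modules=["conv3d_layer"])
    # Mapping from base layer type to custom LoRA layer type, as described above.
    config._custom_modules = {nn.Conv3d: MyLoraConv3d}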
d716adf31c Update bug-report.yml (#1882) 2024-06-21 16:45:44 +02:00
d37dde61e1 FIX Error when using VeRA with fp16 or bf16 (#1874)
The issue was that we didn't consider BufferDict when auto-casting the
adapter weights to float32 in PR #1706. This has now been addressed.

As #1706 was merged after the latest release, this bug should only
affect users who install from main, so a patch release should not be
needed.

As part of this PR, I also moved the buffer_dict.py up from
peft/tuners/vera to peft/tuners/ (with underscore to make it super clear
that this is not for public usage). This is because we need to use it
several times on a higher level than VeRA.
2024-06-19 13:21:17 +02:00
5364351446 CI Downgrade numpy to <2.0 for Mac and Windows (#1871) 2024-06-18 13:47:29 +02:00
717db6e1c2 CI testing BNB: remove single GPU tests (#1866)
CI testing BNB: remove single GPU tests

In #1859, we tried removing the import checks, but the single-GPU BNB
multi-backend branch is still stuck. Therefore, try commenting out the next
step instead.

Also, add timeout of 60 min. Successful jobs currently take ~30 min. Default
timeout is 360 minutes.
2024-06-18 10:34:24 +02:00
5194aef509 Attempt to fix the red messages (#1868) 2024-06-17 15:34:31 +02:00
25c0fe9a55 FIX fix multitask prompt tuning paper link (#1862) 2024-06-17 10:57:34 +02:00
e0e8204bc3 Update lora_based_methods.md (#1861)
fixed typo in instructions for peft inference
2024-06-17 10:57:27 +02:00
076561bbd3 CI Testing: Remove bnb import check (#1859) 2024-06-14 18:02:27 +02:00
efda766f51 DOC Move helpers section to dev developer guide (#1856)
It was in the "Adapters" section, which doesn't really fit.
2024-06-13 12:44:25 +02:00
d608f8329a DOC FIX Comment about init of LoRA Embedding (#1855)
Fixes #1728
2024-06-13 11:58:26 +02:00
19461353aa Update nightly-bnb.yml (#1854) 2024-06-13 11:40:40 +02:00
3831e06ab5 FIX: Adalora ranknum loaded on wrong device (#1852)
Locally, multiple AdaLoRA-related tests are failing. We did not catch
this in the nightly run because the tests were missing the necessary
pytest marker.

The issue is related to the change in #1742, which made it possible to
load different adapters on different devices.

Although that PR itself was sound, the issue is that for AdaLoRA, one of
its parameters, ranknum, was not listed in the other_param_names and was
thus not moved to the correct device. This oversight is now fixed and
the GPU tests are now passing locally for me.

This PR also adds the missing pytest marker to the test class that was
missing it, so that these errors should be caught by our nightly CI in
the future.
2024-06-13 10:47:49 +02:00
2f5360a7da FEAT Add OLoRA initialization strategy to LoRA (#1828) 2024-06-12 17:46:43 +02:00
8843a767da MNT Upgrade ruff version to ~0.4.8 (#1851)
We currently use ruff v0.2.2, which is quite far behind the latest
version. This has the disadvantage that new contributors will often
install the latest version of ruff and then get CI errors, even though
they ran `make style`.

Here is the full list of changes:

- bump ruff version to ~0.4.8
- update the ruff commands in Makefile (ruff foo/ -> ruff check foo/)
- update coding style of two files that changed with the new ruff
  version
2024-06-12 15:01:45 +02:00
b6af7feb34 DOC Fix PeftMixedModel docstring example #1824 (#1850) 2024-06-12 14:27:14 +02:00
47b3d7422a CI Activate env to prevent bnb import error (#1845)
All bitsandbytes nightly CI runs are currently failing with:

Run python3 -m bitsandbytes
/opt/conda/bin/python3: No module named bitsandbytes

This fix should hopefully solve this, but it's untested.
2024-06-11 10:59:32 +02:00
7b1c08d2b5 ENH Support different layer shapes for VeRA (#1817) 2024-06-10 17:10:56 +02:00
a8286a7bff DOC Describe torch_device in from_pretrained docs (#1843) 2024-06-10 16:01:00 +02:00
683db0fa2c feat(ci): add trufflehog secrets detection (#1841)
* feat(ci): add trufflehog secrets detection

* fix(ci): remove unnecessary permissions
2024-06-10 11:40:36 +02:00
0f89d34d82 Fix broken messages (#1842) 2024-06-10 11:21:48 +02:00
0b40d1a304 Workflow / Bnb: Add a mechanism to inform us if the import fails (#1830)
* Update nightly-bnb.yml

* Update nightly-bnb.yml

* Update .github/workflows/nightly-bnb.yml

* Update .github/workflows/nightly-bnb.yml
2024-06-07 16:38:10 +02:00
03798a9143 FIX Failing Llama tests due to new kv cache (#1832)
The issue is that past_key_values can now be an instance of
DynamicCache. Therefore, just indexing into it won't work anymore. The
solution is to check the type and if it's not a tuple/list, use the methods
on the cache object instead.
2024-06-06 15:49:59 +02:00
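The pattern in generic form (a sketch, not the PEFT code; assumes a transformers version that ships the Cache classes):

    from transformers.cache_utils import DynamicCache

    def past_length(past_key_values):
        if isinstance(past_key_values, (tuple, list)):
            # Legacy format: tuple of (key, value) pairs per layer.
            return past_key_values[0][0].shape[-2]
        # New format: a Cache object such as DynamicCache.
        return past_key_values.get_seq_length()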
d33c1f118e fix doc typo (#1833) 2024-06-06 15:34:10 +02:00
63a536b18e TST Make tests pass on Cambricon MLUs (#1747)
Small adjustments to tests to make them pass on Cambricon MLUs (mostly
tolerances). Note that we have no MLU test runners for PEFT, so have to
rely on others to run these tests.
2024-06-06 10:44:03 +02:00
ad8f7cb59e Update build_docker_images.yml (#1823) 2024-06-04 13:34:37 +02:00
3538e8ac7d FIX CI: Install pytest-reportlog package (#1822) 2024-06-04 13:09:09 +02:00
b213ea5fb9 Update tests-main.yml (#1821) 2024-06-04 12:31:31 +02:00
7ed94f3269 FIX CI: Remove potentially problematic git command (#1820)
See if this fixes the error in the workflow.

> fatal: detected dubious ownership in repository at '/__w/peft/peft'
2024-06-04 12:18:37 +02:00
a0788a3f92 Refactor to make DoRA and QDoRA work with FSDP (#1806)
This PR moves all the DoRA functionality into a separate module class.
Essentially, this is necessary because otherwise, the DoRA parameter
lives on the lora.Linear layer as a parameter, not a separate module.
Since the FSDP auto wrap policy operates on the level of modules, not
parameters, there is no way to modify the auto wrap policy to wrap the
DoRA parameter; it must be its own module.

If not for this reason, #1797 would be preferable, since the number of
code changes is smaller overall. In this PR, there are more numerous
changes, but the majority only involves moving code around, not any
actual code changes.

Since we introduce a new submodule, an extra step is required to
ensure that old DoRA state dicts can still be loaded correctly. This
involves a fairly trivial extra remapping step in
set_peft_model_state_dict. The test for this is performed via the new
regression DoRA tests introduced in #1792.

Similarly, there is a remapping step involved in
get_peft_model_state_dict to ensure that when new state dicts with DoRA
are saved, they still conform to the old format.

An additional required change was to make a defensive copy of the base
layer before dequantizing its weight in order to calculate the weight
norm for DoRA. Without this defensive copy, some side-effect is
triggered in FSDP that results in

> ValueError: Cannot flatten integer dtype tensors

even though the compute dtype of bnb is correctly set to float.

Creating a fully functioning deepcopy does not currently work with 8bit
BNB but there is a fix. Once the next BNB release is out, 8bit BNB will
be tested and enabled.

While working on this, I also noticed a small bug where dropout was not
correctly applied when using QDoRA. This is now also fixed.

This PR was tested successfully with FSDP and (Q)DoRA using the scripts
in examples/sft/ with a modification to enable DoRA.
2024-05-31 16:56:21 +02:00
cb0bf07774 MNT Remove deprecated use of load_in_8bit (#1811)
Don't pass load_in_8bit to AutoModel.from_pretrained; instead, use
BitsAndBytesConfig.

There was already a PR to clean this up (#1552) but a slightly later
PR (#1518) re-added this usage.

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2024-05-30 15:39:26 +02:00
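The replacement pattern (model name is a placeholder):

    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    # Instead of from_pretrained(..., load_in_8bit=True):
    bnb_config = BitsAndBytesConfig(load_in_8bit=True)
    model = AutoModelForCausalLM.from_pretrained(
        "facebook/opt-125m", quantization_config=bnb_config
    )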
8cd2cb613b CI Make torch compile tests run on GPU (#1808)
Many of these tests require a GPU to run, so use custom runners.

Code was mostly copied from existing workflows.

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2024-05-30 12:37:18 +02:00
e7b75070c7 TST: Add simple BNB regression tests (#1602)
These are very basic and simplistic regression tests for bnb. Their
purpose is to ensure that there is no unnoticed change in bnb that leads
to different outputs. There is no check for "correctness", just that the
results haven't changed.

Eventually, this workflow should be improved and moved to the bnb repo.

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2024-05-28 11:36:38 +02:00
1b262167f3 Docs / LoRA: Add more information on merge_and_unload docs (#1805)
* put back lora merging diagram

* push

* Update docs/source/developer_guides/lora.md

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2024-05-28 11:13:44 +02:00
39c60ffca9 TST Add regression test for DoRA, VeRA, BOFT, LNT (#1792)
These new methods were added but the regression tests were not extended
yet. This PR adds regression tests for these methods. The regression
artifacts have been pushed based on PEFT v0.11.1. The new tests pass
locally.
2024-05-27 12:00:47 +02:00
8304017a9a FIX BOFT device error after PR 1742 (#1799)
PR #1742 introduced the feature that adapters of the same layer can be
on different devices. A new method was introduced that is responsible
for moving the parameters related to a specific adapter in a consistent
way.

In BOFT, however, one parameter was overlooked, boft_P. This parameter
is not stored inside a ParameterDict or ModuleDict, hence it was not
moved. The reason is (presumably) that this parameter is shared between
all BOFT adapters, as it's always identical. However, this clashes with
having different adapters on different devices.

To solve this, the parameter is now moved on the fly to the correct
device during the forward pass.
2024-05-27 10:12:22 +02:00
b2922565c4 TST Install bitsandbytes for compile tests (#1796)
Also, remove outdated comment.
2024-05-23 16:12:57 +02:00
3cf5359f11 FIX Allow same layer adapters on different devices (#1742)
The issue is that so far, we made the assumption in PEFT that all
adapter weights of a specific layer are on the same device. There can be
cases where it is useful to have adapters on different devices. E.g.
when a user loads a lot of LoRA adapters and wants to offload those not
currently in use to CPU, they would not currently be able to do so.

With this PR, we add this possibility. To achieve this, when we update
an adapter layer with a new adapter, we only move that specific adapter
to the device of the base layer, without touching the other loaded
adapters.

While working on this, I discovered a small bug in VeRA when adding
multiple adapters, which is now also fixed.
2024-05-23 10:54:40 +02:00
cb7aedd9ba fix docs (#1793) 2024-05-23 11:37:30 +05:30
47745d57c2 FIX Use correct attribute name for HQQ in merge (#1791)
Without this fix, test_hqq_lora_model_outputs currently fails.
2024-05-22 16:35:27 +02:00
1fec23152a DOC TST Reproducibility of models using batch norm (#1734)
Fixes #1732

After loading a model that was trained with PEFT on a base model with
some kind of batch norm layer, the loaded model should produce the same
output. Right now, this does not happen.

The reason is that during training, buffers for running mean etc. are
updated, but they are not saved when calling save_pretrained on the
PeftModel instance. Normally in PEFT, we assume that during training,
the base model parameters are kept constant, which is not the case with
batch norm. We only save the PEFT parameters and assume that when the
user loads the base model, all parameters are restored exactly. As a
result, the information in the buffers is lost completely.

The fix is to add the batch norm layers to modules_to_save. This fix is
now documented and tested.
2024-05-22 10:43:29 +02:00
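The documented workaround amounts to listing the batch norm layers in modules_to_save so that their updated buffers are stored with the adapter; the module names below are placeholders:

    from peft import LoraConfig, get_peft_model

    config = LoraConfig(
        target_modules=["convolution"],                    # placeholder layer names
        modules_to_save=["classifier", "normalization"],   # include the batch norm layers
    )
    model = get_peft_model(base_model, config)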
bc6a99906c FIX Warning abt config.json when the base model is local. (#1668)
Fix incorrect warning when loading local model.
2024-05-21 15:45:06 +02:00
691bc22ea6 ENH Layer/model status shows devices now (#1743)
For each adapter, show all the devices of this adapter's parameters.

Also, while working on this, found a very minor bug in VeRA as its
linear layer didn't implement its own __repr__.
2024-05-21 15:35:51 +02:00
fb7f2796e5 Add add_weighted_adapter to IA3 adapters (#1701)
* Add add_weighted_adapter to IA3 adapters

* Refactor to simplify code

* refactor test

* Add IA3 merging docs

* Update docs/source/developer_guides/model_merging.md

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* Update docs/source/developer_guides/model_merging.md

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* address PR feedback

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2024-05-17 22:29:22 +05:30
4e32679f37 TST: torch compile tests (#1725)
Right now, we don't have specific tests for torch.compile. Instead, we
have a "hack" that allows to run _all_ tests with torch.compile if we
set the environment variable PEFT_DEBUG_WITH_TORCH_COMPILE=1.

This is not very practical because it takes a lot of time to run all
these tests with compilation enabled. Also, currently hundreds of tests
are failing, which makes it impossible to understand more closely what
does or does not work.

This PR removes the aforementioned "hack" and instead replaces it with a
list of explicit torch.compile tests. Currently, these tests cover
training/inference with a bunch of different tuner types, as well as
more advanced features with LoRA (e.g. quantization, multiple adapters,
etc.).

Some of these tests pass and some of them fail. This is documented now,
so that users can quickly look up if their use case would be compatible
with torch.compile. This is very useful to have, because sometimes
torch.compile may appear to work but actually returns the wrong result.
For users, it's not immediately obvious when this happens.

The test suite is not exhaustive, there are many combinations of
features that could be added. However, it should be a good starting
point and can be extended later.

The test suite does _not_ cover whether torch.compile actually
accelerates the code. This may not be the case even if it works
correctly (e.g. because of graph breaks). Testing this would require
bigger models and more data, which is prohibitively slow to test.

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-05-17 18:03:27 +02:00
3f7aacd601 Bump version to 0.11.2.dev0 (#1741)
After patch release of 0.11.1.
2024-05-17 15:37:30 +02:00
e3eeabfad2 FIX BOFT setting env vars breaks C++ compilation (#1739)
Resolves #1738
2024-05-17 12:43:03 +02:00
ae1ae20b76 Autocast adapter weights if fp16/bf16 (#1706)
As discussed internally, we want to automatically cast the weights of
the adapter to float32 when using float16. Float16 is not conducive to
stable training and raises errors when used with AMP.

Previously, we had to recommend that users manually cast the weights
if they loaded the base model in float16, because PEFT would choose the
same dtype for the adapter as for the base weights. Forgetting this is a
common source of errors, so we chose to automate it.

If this causes trouble, users can prevent the behavior by passing
autocast_adapter_dtype=False to get_peft_model,
PeftModel.from_pretrained, or PeftModel.load_adapter.

This PR should be reviewed carefully, as it has the potential to break
existing code if something important was missed. We also need to add a
note for the upcoming release text about this change in behavior.
2024-05-16 17:11:36 +02:00
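The opt-out mentioned above, as a sketch (base_model and lora_config are assumed to exist):

    from peft import get_peft_model

    # By default, fp16/bf16 adapter weights are now upcast to float32.
    # Pass autocast_adapter_dtype=False to keep the previous behavior.
    model = get_peft_model(base_model, lora_config, autocast_adapter_dtype=False)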
2535036c24 ENH Save and load base model with revision (#1658) 2024-05-16 16:27:53 +02:00
e003ae7850 Bump version to 0.11.1.dev0 (#1736) 2024-05-16 12:34:29 +02:00
169 changed files with 28229 additions and 994 deletions

View File

@@ -23,7 +23,7 @@ body:
Please tag fewer than 3 people.
Library: @pacman100 @younesbelkada @benjaminbossan @sayakpaul
Library: @benjaminbossan @sayakpaul
Documentation: @stevhliu

View File

@@ -16,18 +16,9 @@ env:
jobs:
latest-cpu:
name: "Latest Peft CPU [dev]"
runs-on: ubuntu-latest
runs-on:
group: aws-general-8-plus
steps:
- name: Cleanup disk
run: |
sudo ls -l /usr/local/lib/
sudo ls -l /usr/share/
sudo du -sh /usr/local/lib/
sudo du -sh /usr/share/
sudo rm -rf /usr/local/lib/android
sudo rm -rf /usr/share/dotnet
sudo du -sh /usr/local/lib/
sudo du -sh /usr/share/
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v1
- name: Check out code
@@ -49,25 +40,16 @@ jobs:
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: "C06LKJB31RU"
title: 🤗 Results of the PEFT-CPU docker build
slack_channel: ${{ env.CI_SLACK_CHANNEL }}
title: 🤗 Results of the PEFT-CPU docker build
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
latest-cuda:
name: "Latest Peft GPU [dev]"
runs-on: ubuntu-latest
runs-on:
group: aws-general-8-plus
steps:
- name: Cleanup disk
run: |
sudo ls -l /usr/local/lib/
sudo ls -l /usr/share/
sudo du -sh /usr/local/lib/
sudo du -sh /usr/share/
sudo rm -rf /usr/local/lib/android
sudo rm -rf /usr/share/dotnet
sudo du -sh /usr/local/lib/
sudo du -sh /usr/share/
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v1
- name: Check out code
@@ -84,30 +66,21 @@ jobs:
context: ./docker/peft-gpu
push: true
tags: huggingface/peft-gpu
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: "C06LKJB31RU"
title: 🤗 Results of the PEFT-GPU docker build
slack_channel: ${{ env.CI_SLACK_CHANNEL }}
title: 🤗 Results of the PEFT-GPU docker build
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
latest-cuda-bnb-source:
name: "Latest Peft GPU + bnb source [dev]"
runs-on: ubuntu-latest
runs-on:
group: aws-general-8-plus
steps:
- name: Cleanup disk
run: |
sudo ls -l /usr/local/lib/
sudo ls -l /usr/share/
sudo du -sh /usr/local/lib/
sudo du -sh /usr/share/
sudo rm -rf /usr/local/lib/android
sudo rm -rf /usr/share/dotnet
sudo du -sh /usr/local/lib/
sudo du -sh /usr/share/
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v1
- name: Check out code
@@ -124,30 +97,21 @@ jobs:
context: ./docker/peft-gpu-bnb-source
push: true
tags: huggingface/peft-gpu-bnb-source
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: "C06LKJB31RU"
title: 🤗 Results of the PEFT-GPU (bnb source / HF latest) docker build
slack_channel: ${{ env.CI_SLACK_CHANNEL }}
title: 🤗 Results of the PEFT-GPU (bnb source / HF latest) docker build
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
latest-cuda-bnb-source-latest:
name: "Latest Peft GPU + bnb source [accelerate / peft / transformers latest]"
runs-on: ubuntu-latest
runs-on:
group: aws-general-8-plus
steps:
- name: Cleanup disk
run: |
sudo ls -l /usr/local/lib/
sudo ls -l /usr/share/
sudo du -sh /usr/local/lib/
sudo du -sh /usr/share/
sudo rm -rf /usr/local/lib/android
sudo rm -rf /usr/share/dotnet
sudo du -sh /usr/local/lib/
sudo du -sh /usr/share/
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v1
- name: Check out code
@@ -164,30 +128,21 @@ jobs:
context: ./docker/peft-gpu-bnb-latest
push: true
tags: huggingface/peft-gpu-bnb-latest
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: "C06LKJB31RU"
title: 🤗 Results of the PEFT-GPU (bnb source / HF source) docker build
slack_channel: ${{ env.CI_SLACK_CHANNEL }}
title: 🤗 Results of the PEFT-GPU (bnb source / HF source) docker build
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
latest-cuda-bnb-source-multi:
name: "Latest Peft GPU + bnb (multi-backend) source [accelerate / peft / transformers source]"
runs-on: ubuntu-latest
runs-on:
group: aws-general-8-plus
steps:
- name: Cleanup disk
run: |
sudo ls -l /usr/local/lib/
sudo ls -l /usr/share/
sudo du -sh /usr/local/lib/
sudo du -sh /usr/share/
sudo rm -rf /usr/local/lib/android
sudo rm -rf /usr/share/dotnet
sudo du -sh /usr/local/lib/
sudo du -sh /usr/share/
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v1
- name: Check out code
@@ -204,13 +159,13 @@ jobs:
context: ./docker/peft-gpu-bnb-multi-source
push: true
tags: huggingface/peft-gpu-bnb-multi-source
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: "C06LKJB31RU"
title: 🤗 Results of the PEFT-GPU (bnb source multi-backend / HF latest) docker build
slack_channel: ${{ env.CI_SLACK_CHANNEL }}
title: 🤗 Results of the PEFT-GPU (bnb source multi-backend / HF latest) docker build
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}

View File

@@ -15,11 +15,13 @@ env:
jobs:
run_all_tests_single_gpu:
timeout-minutes: 60
strategy:
fail-fast: false
matrix:
docker-image-name: ["huggingface/peft-gpu-bnb-source:latest", "huggingface/peft-gpu-bnb-latest:latest", "huggingface/peft-gpu-bnb-multi-source:latest"]
runs-on: [self-hosted, single-gpu, nvidia-gpu, t4, ci]
runs-on:
group: aws-g6-4xlarge-plus
env:
CUDA_VISIBLE_DEVICES: "0"
TEST_TYPE: "single_gpu_${{ matrix.docker-image-name }}"
@@ -44,26 +46,91 @@ jobs:
echo "Checking out tag for Transformers version: v$transformers_version"
git fetch --tags
git checkout tags/v$transformers_version
cd ..
cd ..
fi
- name: Test bnb import
id: import
if: always()
run: |
source activate peft
python3 -m bitsandbytes
python3 -c "import bitsandbytes as bnb"
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.BNB_SLACK_CHANNEL_ID }}
title: 🤗 Results of bitsandbytes import
status: ${{ steps.import.outcome }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
- name: Run examples on single GPU
id: examples_tests
if: always()
run: |
source activate peft
make tests_examples_single_gpu_bnb
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.BNB_SLACK_CHANNEL_ID }}
title: 🤗 Results of bitsandbytes examples tests - single GPU
status: ${{ steps.examples_tests.outcome }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
- name: Run core tests on single GPU
id: core_tests
if: always()
run: |
source activate peft
make tests_core_single_gpu_bnb
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.BNB_SLACK_CHANNEL_ID }}
title: 🤗 Results of bitsandbytes core tests - single GPU
status: ${{ steps.core_tests.outcome }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
# TODO: this is a test to see if BNB multi-backend single-GPU tests succeed w/o regression tests
# - name: Run BNB regression tests on single GPU
# id: regression_tests
# if: always()
# run: |
# source activate peft
# make tests_gpu_bnb_regression
# - name: Post to Slack
# if: always()
# uses: huggingface/hf-workflows/.github/actions/post-slack@main
# with:
# slack_channel: ${{ secrets.BNB_SLACK_CHANNEL_ID }}
# title: 🤗 Results of bitsandbytes regression tests - single GPU
# status: ${{ steps.regression_tests.outcome }}
# slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
- name: Run transformers tests on single GPU
id: transformers_tests
if: always()
run: |
source activate peft
make transformers_tests
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.BNB_SLACK_CHANNEL_ID }}
title: 🤗 Results of bitsandbytes transformers tests - single GPU
status: ${{ steps.transformers_tests.outcome }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
- name: Generate Report
if: always()
run: |
@ -71,11 +138,13 @@ jobs:
python scripts/log_reports.py --slack_channel_name bnb-daily-ci-collab >> $GITHUB_STEP_SUMMARY
run_all_tests_multi_gpu:
timeout-minutes: 60
strategy:
fail-fast: false
matrix:
docker-image-name: ["huggingface/peft-gpu-bnb-source:latest", "huggingface/peft-gpu-bnb-latest:latest", "huggingface/peft-gpu-bnb-multi-source:latest"]
runs-on: [self-hosted, multi-gpu, nvidia-gpu, t4, ci]
runs-on:
group: aws-g6-12xlarge-plus
env:
CUDA_VISIBLE_DEVICES: "0,1"
TEST_TYPE: "multi_gpu_${{ matrix.docker-image-name }}"
@ -101,31 +170,78 @@ jobs:
git fetch --tags
git checkout tags/v$transformers_version
cd ..
fi
fi
- name: Test bnb import
id: import
if: always()
run: |
source activate peft
python3 -m bitsandbytes
python3 -c "import bitsandbytes as bnb"
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.BNB_SLACK_CHANNEL_ID }}
title: 🤗 Results of bitsandbytes import
status: ${{ steps.import.outcome }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
- name: Run core GPU tests on multi-gpu
if: always()
run: |
source activate peft
- name: Run examples on multi GPU
id: examples_tests
if: always()
run: |
source activate peft
make tests_examples_multi_gpu_bnb
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.BNB_SLACK_CHANNEL_ID }}
title: 🤗 Results of bitsandbytes examples tests - multi GPU
status: ${{ steps.examples_tests.outcome }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
- name: Run core tests on multi GPU
id: core_tests
if: always()
run: |
source activate peft
make tests_core_multi_gpu_bnb
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.BNB_SLACK_CHANNEL_ID }}
title: 🤗 Results of bitsandbytes core tests - multi GPU
status: ${{ steps.core_tests.outcome }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
- name: Run transformers tests on multi GPU
id: transformers_tests
if: always()
run: |
source activate peft
make transformers_tests
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.BNB_SLACK_CHANNEL_ID }}
title: 🤗 Results of bitsandbytes transformers tests - multi GPU
status: ${{ steps.transformers_tests.outcome }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
- name: Generate Report
if: always()
run: |

View File

@ -17,7 +17,8 @@ jobs:
run_all_tests_single_gpu:
strategy:
fail-fast: false
runs-on: [self-hosted, single-gpu, nvidia-gpu, t4, ci]
runs-on:
group: aws-g6-4xlarge-plus
env:
CUDA_VISIBLE_DEVICES: "0"
TEST_TYPE: "single_gpu"
@ -34,7 +35,7 @@ jobs:
source activate peft
pip install -e . --no-deps
pip install pytest-reportlog
- name: Run common tests on single GPU
run: |
source activate peft
@ -44,7 +45,7 @@ jobs:
run: |
source activate peft
make tests_examples_single_gpu
- name: Run core tests on single GPU
run: |
source activate peft
@ -54,7 +55,7 @@ jobs:
run: |
source activate peft
make tests_regression
- name: Generate Report
if: always()
run: |
@ -64,7 +65,8 @@ jobs:
run_all_tests_multi_gpu:
strategy:
fail-fast: false
runs-on: [self-hosted, multi-gpu, nvidia-gpu, t4, ci]
runs-on:
group: aws-g6-12xlarge-plus
env:
CUDA_VISIBLE_DEVICES: "0,1"
TEST_TYPE: "multi_gpu"
@ -85,22 +87,22 @@ jobs:
- name: Run core GPU tests on multi-gpu
run: |
source activate peft
- name: Run common tests on multi GPU
run: |
source activate peft
make tests_common_gpu
- name: Run examples on multi GPU
run: |
source activate peft
make tests_examples_multi_gpu
- name: Run core tests on multi GPU
run: |
source activate peft
make tests_core_multi_gpu
- name: Generate Report
if: always()
run: |

View File

@ -9,6 +9,9 @@ jobs:
name: Close Stale Issues
if: github.repository == 'huggingface/peft'
runs-on: ubuntu-latest
permissions:
issues: write
pull-requests: write
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
steps:
@ -24,4 +27,4 @@ jobs:
pip install PyGithub
- name: Close stale issues
run: |
python scripts/stale.py
python scripts/stale.py

View File

@ -26,3 +26,11 @@ jobs:
- name: Test with pytest
run: |
make test
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.SLACK_CHANNEL_ID }}
title: 🤗 Results of transformers main tests
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}

View File

@ -31,6 +31,8 @@ jobs:
tests:
needs: check_code_quality
strategy:
# TODO: remove 'fail-fast' line once timeout issue from the Hub is solved
fail-fast: false
matrix:
python-version: ["3.8", "3.9", "3.10", "3.11"]
os: ["ubuntu-latest", "macos-12", "windows-latest"]
@ -48,6 +50,12 @@ jobs:
python -m pip install --upgrade pip
# cpu version of pytorch
pip install -e .[test]
- name: Downgrade numpy on MacOS and Windows
# TODO: remove numpy downgrade on MacOS & Windows once torch fixes numpy 2.0 issue
shell: bash
if: matrix.os == 'windows-latest' || matrix.os == 'macos-12'
run: |
pip install --force-reinstall -U "numpy<2.0.0"
- name: Test with pytest
run: |
make test

View File

@ -1,7 +1,5 @@
name: torch compile tests
# see peft/tests/__init__.py
on:
workflow_dispatch:
inputs:
@ -13,31 +11,42 @@ on:
required: false
default: false
env:
RUN_SLOW: "yes"
IS_GITHUB_CI: "1"
# To be able to run tests on CUDA 12.2
NVIDIA_DISABLE_REQUIRE: "1"
jobs:
run_tests_with_compile:
runs-on: ubuntu-latest
runs-on:
group: aws-g6-4xlarge-plus
env:
PEFT_DEBUG_WITH_TORCH_COMPILE: 1
CUDA_VISIBLE_DEVICES: "0"
TEST_TYPE: "single_gpu_huggingface/peft-gpu-bnb-latest:latest"
container:
image: "huggingface/peft-gpu-bnb-latest:latest"
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
defaults:
run:
shell: bash
steps:
- uses: actions/checkout@v4
with:
ref: ${{ github.event.inputs.branch }}
repository: ${{ github.event.pull_request.head.repo.full_name }}
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: "3.10"
cache: "pip"
cache-dependency-path: "setup.py"
- name: Install dependencies
- name: Pip install
run: |
python -m pip install --upgrade pip
python -m pip install .[test]
source activate peft
pip install -e . --no-deps
pip install pytest-cov pytest-reportlog parameterized datasets scipy einops
pip install "pytest>=7.2.0,<8.0.0" # see: https://github.com/huggingface/transformers/blob/ce4fff0be7f6464d713f7ac3e0bbaafbc6959ae5/setup.py#L148C6-L148C26
if [ "${{ github.event.inputs.pytorch_nightly }}" = "true" ]; then
python -m pip install --upgrade --pre torch --index-url https://download.pytorch.org/whl/nightly/cpu
fi
- name: Test compile with pytest
run: |
source activate peft
echo "PEFT_DEBUG_WITH_TORCH_COMPILE=$PEFT_DEBUG_WITH_TORCH_COMPILE"
git status
make test
make tests_torch_compile

15
.github/workflows/trufflehog.yml vendored Normal file
View File

@ -0,0 +1,15 @@
on:
push:
name: Secret Leaks
jobs:
trufflehog:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Secret Scanning
uses: trufflesecurity/trufflehog@main

View File

@ -1,13 +1,13 @@
repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.2.1
rev: v0.6.1
hooks:
- id: ruff
args:
- --fix
- id: ruff-format
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.5.0
rev: v4.6.0
hooks:
- id: check-merge-conflict
- id: check-yaml

View File

@ -6,13 +6,13 @@ check_dirs := src tests examples docs scripts docker
# this target runs checks on all files
quality:
ruff $(check_dirs)
ruff check $(check_dirs)
ruff format --check $(check_dirs)
doc-builder style src/peft tests docs/source --max_len 119 --check_only
# Format source code automatically and check if there are any problems left that need manual fixing
style:
ruff $(check_dirs) --fix
ruff check --fix $(check_dirs)
ruff format $(check_dirs)
doc-builder style src/peft tests docs/source --max_len 119
@ -47,9 +47,15 @@ tests_core_multi_gpu_bnb:
tests_core_single_gpu_bnb:
python -m pytest -m "single_gpu_tests and bitsandbytes" tests/test_common_gpu.py $(if $(IS_GITHUB_CI),--report-log "core_single_gpu.log",)
tests_gpu_bnb_regression:
python -m pytest tests/bnb/test_bnb_regression.py $(if $(IS_GITHUB_CI),--report-log "bnb_regression_gpu.log",)
# For testing transformers tests for bnb runners
transformers_tests:
RUN_SLOW=1 python -m pytest transformers-clone/tests/quantization/bnb $(if $(IS_GITHUB_CI),--report-log "transformers_tests.log",)
tests_regression:
python -m pytest -s --regression tests/regression/ $(if $(IS_GITHUB_CI),--report-log "regression_tests.log",)
tests_torch_compile:
python -m pytest tests/test_torch_compile.py $(if $(IS_GITHUB_CI),--report-log "compile_tests.log",)

View File

@ -37,6 +37,8 @@
title: Adapter injection
- local: developer_guides/mixed_models
title: Mixed adapter types
- local: developer_guides/torch_compile
title: torch.compile
- local: developer_guides/contributing
title: Contribute to PEFT
- local: developer_guides/troubleshooting
@ -88,6 +90,8 @@
title: LoKr
- local: package_reference/lora
title: LoRA
- local: package_reference/xlora
title: X-LoRA
- local: package_reference/adapter_utils
title: LyCORIS
- local: package_reference/multitask_prompt_tuning
@ -108,12 +112,16 @@
title: Layernorm tuning
- local: package_reference/vera
title: VeRA
- local: package_reference/helpers
title: Helpers
- local: package_reference/fourierft
title: FourierFT
- local: package_reference/vblora
title: VB-LoRA
title: Adapters
- sections:
- local: package_reference/merge_utils
title: Model merge
- local: package_reference/helpers
title: Helpers
title: Utilities
title: API reference

View File

@ -94,7 +94,7 @@ accelerate launch --config_file "configs/deepspeed_config.yaml" train.py \
--logging_steps 5 \
--log_level "info" \
--logging_strategy "steps" \
--evaluation_strategy "epoch" \
--eval_strategy "epoch" \
--save_strategy "epoch" \
--push_to_hub \
--hub_private_repo True \
@ -217,7 +217,7 @@ accelerate launch --config_file "configs/deepspeed_config_z3_qlora.yaml" train.
--logging_steps 5 \
--log_level "info" \
--logging_strategy "steps" \
--evaluation_strategy "epoch" \
--eval_strategy "epoch" \
--save_strategy "epoch" \
--push_to_hub \
--hub_private_repo True \

View File

@ -74,7 +74,7 @@ accelerate launch --config_file "configs/fsdp_config.yaml" train.py \
--logging_steps 5 \
--log_level "info" \
--logging_strategy "steps" \
--evaluation_strategy "epoch" \
--eval_strategy "epoch" \
--save_strategy "epoch" \
--push_to_hub \
--hub_private_repo True \
@ -218,7 +218,7 @@ accelerate launch --config_file "configs/fsdp_config_qlora.yaml" train.py \
--logging_steps 5 \
--log_level "info" \
--logging_strategy "steps" \
--evaluation_strategy "epoch" \
--eval_strategy "epoch" \
--save_strategy "epoch" \
--push_to_hub \
--hub_private_repo True \
@ -249,7 +249,7 @@ accelerate launch --config_file "configs/fsdp_config_qlora.yaml" train.py \
--bnb_4bit_quant_storage_dtype "bfloat16"
```
Notice the new argument being passed, `bnb_4bit_quant_storage_dtype`, which denotes the data type for packing the 4-bit parameters. For example, when it is set to `bfloat16`, **32/4 = 8** 4-bit params are packed together post quantization. When using mixed precision training with `bfloat16`, `bnb_4bit_quant_storage_dtype` can be either `bfloat16` for pure `bfloat16` finetuning, or `float32` for automatic mixed precision (this consumes more GPU memory). When using mixed precision training with `float16`, `bnb_4bit_quant_storage_dtype` should be set to `float32` for stable automatic mixed precision training.
Notice the new argument being passed, `bnb_4bit_quant_storage_dtype`, which denotes the data type for packing the 4-bit parameters. For example, when it is set to `bfloat16`, **16/4 = 4** 4-bit params are packed together post quantization. When using mixed precision training with `bfloat16`, `bnb_4bit_quant_storage_dtype` can be either `bfloat16` for pure `bfloat16` finetuning, or `float32` for automatic mixed precision (this consumes more GPU memory). When using mixed precision training with `float16`, `bnb_4bit_quant_storage_dtype` should be set to `float32` for stable automatic mixed precision training.
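For illustration, here is a minimal sketch of how this setting is typically expressed in code, assuming the training script forwards it to transformers' `BitsAndBytesConfig` via its `bnb_4bit_quant_storage` argument (the model id is a placeholder; the authoritative flow is the training script shown above):
```python
# Hedged sketch: mapping --bnb_4bit_quant_storage_dtype "bfloat16" to a BitsAndBytesConfig.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_storage=torch.bfloat16,  # 16/4 = 4 packed 4-bit params per bfloat16 value
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder model id
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,  # matching dtypes let FSDP wrap all parameters uniformly
)
```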
In terms of training code, the important code changes are:
@ -288,4 +288,5 @@ You can also refer the [llama-recipes](https://github.com/facebookresearch/llama
1. Merging when using PEFT and FSDP is currently unsupported and will raise error.
2. Passing `modules_to_save` config parameter to is untested at present.
3. GPU Memory saving when using CPU Offloading is untested at present.
4. When using FSDP+QLoRA, `paged_adamw_8bit` currently results in an error when saving a checkpoint.
4. When using FSDP+QLoRA, `paged_adamw_8bit` currently results in an error when saving a checkpoint.
5. DoRA training with FSDP should work (albeit at lower speed than LoRA). If combined with bitsandbytes (QDoRA), 4-bit quantization should also work, but 8-bit quantization has known issues and is not recommended.

View File

@ -50,6 +50,18 @@ In principle, LoRA can be applied to any subset of weight matrices in a neural n
</div>
<small><a href="https://hf.co/papers/2103.10385">Navigating Text-To-Image Customization: From LyCORIS Fine-Tuning to Model Evaluation</a></small>
## Mixture of LoRA Experts (X-LoRA)
[X-LoRA](https://arxiv.org/abs/2402.07148) is a mixture of experts method for LoRA which works by using dense or sparse gating to dynamically activate LoRA experts. The LoRA experts as well as the base model are frozen during training, resulting in a low parameter count as only the gating layers must be trained. In particular, the gating layers output scalings which (depending on config) are granular on the layer and token level. Additionally, during inference, X-LoRA dynamically activates LoRA adapters to recall knowledge and effectively mix them:
The below graphic demonstrates how the scalings change for different prompts for each token. This highlights the activation of different adapters as the generation progresses and the sequence creates new context.
![Token-by-token scalings](https://github.com/EricLBuehler/xlora/raw/master/res/token_by_token_scalings.gif)
For each step, X-LoRA requires the base model to be run twice: first, to get the hidden states without any LoRA adapters; second, with the scalings calculated from those hidden states applied to the LoRA adapters. The output of the second run is the result of the model step.
Ultimately, X-LoRA allows the model to reflect upon its knowledge thanks to the dual forward pass scheme, and to dynamically reconfigure its architecture.
## Low-Rank Hadamard Product (LoHa)
Low-rank decomposition can impact performance because the weight updates are limited to the low-rank space, which can constrain a model's expressiveness. However, you don't necessarily want to use a larger rank because it increases the number of trainable parameters. To address this, [LoHa](https://huggingface.co/papers/2108.06098) (a method originally developed for computer vision) was applied to diffusion models where the ability to generate diverse images is an important consideration. LoHa should also work with general model types, but the embedding layers aren't currently implemented in PEFT.

View File

@ -64,9 +64,9 @@ Take a look at [P-tuning for sequence classification](../task_guides/ptuning-seq
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/mpt.png"/>
</div>
<small><a href="https://hf.co/papers/2103.10385">Multitask prompt tuning enables parameter-efficient transfer learning</a>.</small>
<small><a href="https://hf.co/papers/2303.02861">Multitask prompt tuning enables parameter-efficient transfer learning</a>.</small>
[Multitask prompt tuning (MPT)](https://hf.co/papers/2103.10385) learns a single prompt from data for multiple task types that can be shared for different target tasks. Other existing approaches learn a separate soft prompt for each task that need to be retrieved or aggregated for adaptation to target tasks. MPT consists of two stages:
[Multitask prompt tuning (MPT)](https://hf.co/papers/2303.02861) learns a single prompt from data for multiple task types that can be shared for different target tasks. Other existing approaches learn a separate soft prompt for each task that need to be retrieved or aggregated for adaptation to target tasks. MPT consists of two stages:
1. source training - for each task, its soft prompt is decomposed into task-specific vectors. The task-specific vectors are multiplied together to form another matrix W, and the Hadamard product is used between W and a shared prompt matrix P to generate a task-specific prompt matrix. The task-specific prompts are distilled into a single prompt matrix that is shared across all tasks. This prompt is trained with multitask training.
2. target adaptation - to adapt the single prompt for a target task, a target prompt is initialized and expressed as the Hadamard product of the shared prompt matrix and the task-specific low-rank prompt matrix.

View File

@ -238,3 +238,73 @@ peft_model.print_trainable_parameters()
```python
print(peft_model.targeted_module_names)
```
## Unsupported module types
Methods like LoRA only work if the target modules are supported by PEFT. For example, it's possible to apply LoRA to `nn.Linear` and `nn.Conv2d` layers, but not, for instance, to `nn.LSTM`. If you find a layer class you want to apply PEFT to is not supported, you can:
- define a custom mapping to dynamically dispatch custom modules in LoRA
- open an [issue](https://github.com/huggingface/peft/issues) and request the feature; if demand for this module type is sufficiently high, the maintainers will implement it or guide you on how to implement it yourself
### Experimental support for dynamic dispatch of custom modules in LoRA
> [!WARNING]
> This feature is experimental and subject to change, depending on its reception by the community. We will introduce a public and stable API if there is significant demand for it.
PEFT supports an experimental API for custom module types for LoRA. Let's assume you have a LoRA implementation for LSTMs. Normally, you would not be able to tell PEFT to use it, even if it would theoretically work with PEFT. However, this is possible with dynamic dispatch of custom layers.
The experimental API currently looks like this:
```python
class MyLoraLSTMLayer:
...
base_model = ... # load the base model that uses LSTMs
# add the LSTM layer names to target_modules
config = LoraConfig(..., target_modules=["lstm"])
# define a mapping from base layer type to LoRA layer type
custom_module_mapping = {nn.LSTM: MyLoraLSTMLayer}
# register the new mapping
config._register_custom_module(custom_module_mapping)
# after registration, create the PEFT model
peft_model = get_peft_model(base_model, config)
# do training
```
<Tip>
When you call [`get_peft_model`], you will see a warning because PEFT does not recognize the targeted module type. In this case, you can ignore this warning.
</Tip>
By supplying a custom mapping, PEFT first checks the base model's layers against the custom mapping and dispatches to the custom LoRA layer type if there is a match. If there is no match, PEFT checks the built-in LoRA layer types for a match.
Therefore, this feature can also be used to override existing dispatch logic, e.g. if you want to use your own LoRA layer for `nn.Linear` instead of using the one provided by PEFT.
When creating your custom LoRA module, please follow the same rules as the [existing LoRA modules](https://github.com/huggingface/peft/blob/main/src/peft/tuners/lora/layer.py). Some important constraints to consider:
- The custom module should inherit from `nn.Module` and `peft.tuners.lora.layer.LoraLayer`.
- The `__init__` method of the custom module should have the positional arguments `base_layer` and `adapter_name`. After this, there are additional `**kwargs` that you are free to use or ignore.
- The learnable parameters should be stored in an `nn.ModuleDict` or `nn.ParameterDict`, where the key corresponds to the name of the specific adapter (remember that a model can have more than one adapter at a time).
- The name of these learnable parameter attributes should start with `"lora_"`, e.g. `self.lora_new_param = ...`.
- Some methods are optional, e.g. you only need to implement `merge` and `unmerge` if you want to support weight merging.
Currently, the information about the custom module does not persist when you save the model. When loading the model, you have to register the custom modules again.
```python
# saving works as always and includes the parameters of the custom modules
peft_model.save_pretrained(<model-path>)
# loading the model later:
base_model = ...
# load the LoRA config that you saved earlier
config = LoraConfig.from_pretrained(<model-path>)
# register the custom module again, the same way as the first time
custom_module_mapping = {nn.LSTM: MyLoraLSTMLayer}
config._register_custom_module(custom_module_mapping)
# pass the config instance to from_pretrained:
peft_model = PeftModel.from_pretrained(base_model, <model-path>, config=config)
```
If you use this feature and find it useful, or if you encounter problems, let us know by creating an issue or a discussion on GitHub. This allows us to estimate the demand for this feature and add a public API if it is sufficiently high.

View File

@ -9,7 +9,7 @@ Unless required by applicable law or agreed to in writing, software distributed
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
@ -54,6 +54,15 @@ lora_config = LoraConfig(init_lora_weights="pissa_niter_[number of iters]", ...)
```
For detailed instruction on using PiSSA, please follow [these instructions](https://github.com/fxmeng/peft/tree/main/examples/pissa_finetuning).
### OLoRA
[OLoRA](https://arxiv.org/abs/2406.01775) utilizes QR decomposition to initialize the LoRA adapters. OLoRA translates the base weights of the model by a factor of their QR decompositions, i.e., it mutates the weights before performing any training on them. This approach significantly improves stability, accelerates convergence speed, and ultimately achieves superior performance.
You just need to pass a single additional option to use OLoRA:
```python
from peft import LoraConfig
config = LoraConfig(init_lora_weights="olora", ...)
```
For more advanced usage, please refer to our [documentation](https://github.com/huggingface/peft/tree/main/examples/olora_finetuning).
### LoftQ
#### Standard approach
@ -62,7 +71,7 @@ When quantizing the base model for QLoRA training, consider using the [LoftQ ini
In general, for LoftQ to work best, it is recommended to target as many layers with LoRA as possible, since those not targeted cannot have LoftQ applied. This means that passing `LoraConfig(..., target_modules="all-linear")` will most likely give the best results. Also, you should use `nf4` as quant type in your quantization config when using 4bit quantization, i.e. `BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")`.
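As a rough sketch of the standard approach, the LoftQ initialization is requested through `LoraConfig`; the base model id is a placeholder and, per our understanding, the model is loaded in full precision here since LoftQ handles the quantization details internally (see the linked instructions for the authoritative workflow):
```python
# Hedged sketch of LoftQ initialization via LoraConfig.
from transformers import AutoModelForCausalLM
from peft import LoftQConfig, LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # placeholder, full precision

loftq_config = LoftQConfig(loftq_bits=4)  # 4-bit LoftQ initialization
lora_config = LoraConfig(
    init_lora_weights="loftq",
    loftq_config=loftq_config,
    target_modules="all-linear",  # target as many layers as possible, as recommended above
)
peft_model = get_peft_model(base_model, lora_config)
```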
#### A more convienient way
#### A more convenient way
An easier but more limited way to apply LoftQ initialization is to use the convenience function `replace_lora_weights_loftq`. This takes the quantized PEFT model as input and replaces the LoRA weights in-place with their LoftQ-initialized counterparts.
@ -113,9 +122,25 @@ from peft import LoraConfig
config = LoraConfig(use_dora=True, ...)
```
If parts of the model or the DoRA adapter are offloaded to CPU, you can get a significant speedup at the cost of some temporary (ephemeral) VRAM overhead by using `ephemeral_gpu_offload=True` in `config.runtime_config`.
```py
from peft import LoraConfig, LoraRuntimeConfig
config = LoraConfig(use_dora=True, runtime_config=LoraRuntimeConfig(ephemeral_gpu_offload=True), ...)
```
A `PeftModel` with a DoRA adapter can also be loaded with the `ephemeral_gpu_offload=True` flag using the `from_pretrained` method as well as the `load_adapter` method.
```py
from peft import PeftModel
model = PeftModel.from_pretrained(base_model, peft_model_id, ephemeral_gpu_offload=True)
```
#### Caveats
- DoRA only supports linear and Conv2d layers at the momement.
- DoRA only supports linear and Conv2d layers at the moment.
- DoRA introduces a bigger overhead than pure LoRA, so it is recommended to merge weights for inference, see [`LoraModel.merge_and_unload`].
- DoRA should work with weights quantized with bitsandbytes ("QDoRA"). However, issues have been reported when using QDoRA with DeepSpeed Zero2.
@ -135,15 +160,56 @@ An approach used to improve the performance of models is to expand a model by du
config = LoraConfig(layer_replication=[[0,4], [2,5]], ...)
```
Assuming the original model had 5 layers `[0, 1, 2 ,3, 4]`, this would create a model with 7 layers arranged as `[0, 1, 2, 3, 2, 3, 4]`. This follows the [mergekit](https://github.com/arcee-ai/mergekit) pass through merge convention where sequences of layers specified as start inclusive and end exclusive tuples are stacked to build the final model. Each layer in the final model gets its own distinct set of LoRA adpaters.
Assuming the original model had 5 layers `[0, 1, 2 ,3, 4]`, this would create a model with 7 layers arranged as `[0, 1, 2, 3, 2, 3, 4]`. This follows the [mergekit](https://github.com/arcee-ai/mergekit) pass through merge convention where sequences of layers specified as start inclusive and end exclusive tuples are stacked to build the final model. Each layer in the final model gets its own distinct set of LoRA adapters.
[Fewshot-Metamath-OrcaVicuna-Mistral-10B](https://huggingface.co/abacusai/Fewshot-Metamath-OrcaVicuna-Mistral-10B) is an example of a model trained using this method on Mistral-7B expanded to 10B. The
[adapter_config.json](https://huggingface.co/abacusai/Fewshot-Metamath-OrcaVicuna-Mistral-10B/blob/main/adapter_config.json) shows a sample LoRA adapter config applying this method for fine-tuning.
## Merge adapters
## Optimizers
LoRA training can optionally include special purpose optimizers. Currently the only such optimizer is LoRA+.
### LoRA+ optimized LoRA
LoRA training can be optimized using [LoRA+](https://arxiv.org/abs/2402.12354), which uses different learning rates for the adapter matrices A and B, shown to increase finetuning speed by up to 2x and performance by 1-2%.
```py
from peft import LoraConfig, get_peft_model
from peft.optimizers import create_loraplus_optimizer
from transformers import Trainer
import bitsandbytes as bnb
base_model = ...
config = LoraConfig(...)
model = get_peft_model(base_model, config)
optimizer = create_loraplus_optimizer(
model=model,
optimizer_cls=bnb.optim.Adam8bit,
lr=5e-5,
loraplus_lr_ratio=16,
)
scheduler = None
...
trainer = Trainer(
...,
optimizers=(optimizer, scheduler),
)
```
## Merge LoRA weights into the base model
While LoRA is significantly smaller and faster to train, you may encounter latency issues during inference due to separately loading the base model and the LoRA adapter. To eliminate latency, use the [`~LoraModel.merge_and_unload`] function to merge the adapter weights with the base model. This allows you to use the newly merged model as a standalone model. The [`~LoraModel.merge_and_unload`] function doesn't keep the adapter weights in memory.
Below is a diagram that explains the intuition of LoRA adapter merging:
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/lora_diagram.png"/>
</div>
We show in the snippets below how to run that using PEFT.
```py
from transformers import AutoModelForCausalLM
from peft import PeftModel
@ -258,7 +324,7 @@ model.delete_adapter("dpo")
Normally, each inference batch has to use the same adapter(s) in PEFT. This can sometimes be annoying, because we may have batches that contain samples intended to be used with different LoRA adapters. For example, we could have a base model that works well in English and two more LoRA adapters, one for French and one for German. Usually, we would have to split our batches such that each batch only contains samples of one of the languages, we cannot combine different languages in the same batch.
Thankfully, it is possible to mix different LoRA adapters in the same batch using the `adapter_name` argument. Below, we show an examle of how this works in practice. First, let's load the base model, English, and the two adapters, French and German, like this:
Thankfully, it is possible to mix different LoRA adapters in the same batch using the `adapter_name` argument. Below, we show an example of how this works in practice. First, let's load the base model, English, and the two adapters, French and German, like this:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
@ -303,6 +369,8 @@ output = peft_model.generate(**inputs, adapter_names=adapter_names, max_new_toke
Note that the order does not matter here, i.e. the samples in the batch don't need to be grouped by adapter as in the example above. We just need to ensure that the `adapter_names` argument is aligned correctly with the samples.
Additionally, the same approach also works with the `modules_to_save` feature, which allows for saving and reusing specific neural network layers, such as custom heads for classification tasks, across different LoRA adapters.
### Caveats
Using this feature has some drawbacks, namely:
@ -312,6 +380,7 @@ Using this features has some drawbacks, namely:
- You cannot pass `adapter_names` when some adapter weights were merged with the base weights using the `merge_adapter` method. Please unmerge all adapters first by calling `model.unmerge_adapter()`.
- For obvious reasons, this cannot be used after calling `merge_and_unload()`, since all the LoRA adapters will be merged into the base weights in this case.
- This feature does not currently work with DoRA, so set `use_dora=False` in your `LoraConfig` if you want to use it.
- The `modules_to_save` feature is currently only supported for the layers of types `Linear`, `Embedding`, `Conv2d` and `Conv1d`.
- There is an expected overhead for inference with `adapter_names`, especially if the number of different adapters in the batch is high. This is because the batch size is effectively reduced to the number of samples per adapter. If runtime performance is your top priority, try the following:
- Increase the batch size.
- Try to avoid having a large number of different adapters in the same batch, prefer homogeneous batches. This can be achieved by buffering samples with the same adapter and only performing inference with a small handful of different adapters.

View File

@ -25,6 +25,8 @@ Check the table below to see when you should inject adapters.
| the model is modified inplace, keeping all the original attributes and methods | manually write the `from_pretrained` and `save_pretrained` utility functions from Hugging Face to save and load adapters |
| works for any `torch` module and modality | doesn't work with any of the utility methods provided by `PeftModel` such as disabling and merging adapters |
## Creating a new PEFT model
To perform the adapter injection, use the [`inject_adapter_in_model`] method. This method takes 3 arguments, the PEFT config, the model, and an optional adapter name. You can also attach multiple adapters to the model if you call [`inject_adapter_in_model`] multiple times with different adapter names.
For example, to inject LoRA adapters into the `linear` submodule of the `DummyModel` module:
@ -85,6 +87,8 @@ DummyModel(
)
```
## Saving the model
To only save the adapter, use the [`get_peft_model_state_dict`] function:
```python
@ -95,3 +99,28 @@ print(peft_state_dict)
```
Otherwise, `model.state_dict()` returns the full state dict of the model.
## Loading the model
After loading the saved `state_dict`, it can be applied using the [`set_peft_model_state_dict`] function:
```python
from peft import set_peft_model_state_dict
model = DummyModel()
model = inject_adapter_in_model(lora_config, model)
outcome = set_peft_model_state_dict(model, peft_state_dict)
# check that there were no wrong keys
print(outcome.unexpected_keys)
```
If injecting the adapter is slow or you need to load a large number of adapters, you can use an optimization that creates an "empty" adapter on the meta device and only fills in the real weights when [`set_peft_model_state_dict`] is called. To do this, pass `low_cpu_mem_usage=True` to both [`inject_adapter_in_model`] and [`set_peft_model_state_dict`].
```python
model = DummyModel()
model = inject_adapter_in_model(lora_config, model, low_cpu_mem_usage=True)
print(model.linear.lora_A["default"].weight.device.type == "meta") # should be True
set_peft_model_state_dict(model, peft_state_dict, low_cpu_mem_usage=True)
print(model.linear.lora_A["default"].weight.device.type == "cpu") # should be True
```

View File

@ -138,3 +138,20 @@ print(tokenizer.decode(outputs[0]))
</hfoption>
</hfoptions>
## Merging (IA)³ Models
The (IA)³ models facilitate linear merging of adapters. To merge adapters in an (IA)³ model, utilize the `add_weighted_adapter` method from the `IA3Model` class. This method is analogous to the `add_weighted_adapter` method used in `LoraModel`, with the key difference being the absence of the `combination_type` parameter. For example, to merge three (IA)³ adapters into a PEFT model, you would proceed as follows:
```py
adapters = ["adapter1", "adapter2", "adapter3"]
weights = [0.4, 0.3, 0.3]
adapter_name = "merge"
model.add_weighted_adapter(adapters, weights, adapter_name)
```
It is recommended that the weights sum to 1.0 to preserve the scale of the model. The merged model can then be set as the active model using the `set_adapter` method:
```py
model.set_adapter("merge")
```

View File

@ -168,13 +168,11 @@ model = get_peft_model(model, config)
Models that are quantized using Half-Quadratic Quantization of Large Machine Learning Models ([HQQ](https://mobiusml.github.io/hqq_blog/)) support LoRA adapter tuning. To tune the quantized model, you'll need to install the `hqq` library with: `pip install hqq`.
```py
```python
from hqq.engine.hf import HQQModelForCausalLM
quantized_model = HQQModelForCausalLM.from_quantized(save_dir_or_hfhub, device='cuda')
peft_config = LoraConfig(...)
quantized_model = get_peft_model(quantized_model, peft_config)
```
@ -184,11 +182,8 @@ Or using transformers version that is compatible with HQQ (e.g. by installing it
from transformers import HqqConfig, AutoModelForCausalLM
quant_config = HqqConfig(nbits=4, group_size=64)
quantized_model = AutoModelForCausalLM.from_pretrained(save_dir_or_hfhub, device='cuda', quantization_config=quant_config)
quantized_model = AutoModelForCausalLM.from_pretrained(save_dir_or_hfhub, device_map=device_map, quantization_config=quant_config)
peft_config = LoraConfig(...)
quantized_model = get_peft_model(quantized_model, peft_config)
```

View File

@ -0,0 +1,76 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# torch.compile
In PEFT, [torch.compile](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) works for some but not all features. It won't always work because PEFT is highly dynamic in certain places (loading and switching between multiple adapters, for instance), which can cause trouble for `torch.compile`. In other places, `torch.compile` may work, but won't be as fast as expected because of graph breaks.
Even if you don't see an error, it doesn't necessarily mean that `torch.compile` worked correctly: it might give you an output, but the output may be incorrect. This guide describes what works with `torch.compile` and what doesn't.
> [!TIP]
> Unless indicated otherwise, the default `torch.compile` settings were used.
## Training and inference with `torch.compile`
These features **work** with `torch.compile`. Everything listed below was tested with a causal LM:
- Training with `Trainer` from 🤗 transformers
- Training with a custom PyTorch loop
- Inference
- Generation
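To make this concrete, here is a minimal sketch (placeholder model id, default LoRA config, and default `torch.compile` settings; not taken from the test suite) of compiling a PEFT model for inference:
```python
# Hedged sketch: compile a LoRA-adapted causal LM and run a forward pass.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "facebook/opt-125m"  # placeholder causal LM
tokenizer = AutoTokenizer.from_pretrained(model_id)
base_model = AutoModelForCausalLM.from_pretrained(model_id)

peft_model = get_peft_model(base_model, LoraConfig(task_type="CAUSAL_LM"))
compiled_model = torch.compile(peft_model)  # default settings

inputs = tokenizer("torch.compile with PEFT", return_tensors="pt")
with torch.no_grad():
    outputs = compiled_model(**inputs)
```
Training with `Trainer` or a custom loop follows the same pattern: build the PEFT model first, then compile it.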
The following adapters were tested successfully:
- AdaLoRA
- BOFT
- IA³
- Layer Norm Tuning
- LoHa
- LoRA
- LoRA + DoRA
- OFT
- VeRA
- HRA
The following adapters **don't work** correctly for training or inference when using `torch.compile`:
- LoKr
- LoRA targeting embedding layers
## Advanced PEFT features with `torch.compile`
Below are some of the more advanced PEFT features that **work**. They were all tested with LoRA.
- `modules_to_save` (i.e. `config = LoraConfig(..., modules_to_save=...)`)
- Merging adapters (one or multiple)
- Merging multiple adapters into one adapter (i.e. calling `model.add_weighted_adapter(...)`)
Generally, we can expect that if a feature works correctly with LoRA and is also supported by other adapter types, it should also work for that adapter type.
The more advanced PEFT features below **don't work** in conjunction with `torch.compile`. Tests were run with LoRA:
- Using PEFT adapters with quantization (bitsandbytes)
- Inference with multiple adapters
- Unloading (i.e. calling `model.merge_and_unload()`)
- Disabling adapters (i.e. using `with model.disable_adapter()`)
- Mixed adapter batches (i.e. calling `model(batch, adapter_names=["__base__", "default", "other", ...])`)
## Test cases
All the use cases listed above are tested inside of [`peft/tests/test_torch_compile.py`](https://github.com/huggingface/peft/blob/main/tests/test_torch_compile.py). If you want to check in more detail how we tested a certain feature, please go to that file and check the test that corresponds to your use case.
> [!TIP]
> If you have another use case where you know that `torch.compile` does or does not work with PEFT, please contribute by letting us know or by opening a PR to add this use case to the covered test cases.

View File

@ -69,6 +69,12 @@ trainer = Trainer(model=peft_model, fp16=True, ...)
trainer.train()
```
<Tip>
Starting from PEFT version v0.12.0, PEFT automatically promotes the dtype of adapter weights from `torch.float16` and `torch.bfloat16` to `torch.float32` where appropriate. To _prevent_ this behavior, you can pass `autocast_adapter_dtype=False` to [`~get_peft_model`], to [`~PeftModel.from_pretrained`], and to [`~PeftModel.load_adapter`].
</Tip>
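As a minimal sketch (placeholder model id; `autocast_adapter_dtype` is passed exactly as described in the tip above):
```python
# Hedged sketch: keep adapter weights in float16 instead of promoting them to float32.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m", torch_dtype=torch.float16)  # placeholder
config = LoraConfig(task_type="CAUSAL_LM")

peft_model = get_peft_model(base_model, config, autocast_adapter_dtype=False)
```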
## Bad results from a loaded PEFT model
There can be several reasons for getting a poor result from a loaded PEFT model which are listed below. If you're still unable to troubleshoot the problem, see if anyone else had a similar [issue](https://github.com/huggingface/peft/issues) on GitHub, and if you can't find any, open a new issue.
@ -208,6 +214,7 @@ It is possible to get this information for non-PEFT models if they are using PEF
>>> pipe = StableDiffusionPipeline.from_pretrained(path, torch_dtype=torch.float16)
>>> pipe.load_lora_weights(lora_id, adapter_name="adapter-1")
>>> pipe.load_lora_weights(lora_id, adapter_name="adapter-2")
>>> pipe.set_lora_device(["adapter-2"], "cuda")
>>> get_layer_status(pipe.text_encoder)
[TunerLayerStatus(name='text_model.encoder.layers.0.self_attn.k_proj',
module_type='lora.Linear',
@ -215,14 +222,15 @@ It is possible to get this information for non-PEFT models if they are using PEF
active_adapters=['adapter-2'],
merged_adapters=[],
requires_grad={'adapter-1': False, 'adapter-2': True},
available_adapters=['adapter-1', 'adapter-2']),
available_adapters=['adapter-1', 'adapter-2'],
devices={'adapter-1': ['cpu'], 'adapter-2': ['cuda']}),
TunerLayerStatus(name='text_model.encoder.layers.0.self_attn.v_proj',
module_type='lora.Linear',
enabled=True,
active_adapters=['adapter-2'],
merged_adapters=[],
requires_grad={'adapter-1': False, 'adapter-2': True},
available_adapters=['adapter-1', 'adapter-2']),
devices={'adapter-1': ['cpu'], 'adapter-2': ['cuda']}),
...]
>>> get_model_status(pipe.unet)
@ -238,5 +246,41 @@ TunerModelStatus(
merged_adapters=[],
requires_grad={'adapter-1': False, 'adapter-2': True},
available_adapters=['adapter-1', 'adapter-2'],
devices={'adapter-1': ['cpu'], 'adapter-2': ['cuda']},
)
```
## Speed
### Loading adapter weights is slow
Loading adapters like LoRA weights should generally be fast compared to loading the base model. However, there can be use cases where the adapter weights are quite large or where users need to load a large number of adapters -- the loading time can add up in this case. The reason for this is that the adapter weights are first initialized and then overridden by the loaded weights, which is wasteful. To speed up the loading time, you can pass the `low_cpu_mem_usage=True` argument to [`~PeftModel.from_pretrained`] and [`~PeftModel.load_adapter`].
<Tip>
If this option works well across different use cases, it may become the default for adapter loading in the future.
</Tip>
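A minimal sketch of this option (model id and adapter paths are placeholders):
```python
# Hedged sketch: skip the redundant adapter initialization when loading weights.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")  # placeholder
peft_model = PeftModel.from_pretrained(base_model, "path/to/adapter-1", low_cpu_mem_usage=True)

# subsequently loaded adapters benefit in the same way
peft_model.load_adapter("path/to/adapter-2", adapter_name="adapter-2", low_cpu_mem_usage=True)
```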
## Reproducibility
### Models using batch norm
When loading a trained PEFT model where the base model uses batch norm (e.g. `torch.nn.BatchNorm1d` or `torch.nn.BatchNorm2d`), you may find that you cannot reproduce the exact same outputs. This is because the batch norm layers keep track of running stats during training, but these stats are not part of the PEFT checkpoint. Therefore, when you load the PEFT model, the running stats of the base model will be used (i.e. from before training with PEFT).
Depending on your use case, this may not be a big deal. If, however, you need your outputs to be 100% reproducible, you can achieve this by adding the batch norm layers to `modules_to_save`. Below is an example of this using resnet and LoRA. Notice that we set `modules_to_save=["classifier", "normalization"]`. We need the `"classifier"` argument because our task is image classification, and we add the `"normalization"` argument to ensure that the batch norm layers are saved in the PEFT checkpoint.
```python
from transformers import AutoModelForImageClassification
from peft import LoraConfig, get_peft_model
model_id = "microsoft/resnet-18"
base_model = AutoModelForImageClassification.from_pretrained(model_id)
config = LoraConfig(
target_modules=["convolution"],
modules_to_save=["classifier", "normalization"],
)
model = get_peft_model(base_model, config)
```
Depending on the type of model you use, the batch norm layers could have different names than `"normalization"`, so please ensure that the name matches your model architecture.

View File

@ -0,0 +1,38 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# FourierFT: Discrete Fourier Transformation Fine-Tuning
[FourierFT](https://huggingface.co/papers/2405.03003) is a parameter-efficient fine-tuning technique that leverages the Discrete Fourier Transform to compress the model's tunable weights. This method outperforms LoRA on the GLUE benchmark and on common ViT classification tasks while using far fewer parameters.
FourierFT currently has the following constraints:
- Only `nn.Linear` layers are supported.
- Quantized layers are not supported.
If these constraints don't work for your use case, consider other methods instead.
The abstract from the paper is:
> Low-rank adaptation (LoRA) has recently gained much interest in fine-tuning foundation models. It effectively reduces the number of trainable parameters by incorporating low-rank matrices A and B to represent the weight change, i.e., Delta W=BA. Despite LoRA's progress, it faces storage challenges when handling extensive customization adaptations or larger base models. In this work, we aim to further compress trainable parameters by enjoying the powerful expressiveness of the Fourier transform. Specifically, we introduce FourierFT, which treats Delta W as a matrix in the spatial domain and learns only a small fraction of its spectral coefficients. With the trained spectral coefficients, we implement the inverse discrete Fourier transform to recover Delta W. Empirically, our FourierFT method shows comparable or better performance with fewer parameters than LoRA on various tasks, including natural language understanding, natural language generation, instruction tuning, and image classification. For example, when performing instruction tuning on the LLaMA2-7B model, FourierFT surpasses LoRA with only 0.064M trainable parameters, compared to LoRA's 33.5M.
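A minimal usage sketch (placeholder model and target modules; hyperparameters are left at their defaults, see [`FourierFTConfig`] below for the full list of options):
```python
# Hedged sketch: apply FourierFT to the attention projections of a classifier.
from transformers import AutoModelForSequenceClassification
from peft import FourierFTConfig, get_peft_model

base_model = AutoModelForSequenceClassification.from_pretrained("roberta-base")  # placeholder
config = FourierFTConfig(task_type="SEQ_CLS", target_modules=["query", "value"])

peft_model = get_peft_model(base_model, config)
peft_model.print_trainable_parameters()
```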
## FourierFTConfig
[[autodoc]] tuners.fourierft.config.FourierFTConfig
## FourierFTModel
[[autodoc]] tuners.fourierft.model.FourierFTModel

View File

@ -2,7 +2,7 @@
rendered properly in your Markdown viewer.
-->
# Document Title
# Helper methods
A collection of helper functions for PEFT.
@ -10,3 +10,8 @@ A collection of helper functions for PEFT.
[[autodoc]] helpers.check_if_peft_model
- all
## Temporarily Rescaling Adapter Scale in LoraLayer Modules
[[autodoc]] helpers.rescale_adapter_scale
- all
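An assumed usage sketch (treating the helper as a context manager that takes the model and a scaling multiplier; the model id is a placeholder):
```python
# Hedged sketch: temporarily halve the LoRA scaling during a forward pass.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model
from peft.helpers import rescale_adapter_scale

model_id = "facebook/opt-125m"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = get_peft_model(AutoModelForCausalLM.from_pretrained(model_id), LoraConfig(task_type="CAUSAL_LM"))

inputs = tokenizer("Hello", return_tensors="pt")
with rescale_adapter_scale(model, 0.5):  # second argument: the multiplier
    outputs = model(**inputs)
# outside the context manager, the original scaling is restored
```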

View File

@ -0,0 +1,40 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# VB-LoRA: Extreme Parameter Efficient Fine-Tuning with Vector Banks
## Overview
[VB-LoRA](https://arxiv.org/abs/2405.15179) is a parameter-efficient fine-tuning technique that extends LoRA by learning a fine-grained parameter-sharing scheme at the sub-vector level, achieving significantly higher parameter efficiency. This makes VB-LoRA especially useful in scenarios where storage and transmission costs are critical. It works by decomposing low-rank matrices—from different layers and modules such as K, Q, V, and FFN—into sub-vectors, which are then globally shared through a vector bank.
The abstract from the paper is:
*As the adoption of large language models increases and the need for per-user or per-task model customization grows, the parameter-efficient fine-tuning (PEFT) methods, such as low-rank adaptation (LoRA) and its variants, incur substantial storage and transmission costs. To further reduce stored parameters, we introduce a "divide-and-share" paradigm that breaks the barriers of low-rank decomposition across matrix dimensions, modules and layers by sharing parameters globally via a vector bank. As an instantiation of the paradigm to LoRA, our proposed VB-LoRA composites all the low-rank matrices of LoRA from a shared vector bank with a differentiable top-k admixture module. VB-LoRA achieves extreme parameter efficiency while maintaining comparable or better performance compared to state-of-the-art PEFT methods. Extensive experiments demonstrate the effectiveness of VB-LoRA on natural language understanding, natural language generation, and instruction tuning tasks. When fine-tuning the Llama2-13B model, VB-LoRA only uses 0.4% of LoRA's stored parameters, yet achieves superior results.*
## Usage Tips
- VB-LoRA utilizes a sparse top-k module to learn the sharing mechanism. When saving adapter parameters, you can either save only the top-k weights and their indices by setting `save_only_topk_weights = True` in `VBLoRAConfig`, or save all the trainable logits by setting it to `False`. Enabling `save_only_topk_weights = True` significantly reduces storage space; for instance, in Llama2-7B, the storage file size decreases from 308MB to 2.5MB. Note that models saved with `save_only_topk_weights = True` are intended for merging or inference only and cannot be used to resume training.
- VB-LoRA has two sets of training parameters: vector bank parameters and logit parameters. In practice, we found that logit parameters require a higher learning rate, while vector bank parameters require a lower learning rate. When using the AdamW optimizer, typical learning rates are 0.01 for logits and 0.001 for vector bank parameters.
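A minimal sketch tying the tips above together (placeholder model; the `num_vectors` and `vector_length` values are illustrative assumptions, not recommendations):
```python
# Hedged sketch: VB-LoRA with compact top-k checkpoints.
from transformers import AutoModelForCausalLM
from peft import VBLoRAConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")  # placeholder
config = VBLoRAConfig(
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],
    num_vectors=256,    # size of the shared vector bank (illustrative)
    vector_length=256,  # must evenly divide the adapted weight dimensions (illustrative)
    save_only_topk_weights=True,  # smaller checkpoints; merging/inference only
)
peft_model = get_peft_model(base_model, config)
```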
## VBLoRAConfig
[[autodoc]] tuners.vblora.config.VBLoRAConfig
## VBLoRAModel
[[autodoc]] tuners.vblora.model.VBLoRAModel

View File

@ -20,9 +20,10 @@ rendered properly in your Markdown viewer.
When saving the adapter parameters, it's possible to eschew storing the low rank matrices by setting `save_projection=False` on the `VeraConfig`. In that case, these matrices will be restored based on the fixed random seed from the `projection_prng_key` argument. This cuts down on the size of the checkpoint, but we cannot guarantee reproducibility on all devices and for all future versions of PyTorch. If you want to ensure reproducibility, set `save_projection=True` (which is the default).
To handle different shapes of adapted layers, VeRA initializes shared A and B matrices with the largest required size for each dimension. During the forward pass, submatrices A and B for a given layer are sliced out from these shared matrices and used as described in the paper. For example, adapting two linear layers of shapes (100, 20) and (80, 50) will create A and B matrices of shapes (rank, 50) and (100, rank) respectively. Then, to adapt a layer of shape (100, 20), submatrices A and B of shapes (rank, 20) and (100, rank) will be extracted.
VeRA currently has the following constraints:
- All targeted parameters must have the same shape.
- Only `nn.Linear` layers are supported.
- Quantized layers are not supported.
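A minimal sketch that respects these constraints (placeholder model; `q_proj` and `v_proj` have identical shapes in this architecture):
```python
# Hedged sketch: VeRA on same-shaped linear layers, storing the projections.
from transformers import AutoModelForCausalLM
from peft import VeraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")  # placeholder
config = VeraConfig(
    r=256,
    target_modules=["q_proj", "v_proj"],
    save_projection=True,  # keep the shared A/B matrices in the checkpoint (default)
)
peft_model = get_peft_model(base_model, config)
```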

View File

@ -0,0 +1,56 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# X-LoRA
Mixture of LoRA Experts ([X-LoRA](https://arxiv.org/abs/2402.07148)) is a PEFT method enabling sparse or dense mixture of LoRA experts based on a high granularity (token, layer, sequence) scalings matrix. This leverages frozen LoRA adapters and a frozen base model to drastically reduce the number of parameters that need to be fine-tuned.
A unique aspect of X-LoRA is its versatility: it can be applied to any `transformers` base model with LoRA adapters. This means that, despite the mixture of experts strategy, no changes to the model code are required.
The below graphic demonstrates how the scalings change for different prompts for each token. This highlights the activation of different adapters as the generation progresses and the sequence creates new context.
![Token-by-token scalings](https://github.com/EricLBuehler/xlora/raw/master/res/token_by_token_scalings.gif)
The abstract from the paper is:
*We report a mixture of expert strategy to create fine-tuned large language models using a deep layer-wise token-level approach based on low-rank adaptation (LoRA). Starting with a set of pre-trained LoRA adapters, our gating strategy uses the hidden states to dynamically mix adapted layers, allowing the resulting X-LoRA model to draw upon different capabilities and create never-before-used deep layer-wise combinations to solve tasks. The design is inspired by the biological principles of universality and diversity, where neural network building blocks are reused in different hierarchical manifestations. Hence, the X-LoRA model can be easily implemented for any existing large language model (LLM) without a need for modifications of the underlying structure. We develop a tailored X-LoRA model that offers scientific capabilities including forward/inverse analysis tasks and enhanced reasoning capability, focused on biomaterial analysis, protein mechanics and design. The impact of this work include access to readily expandable and adaptable models with strong domain knowledge and the capability to integrate across areas of knowledge. Featuring experts in biology, mathematics, reasoning, bio-inspired materials, mechanics and materials, chemistry, protein biophysics, mechanics and quantum-mechanics based molecular properties, we conduct a series of physics-focused case studies. We examine knowledge recall, protein mechanics forward/inverse tasks, protein design, adversarial agentic modeling including ontological knowledge graph construction, as well as molecular design. The model is capable not only of making quantitative predictions of nanomechanical properties of proteins or quantum mechanical molecular properties, but also reasons over the results and correctly predicts likely mechanisms that explain distinct molecular behaviors.*.
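A heavily simplified usage sketch (the model id, adapter paths, and the `xlora_depth` value are placeholders and assumptions; see [`XLoraConfig`] below for the actual options):
```python
# Hedged sketch: mix two pre-trained LoRA adapters with X-LoRA.
from transformers import AutoModelForCausalLM
from peft import XLoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")  # placeholder
config = XLoraConfig(
    task_type="CAUSAL_LM",
    hidden_size=base_model.config.hidden_size,
    xlora_depth=8,  # depth of the gating network (illustrative)
    adapters={
        "adapter_1": "path/to/lora-adapter-1",  # placeholder paths to trained LoRA adapters
        "adapter_2": "path/to/lora-adapter-2",
    },
)
xlora_model = get_peft_model(base_model, config)
```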
Please cite X-LoRA as:
```bibtex
@article{10.1063/5.0203126,
author = {Buehler, Eric L. and Buehler, Markus J.},
title = "{X-LoRA: Mixture of low-rank adapter experts, a flexible framework for large language models with applications in protein mechanics and molecular design}",
journal = {APL Machine Learning},
volume = {2},
number = {2},
pages = {026119},
year = {2024},
month = {05},
abstract = "{We report a mixture of expert strategy to create fine-tuned large language models using a deep layer-wise token-level approach based on low-rank adaptation (LoRA). Starting with a set of pre-trained LoRA adapters, our gating strategy uses the hidden states to dynamically mix adapted layers, allowing the resulting X-LoRA model to draw upon different capabilities and create never-before-used deep layer-wise combinations to solve tasks. The design is inspired by the biological principles of universality and diversity, where neural network building blocks are reused in different hierarchical manifestations. Hence, the X-LoRA model can be easily implemented for any existing large language model without a need for modifications of the underlying structure. We develop a tailored X-LoRA model that offers scientific capabilities, including forward/inverse analysis tasks and enhanced reasoning capability, focused on biomaterial analysis, protein mechanics, and design. The impact of this work includes access to readily expandable and adaptable models with strong domain knowledge and the capability to integrate across areas of knowledge. Featuring experts in biology, mathematics, reasoning, bio-inspired materials, mechanics and materials, chemistry, protein biophysics, mechanics, and quantum-mechanics based molecular properties, we conduct a series of physics-focused case studies. We examine knowledge recall, protein mechanics forward/inverse tasks, protein design, adversarial agentic modeling including ontological knowledge graph construction, and molecular design. The model is capable not only of making quantitative predictions of nanomechanical properties of proteins or quantum mechanical molecular properties but also reasoning over the results and correctly predicting likely mechanisms that explain distinct molecular behaviors.}",
issn = {2770-9019},
doi = {10.1063/5.0203126},
url = {https://doi.org/10.1063/5.0203126},
eprint = {https://pubs.aip.org/aip/aml/article-pdf/doi/10.1063/5.0203126/19964043/026119\_1\_5.0203126.pdf},
}
```
## XLoraConfig
[[autodoc]] tuners.xlora.config.XLoraConfig
## XLoraModel
[[autodoc]] tuners.xlora.model.XLoraModel

View File

@ -76,7 +76,7 @@ training_args = TrainingArguments(
per_device_eval_batch_size=32,
num_train_epochs=2,
weight_decay=0.01,
evaluation_strategy="epoch",
eval_strategy="epoch",
save_strategy="epoch",
load_best_model_at_end=True,
)

View File

@ -20,6 +20,8 @@ A popular way to efficiently train large models is to insert (typically in the a
There are several different ways to express the weight matrix as a low-rank decomposition, but [Low-Rank Adaptation (LoRA)](../conceptual_guides/adapter#low-rank-adaptation-lora) is the most common method. The PEFT library supports several other LoRA variants, such as [Low-Rank Hadamard Product (LoHa)](../conceptual_guides/adapter#low-rank-hadamard-product-loha), [Low-Rank Kronecker Product (LoKr)](../conceptual_guides/adapter#low-rank-kronecker-product-lokr), and [Adaptive Low-Rank Adaptation (AdaLoRA)](../conceptual_guides/adapter#adaptive-low-rank-adaptation-adalora). You can learn more about how these methods work conceptually in the [Adapters](../conceptual_guides/adapter) guide. If you're interested in applying these methods to other tasks and use cases like semantic segmentation, token classification, take a look at our [notebook collection](https://huggingface.co/collections/PEFT/notebooks-6573b28b33e5a4bf5b157fc1)!
Additionally, PEFT supports the [X-LoRA](../conceptual_guides/adapter#mixture-of-lora-experts-x-lora) Mixture of LoRA Experts method.
This guide will show you how to quickly train an image classification model - with a low-rank decomposition method - to identify the class of food shown in an image.
<Tip>
@ -257,7 +259,7 @@ batch_size = 128
args = TrainingArguments(
peft_model_id,
remove_unused_columns=False,
evaluation_strategy="epoch",
eval_strategy="epoch",
save_strategy="epoch",
learning_rate=5e-3,
per_device_train_batch_size=batch_size,
@ -307,7 +309,7 @@ Let's load the model from the Hub and test it out on a food image.
```py
from peft import PeftConfig, PeftModel
from transfomers import AutoImageProcessor
from transformers import AutoImageProcessor
from PIL import Image
import requests

View File

@ -99,7 +99,7 @@ You can create your own configuration for training by initializing a [`PromptEnc
from peft import PromptEncoderConfig, TaskType
p_tuning_config = PromptEncoderConfig(
encoder_reprameterization_type="MLP",
encoder_reparameterization_type="MLP",
encoder_hidden_size=128,
num_attention_heads=16,
num_layers=24,

View File

@ -37,7 +37,7 @@ from utils.unet_2d_condition import UNet2DConditionNewModel
sys.path.append("../../src")
from peft import PeftModel
from peft import PeftModel # noqa: E402
# Will error if the minimal version of diffusers is not installed. Remove at your own risks.

View File

@ -168,7 +168,7 @@
"model = AutoModelForCausalLM.from_pretrained(\n",
" model_name,\n",
" low_cpu_mem_usage=True\n",
" # use_flash_attention_2=True, # leading to an error\n",
" # attn_implementation =\"flash_attention_2\", # leading to an error\n",
")\n",
"model.resize_token_embeddings(len(tokenizer))"
]
@ -956,7 +956,7 @@
"inference_model = AutoModelForCausalLM.from_pretrained(\n",
" model_name,\n",
" low_cpu_mem_usage=True,\n",
" # use_flash_attention_2=True,\n",
" # attn_implementation =\"flash_attention_2\",\n",
")\n",
"inference_model.resize_token_embeddings(len(tokenizer))\n",
"\n",

View File

@ -558,7 +558,7 @@
" per_device_train_batch_size=batch_size,\n",
" learning_rate=lr,\n",
" num_train_epochs=num_epochs,\n",
" evaluation_strategy=\"epoch\",\n",
" eval_strategy=\"epoch\",\n",
" logging_strategy=\"epoch\",\n",
" save_strategy=\"no\",\n",
" report_to=[],\n",

File diff suppressed because it is too large

File diff suppressed because it is too large

View File

@ -0,0 +1,106 @@
# DoRA: Weight-Decomposed Low-Rank Adaptation
![dora](https://i.ytimg.com/vi/m7KQdGSr0Dg/maxresdefault.jpg)
## Introduction
[DoRA](https://arxiv.org/abs/2402.09353) builds on a weight decomposition analysis of the inherent differences between full fine-tuning and LoRA. DoRA decomposes each pretrained weight into a magnitude and a directional component and fine-tunes both of them. Because the directional component is large in terms of parameter count, it is further decomposed with LoRA for efficient fine-tuning. This enhances both the learning capacity and the training stability of LoRA while avoiding any additional inference overhead.
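The reparameterization can be sketched in a few lines of plain PyTorch (schematic only; the per-row magnitude shown here illustrates the idea, and PEFT's internal bookkeeping may differ in detail):
```python
import torch

torch.manual_seed(0)
d_out, d_in, r = 16, 32, 4
W0 = torch.randn(d_out, d_in)                          # frozen pretrained weight
m = W0.norm(p=2, dim=1, keepdim=True)                  # trainable magnitude, initialized from W0
B = torch.zeros(d_out, r)                              # LoRA factors updating the direction
A = torch.randn(r, d_in) * 0.01

V = W0 + B @ A                                         # directional component plus low-rank update
W_adapted = m * V / V.norm(p=2, dim=1, keepdim=True)   # rescale to the learned magnitude

# With B initialized to zero, the adapted weight equals the pretrained weight.
print(torch.allclose(W_adapted, W0))
```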
## Quick start
```python
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer  # dataset_text_field/max_seq_length are SFTTrainer options

model = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b", device_map="cuda")
tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b")
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

# The only DoRA-specific change: use_dora=True in an otherwise regular LoRA config
lora_config = LoraConfig(use_dora=True)
peft_model = get_peft_model(model, lora_config)

training_args = SFTConfig(
    output_dir="dora-llama-7b",
    dataset_text_field="text",
    max_seq_length=2048,
)
trainer = SFTTrainer(
    model=peft_model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
peft_model.save_pretrained("dora-llama-7b")
```
Compared to a standard LoRA setup, the only change needed is passing `use_dora=True` in your `LoraConfig`.
Run the fine-tuning script with:
```bash
python examples/dora_finetuning/dora_finetuning.py --base_model meta-llama/Meta-Llama-3-8B --data_path timdettmers/openassistant-guanaco
```
By default, this 👆🏻 loads the model in a PEFT setup with a plain LoRA config. If you want to quickly compare it with DoRA, just add `--use_dora` to the command line, so the same example becomes 👇🏻:
```bash
python examples/dora_finetuning/dora_finetuning.py --base_model meta-llama/Meta-Llama-3-8B --data_path timdettmers/openassistant-guanaco --use_dora
```
DoRA also supports quantization. To use 4-bit quantization try:
```bash
python examples/dora_finetuning/dora_finetuning.py --base_model meta-llama/Meta-Llama-3-8B --quantize
```
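For reference, the 4-bit (QDoRA) path boils down to something like the following sketch, loosely mirroring the `dora_finetuning.py` script shown later in this diff (model name and hyperparameters are illustrative):
```python
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load the base model with 4-bit NF4 quantization and prepare it for k-bit training.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B", quantization_config=bnb_config)
model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)

# DoRA sits on top of the quantized weights; the config is a regular LoraConfig with use_dora=True.
config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    use_dora=True,
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
```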
By default, LoRA is applied to the attention and MLP layers of the Llama model. If you want to apply LoRA to a different set of layers, you can specify them with:
```bash
python examples/dora_finetuning/dora_finetuning.py --lora_target_modules "q_proj,k_proj,v_proj,o_proj"
```
### Full example of the script
```bash
python dora_finetuning.py \
--base_model "PATH_TO_MODEL" \
--data_path "PATH_TO_DATASET" \
--output_dir "PATH_TO_OUTPUT_DIR" \
--batch_size 1 \
--num_epochs 3 \
--learning_rate 3e-4 \
--cutoff_len 512 \
--val_set_size 500 \
--use_dora \
--quantize \
--eval_step 10 \
--save_step 100 \
--device "cuda:0" \
--lora_r 16 \
--lora_alpha 32 \
--lora_dropout 0.05 \
--lora_target_modules "q_proj,k_proj,v_proj,o_proj" \
--hub_model_id "YOUR_HF_REPO" \
--push_to_hub
```
## Use the model on 🤗
You can load and use the model like any other 🤗 Transformers model.
```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("ShirinYamani/huggyllama-llama-7b-finetuned")
```
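For instance, a quick generation test could look like this (the tokenizer is assumed to come from the base model, and the prompt format follows the Guanaco-style dataset used above):
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b")

prompt = "### Human: What does DoRA decompose the pretrained weight into?### Assistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```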
## DoRA vs. LoRA
In general, DoRA finetuning on diffusion models is still experimental and is likely to require different hyperparameter values to perform best compared to LoRA.
Specifically, users have noticed two differences to take into account in training:
1. LoRA seems to converge faster than DoRA (so a set of hyperparameters that leads to overfitting when training a LoRA may work well for a DoRA).
2. DoRA quality is superior to LoRA, especially at lower ranks: the difference in quality between a rank-8 DoRA and a rank-8 LoRA appears to be more significant than at rank 32 or 64, for example.
## Citation
```bibtex
@article{liu2024dora,
title={DoRA: Weight-Decomposed Low-Rank Adaptation},
author={Liu, Shih-Yang and Wang, Chien-Yi and Yin, Hongxu and Molchanov, Pavlo and Wang, Yu-Chiang Frank and Cheng, Kwang-Ting and Chen, Min-Hung},
journal={arXiv preprint arXiv:2402.09353},
year={2024}
}
```

View File

@ -0,0 +1,200 @@
import os
import torch
from datasets import load_dataset
from transformers import (
AutoModelForCausalLM,
AutoTokenizer,
BitsAndBytesConfig,
DataCollatorWithPadding,
Trainer,
TrainingArguments,
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
def train_model(
base_model: str,
data_path: str,
output_dir: str,
batch_size: int,
num_epochs: int,
learning_rate: float,
cutoff_len: int,
val_set_size: int,
use_dora: bool,
quantize: bool,
eval_step: int,
save_step: int,
device: str,
lora_r: int,
lora_alpha: int,
lora_dropout: float,
lora_target_modules: str,
hub_model_id: str,
push_to_hub: bool,
):
os.environ["TOKENIZERS_PARALLELISM"] = "false"
hf_token = os.getenv("HF_TOKEN")
# Setup device
device = torch.device(device)
print(f"Using device: {device}")
# load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model, token=hf_token)
    # QDoRA (quantized DoRA): optionally load the base model in 4-bit
if quantize:
model = AutoModelForCausalLM.from_pretrained(
base_model,
token=hf_token,
quantization_config=BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_compute_dtype=(
torch.bfloat16 if torch.cuda.is_available() and torch.cuda.is_bf16_supported() else torch.float16
),
bnb_4bit_use_double_quant=True,
bnb_4bit_quant_type="nf4",
),
)
# setup for quantized training
model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)
else:
model = AutoModelForCausalLM.from_pretrained(base_model, token=hf_token)
# LoRa config for the PEFT model
lora_config = LoraConfig(
        use_dora=use_dora,  # enable DoRA (via --use_dora); without it this is plain LoRA
        r=lora_r,  # rank of the low-rank update matrices
lora_alpha=lora_alpha,
target_modules=(
lora_target_modules.split(",")
if lora_target_modules
else ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
),
lora_dropout=lora_dropout,
bias="none",
)
# get the peft model with LoRa config
model = get_peft_model(model, lora_config)
model.to(device) # MODEL TO GPU/CUDA
tokenizer.pad_token = tokenizer.eos_token
# Load the dataset
dataset = load_dataset(data_path)
def tokenize_function(examples):
inputs = tokenizer(examples["text"], padding="max_length", truncation=True, max_length=cutoff_len)
inputs["labels"] = inputs["input_ids"].copy() # setting labels for a language modeling task
return inputs
# Tokenize the dataset and prepare for training
tokenized_datasets = dataset.map(tokenize_function, batched=True, remove_columns=dataset["train"].column_names)
# Data collator to dynamically pad the batched examples
data_collator = DataCollatorWithPadding(tokenizer)
# Define training arguments
training_args = TrainingArguments(
output_dir=output_dir,
num_train_epochs=num_epochs,
per_device_train_batch_size=batch_size,
per_device_eval_batch_size=batch_size,
warmup_steps=100,
weight_decay=0.01,
logging_dir="./logs",
logging_steps=eval_step,
save_steps=save_step,
save_total_limit=2,
push_to_hub=push_to_hub,
hub_model_id=hub_model_id,
gradient_accumulation_steps=16,
fp16=True,
learning_rate=learning_rate,
hub_token=hf_token,
)
# Clear CUDA cache to free memory
torch.cuda.empty_cache()
# Initialize the Trainer
trainer = Trainer(
model=model,
args=training_args,
train_dataset=tokenized_datasets["train"],
eval_dataset=tokenized_datasets["test"],
data_collator=data_collator,
)
# Start model training
trainer.train()
# Save and push the trained model and tokenizer
if push_to_hub:
# Push the main model to the hub
trainer.push_to_hub(commit_message="Fine-tuned model")
# Save the model and tokenizer locally
model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)
if __name__ == "__main__":
import argparse
parser = argparse.ArgumentParser(description="Fine-tune LLaMA with DoRA and PEFT")
parser.add_argument("--base_model", type=str, default="huggyllama/llama-7b", help="Base model path or name")
parser.add_argument(
"--data_path", type=str, default="timdettmers/openassistant-guanaco", help="Dataset path or name"
)
parser.add_argument(
"--output_dir", type=str, default="path/to/output", help="Output directory for the fine-tuned model"
)
parser.add_argument("--batch_size", type=int, default=1, help="Batch size")
parser.add_argument("--num_epochs", type=int, default=1, help="Number of training epochs")
parser.add_argument("--learning_rate", type=float, default=3e-4, help="Learning rate")
parser.add_argument("--cutoff_len", type=int, default=512, help="Cutoff length for tokenization")
parser.add_argument("--val_set_size", type=int, default=500, help="Validation set size")
parser.add_argument("--use_dora", action="store_true", help="Apply Dora")
parser.add_argument("--quantize", action="store_true", help="Use quantization")
parser.add_argument("--eval_step", type=int, default=10, help="Evaluation step interval")
parser.add_argument("--save_step", type=int, default=100, help="Save step interval")
parser.add_argument("--device", type=str, default="cuda:0", help="Device to use for training")
parser.add_argument("--lora_r", type=int, default=8, help="LoRA rank")
parser.add_argument("--lora_alpha", type=int, default=16, help="LoRA alpha")
parser.add_argument("--lora_dropout", type=float, default=0.05, help="LoRA dropout rate")
parser.add_argument(
"--lora_target_modules", type=str, default=None, help="Comma-separated list of target modules for LoRA"
)
parser.add_argument(
"--hub_model_id",
type=str,
default="path/to/repo",
help="Repository name to push the model on the Hugging Face Hub",
)
parser.add_argument("--push_to_hub", action="store_true", help="Whether to push the model to Hugging Face Hub")
args = parser.parse_args()
train_model(
base_model=args.base_model,
data_path=args.data_path,
output_dir=args.output_dir,
batch_size=args.batch_size,
num_epochs=args.num_epochs,
learning_rate=args.learning_rate,
cutoff_len=args.cutoff_len,
val_set_size=args.val_set_size,
use_dora=args.use_dora,
quantize=args.quantize,
eval_step=args.eval_step,
save_step=args.save_step,
device=args.device,
lora_r=args.lora_r,
lora_alpha=args.lora_alpha,
lora_dropout=args.lora_dropout,
lora_target_modules=args.lora_target_modules,
hub_model_id=args.hub_model_id,
push_to_hub=args.push_to_hub,
)

View File

@ -0,0 +1,103 @@
# Copyright 2024-present the HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Example script demonstrating the time difference when loading a model with a DoRA adapter using ephemeral GPU offloading vs. doing it purely on the CPU.
Example outputs:
$ python load_with_dora.py
--- Loading model ---
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:04<00:00, 1.03s/it]
--- Loading PeftModel ---
--- Done ---
Model loading time: 4.83s
PeftModel loading time: 28.14s
Use ephemeral GPU offloading: False
(Note: if this was the first time you ran the script, or if your cache was cleared, the times shown above are invalid, due to the time taken to download the model and DoRA files. Just re-run the script in this case.)
$ python load_with_dora.py --ephemeral_gpu_offload
--- Loading model ---
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:03<00:00, 1.11it/s]
--- Loading PeftModel ---
--- Done ---
Model loading time: 4.28s
PeftModel loading time: 16.59s
Use ephemeral GPU offloading: True
(Note: if this was the first time you ran the script, or if your cache was cleared, the times shown above are invalid, due to the time taken to download the model and DoRA files. Just re-run the script in this case.)
"""
import argparse
import time
from huggingface_hub import snapshot_download
from transformers import AutoModelForCausalLM
from peft import PeftModel
def main():
parser = argparse.ArgumentParser(description="Load a model with DoRA using ephemeral GPU offloading")
parser.add_argument("--model", type=str, default="NousResearch/Hermes-2-Pro-Mistral-7B", help="Model to load")
parser.add_argument(
"--dora",
type=str,
default="peft-internal-testing/DoRA-Hermes-2-Pro-Mistral-7B",
help="DoRA to use",
)
parser.add_argument("--ephemeral_gpu_offload", action="store_true", help="Use ephemeral GPU offloading")
parser.add_argument(
"--merge_model_path", type="str", help="Merge the model with the DoRA model and save to the given path"
)
args = parser.parse_args()
peft_model_kwargs = {
"ephemeral_gpu_offload": args.ephemeral_gpu_offload,
"max_memory": {"cpu": "256GiB"},
"device_map": {"": "cpu"},
}
# Predownload
try:
snapshot_download(repo_id=args.model)
except Exception as e:
print(f"Failed to download model: {e}")
# We continue anyway as this might be e.g. a local directory or something
try:
snapshot_download(repo_id=args.dora)
except Exception as e:
print(f"Failed to download DoRA: {e}")
# We continue anyway as this might be e.g. a local directory or something
start = time.perf_counter()
print("--- Loading model ---")
model = AutoModelForCausalLM.from_pretrained(args.model)
model_time = time.perf_counter() - start
print("--- Loading PeftModel ---")
peft_model = PeftModel.from_pretrained(model, args.dora, **peft_model_kwargs)
print("--- Done ---")
peft_model_time = time.perf_counter() - start
print(f"Model loading time: {model_time:.2f}s")
print(f"PeftModel loading time: {peft_model_time:.2f}s")
print(f"Use ephemeral GPU offloading: {args.ephemeral_gpu_offload}")
if args.merge_model_path is not None:
merged_model = peft_model.merge_and_unload(progressbar=True)
merged_model.save_pretrained(args.merge_model_path)
if __name__ == "__main__":
main()

View File

@ -194,6 +194,8 @@ class AutoModelForSentenceEmbedding(nn.Module):
try:
return super().__getattr__(name) # defer to nn.Module's logic
except AttributeError:
if name == "model": # see #1892: prevent infinite recursion if class is not initialized
raise
return getattr(self.model, name)

View File

@ -0,0 +1,98 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# DreamBooth fine-tuning with HRA
This guide demonstrates how to use the Householder reflection adaptation (HRA) method to fine-tune DreamBooth with the `stabilityai/stable-diffusion-2-1` model.
HRA provides a new perspective connecting LoRA to OFT and achieves encouraging performance in various downstream tasks.
HRA adapts a pre-trained model by multiplying each frozen weight matrix with a chain of r learnable Householder reflections (HRs).
HRA can be interpreted as either an OFT adapter or an adaptive LoRA.
Consequently, it harnesses the advantages of both strategies, reducing parameters and computation costs while penalizing the loss of pre-training knowledge.
For further details on HRA, please consult the [original HRA paper](https://arxiv.org/abs/2405.17484).
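As a rough, self-contained illustration of the idea (schematic only, not PEFT's actual implementation), a chain of r Householder reflections applied to a frozen weight looks like this:
```python
import torch

torch.manual_seed(0)
d, r = 64, 8
W0 = torch.randn(d, d)                          # frozen pretrained weight
U = torch.randn(d, r)                           # r learnable Householder vectors

H = torch.eye(d)
for i in range(r):
    u = U[:, i : i + 1]                         # (d, 1)
    H = H @ (torch.eye(d) - 2 * (u @ u.T) / (u.T @ u))  # one Householder reflection

W_adapted = W0 @ H                              # adapt the frozen weight with the orthogonal chain
print(torch.allclose(H @ H.T, torch.eye(d), atol=1e-4))  # the chain stays orthogonal
```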
In this guide we provide a Dreambooth fine-tuning script that is available in [PEFT's GitHub repo examples](https://github.com/huggingface/peft/tree/main/examples/hra_dreambooth). This implementation is adapted from [peft's boft_dreambooth](https://github.com/huggingface/peft/tree/main/examples/boft_dreambooth).
You can try it out and fine-tune on your custom images.
## Set up your environment
Start by cloning the PEFT repository:
```bash
git clone --recursive https://github.com/huggingface/peft
```
Navigate to the directory containing the training scripts for fine-tuning Dreambooth with HRA:
```bash
cd peft/examples/hra_dreambooth
```
Set up your environment: install PEFT and all the required libraries. At the time of writing this guide, we recommend installing PEFT from source. The following environment setup should work on A100 and H100 GPUs:
```bash
conda create --name peft python=3.10
conda activate peft
conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=11.8 -c pytorch -c nvidia
conda install xformers -c xformers
pip install -r requirements.txt
pip install git+https://github.com/huggingface/peft
```
## Download the data
The [dreambooth](https://github.com/google/dreambooth) dataset is automatically cloned into the following structure when you run the training script.
```
hra_dreambooth
├── data
│ └── dreambooth
│ └── dataset
│ ├── backpack
│ └── backpack_dog
│ ...
```
You can also put your custom images into `hra_dreambooth/data/dreambooth/dataset`.
## Fine-tune Dreambooth with HRA
```bash
class_idx=0
bash ./train_dreambooth.sh $class_idx
```
where the `$class_idx` corresponds to different subjects ranging from 0 to 29.
Launch the training script with `accelerate` and pass hyperparameters, as well as HRA-specific arguments such as the following (a configuration sketch follows this list):
- `use_hra`: Enables HRA in the training script.
- `hra_r`: The number of HRs (i.e., r) applied across the different layers, given as an `int`.
As r increases, the number of trainable parameters increases, which generally leads to improved performance.
However, this also results in higher memory consumption and longer computation times.
Therefore, r is usually set to 8.
**Note**: set r to an even number to avoid potential issues during initialization.
- `hra_apply_GS`: Applies Gram-Schmidt orthogonalization. Default is `false`.
- `hra_bias`: Specifies whether the `bias` parameters should be trained. Can be `none`, `all` or `hra_only`.
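These flags map onto PEFT's `HRAConfig` roughly as in the sketch below, which mirrors how `train_dreambooth.py` in this diff wraps the UNet (the target module names are a subset of the UNet attention projections used there):
```python
from diffusers import UNet2DConditionModel
from peft import HRAConfig, get_peft_model

unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-2-1", subfolder="unet"
)
config = HRAConfig(
    r=8,                                              # --hra_r, keep it even
    apply_GS=False,                                   # --hra_apply_GS
    bias="hra_only",                                  # --hra_bias
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
)
unet = get_peft_model(unet, config)
unet.print_trainable_parameters()
```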
If you are running this script on Windows, you may need to set the `--num_dataloader_workers` to 0.
To learn more about DreamBooth fine-tuning with prior-preserving loss, check out the [Diffusers documentation](https://huggingface.co/docs/diffusers/training/dreambooth#finetuning-with-priorpreserving-loss).
## Generate images with the fine-tuned model
To generate images with the fine-tuned model, run the Jupyter notebook `dreambooth_inference.ipynb` under `./examples/hra_dreambooth` with `jupyter notebook`.

Binary file not shown (image added, 466 KiB).

File diff suppressed because one or more lines are too long

View File

@ -0,0 +1,13 @@
transformers==4.36.2
accelerate==0.25.0
evaluate
tqdm
datasets==2.16.1
diffusers==0.17.1
Pillow
huggingface_hub
safetensors
nb_conda_kernels
ipykernel
ipywidgets
wandb==0.16.1

View File

@ -0,0 +1,609 @@
#!/usr/bin/env python
# Copyright 2024-present the HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# The implementation is based on "Bridging The Gap between Low-rank and Orthogonal
# Adaptation via Householder Reflection Adaptation" (https://arxiv.org/abs/2405.17484).
import hashlib
import itertools
import logging
import math
import os
from contextlib import nullcontext
from pathlib import Path
import datasets
import diffusers
import numpy as np
import torch
import torch.nn.functional as F
import torch.utils.checkpoint
import transformers
from accelerate import Accelerator
from accelerate.logging import get_logger
from accelerate.utils import ProjectConfiguration, set_seed
from diffusers import (
AutoencoderKL,
DDIMScheduler,
DiffusionPipeline,
DPMSolverMultistepScheduler,
UNet2DConditionModel,
)
from diffusers.optimization import get_scheduler
from diffusers.utils import check_min_version
from diffusers.utils.import_utils import is_xformers_available
from huggingface_hub import Repository
from tqdm.auto import tqdm
from transformers import AutoTokenizer
from utils.args_loader import (
get_full_repo_name,
import_model_class_from_model_name_or_path,
parse_args,
)
from utils.dataset import DreamBoothDataset, PromptDataset, collate_fn
from utils.tracemalloc import TorchTracemalloc, b2mb
from peft import HRAConfig, get_peft_model
# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
check_min_version("0.16.0.dev0")
logger = get_logger(__name__)
UNET_TARGET_MODULES = ["to_q", "to_v", "to_k", "query", "value", "key", "to_out.0", "add_k_proj", "add_v_proj"]
TEXT_ENCODER_TARGET_MODULES = ["q_proj", "v_proj"]
def save_adaptor(accelerator, step, unet, text_encoder, args):
    unwrapped_unet = accelerator.unwrap_model(unet)
    unwrapped_unet.save_pretrained(
os.path.join(args.output_dir, f"unet/{step}"), state_dict=accelerator.get_state_dict(unet)
)
if args.train_text_encoder:
        unwrapped_text_encoder = accelerator.unwrap_model(text_encoder)
        unwrapped_text_encoder.save_pretrained(
os.path.join(args.output_dir, f"text_encoder/{step}"),
state_dict=accelerator.get_state_dict(text_encoder),
)
def main(args):
validation_prompts = list(filter(None, args.validation_prompt[0].split(".")))
logging_dir = Path(args.output_dir, args.logging_dir)
accelerator_project_config = ProjectConfiguration(project_dir=args.output_dir, logging_dir=logging_dir)
accelerator = Accelerator(
gradient_accumulation_steps=args.gradient_accumulation_steps,
mixed_precision=args.mixed_precision,
log_with=args.report_to if args.report_to != "none" else None,
project_dir=accelerator_project_config,
)
if args.report_to == "wandb":
import wandb
args.wandb_project_name = args.project_name
args.wandb_run_name = args.run_name
wandb_init = {
"wandb": {
"name": args.wandb_run_name,
"mode": "online",
}
}
# Currently, it's not possible to do gradient accumulation when training two models with accelerate.accumulate
# This will be enabled soon in accelerate. For now, we don't allow gradient accumulation when training two models.
# TODO (patil-suraj): Remove this check when gradient accumulation with two models is enabled in accelerate.
if args.train_text_encoder and args.gradient_accumulation_steps > 1 and accelerator.num_processes > 1:
raise ValueError(
"Gradient accumulation is not supported when training the text encoder in distributed training. "
"Please set gradient_accumulation_steps to 1. This feature will be supported in the future."
)
# Make one log on every process with the configuration for debugging.
logging.basicConfig(
format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
datefmt="%m/%d/%Y %H:%M:%S",
level=logging.INFO,
)
logger.info(accelerator.state, main_process_only=False)
if accelerator.is_local_main_process:
datasets.utils.logging.set_verbosity_warning()
transformers.utils.logging.set_verbosity_warning()
diffusers.utils.logging.set_verbosity_info()
else:
datasets.utils.logging.set_verbosity_error()
transformers.utils.logging.set_verbosity_error()
diffusers.utils.logging.set_verbosity_error()
# If passed along, set the training seed now.
global_seed = hash(args.run_name) % (2**32)
set_seed(global_seed)
# Generate class images if prior preservation is enabled.
if args.with_prior_preservation:
class_images_dir = Path(args.class_data_dir)
if not class_images_dir.exists():
class_images_dir.mkdir(parents=True)
cur_class_images = len(list(class_images_dir.iterdir()))
if cur_class_images < args.num_class_images:
torch_dtype = torch.float16 if accelerator.device.type == "cuda" else torch.float32
if args.prior_generation_precision == "fp32":
torch_dtype = torch.float32
elif args.prior_generation_precision == "fp16":
torch_dtype = torch.float16
elif args.prior_generation_precision == "bf16":
torch_dtype = torch.bfloat16
pipeline = DiffusionPipeline.from_pretrained(
args.pretrained_model_name_or_path,
torch_dtype=torch_dtype,
safety_checker=None,
revision=args.revision,
)
pipeline.set_progress_bar_config(disable=True)
num_new_images = args.num_class_images - cur_class_images
logger.info(f"Number of class images to sample: {num_new_images}.")
sample_dataset = PromptDataset(args.class_prompt, num_new_images)
sample_dataloader = torch.utils.data.DataLoader(sample_dataset, batch_size=args.sample_batch_size)
sample_dataloader = accelerator.prepare(sample_dataloader)
pipeline.to(accelerator.device)
for example in tqdm(
sample_dataloader, desc="Generating class images", disable=not accelerator.is_local_main_process
):
images = pipeline(example["prompt"]).images
for i, image in enumerate(images):
hash_image = hashlib.sha1(image.tobytes()).hexdigest()
image_filename = class_images_dir / f"{example['index'][i] + cur_class_images}-{hash_image}.jpg"
image.save(image_filename)
del pipeline
if torch.cuda.is_available():
torch.cuda.empty_cache()
# Handle the repository creation
if accelerator.is_main_process:
if args.push_to_hub:
if args.hub_model_id is None:
repo_name = get_full_repo_name(Path(args.output_dir).name, token=args.hub_token)
else:
repo_name = args.hub_model_id
repo = Repository(args.output_dir, clone_from=repo_name) # noqa: F841
with open(os.path.join(args.output_dir, ".gitignore"), "w+") as gitignore:
if "step_*" not in gitignore:
gitignore.write("step_*\n")
if "epoch_*" not in gitignore:
gitignore.write("epoch_*\n")
elif args.output_dir is not None:
os.makedirs(args.output_dir, exist_ok=True)
# Load the tokenizer
if args.tokenizer_name:
tokenizer = AutoTokenizer.from_pretrained(args.tokenizer_name, revision=args.revision, use_fast=False)
elif args.pretrained_model_name_or_path:
tokenizer = AutoTokenizer.from_pretrained(
args.pretrained_model_name_or_path,
subfolder="tokenizer",
revision=args.revision,
use_fast=False,
)
# import correct text encoder class
text_encoder_cls = import_model_class_from_model_name_or_path(args.pretrained_model_name_or_path, args.revision)
# Load scheduler and models
noise_scheduler = DDIMScheduler.from_pretrained(args.pretrained_model_name_or_path, subfolder="scheduler")
text_encoder = text_encoder_cls.from_pretrained(
args.pretrained_model_name_or_path, subfolder="text_encoder", revision=args.revision
)
vae = AutoencoderKL.from_pretrained(args.pretrained_model_name_or_path, subfolder="vae", revision=args.revision)
unet = UNet2DConditionModel.from_pretrained(
args.pretrained_model_name_or_path, subfolder="unet", revision=args.revision
)
if args.use_hra:
config = HRAConfig(
r=args.hra_r,
apply_GS=args.hra_apply_GS,
target_modules=UNET_TARGET_MODULES,
bias=args.hra_bias,
)
unet = get_peft_model(unet, config, adapter_name=args.run_name)
unet.print_trainable_parameters()
vae.requires_grad_(False)
unet.train()
if args.train_text_encoder and args.use_hra:
config = HRAConfig(
r=args.hra_r,
apply_GS=args.hra_apply_GS,
            target_modules=TEXT_ENCODER_TARGET_MODULES,
bias=args.hra_bias,
)
text_encoder = get_peft_model(text_encoder, config, adapter_name=args.run_name)
text_encoder.print_trainable_parameters()
text_encoder.train()
else:
text_encoder.requires_grad_(False)
# For mixed precision training we cast the text_encoder and vae weights to half-precision
# as these models are only used for inference, keeping weights in full precision is not required.
weight_dtype = torch.float32
if accelerator.mixed_precision == "fp16":
weight_dtype = torch.float16
elif accelerator.mixed_precision == "bf16":
weight_dtype = torch.bfloat16
# Move unet, vae and text_encoder to device and cast to weight_dtype
unet.to(accelerator.device, dtype=weight_dtype)
vae.to(accelerator.device, dtype=weight_dtype)
text_encoder.to(accelerator.device, dtype=weight_dtype)
if args.enable_xformers_memory_efficient_attention:
if is_xformers_available():
unet.enable_xformers_memory_efficient_attention()
else:
raise ValueError("xformers is not available. Make sure it is installed correctly")
if args.gradient_checkpointing:
unet.enable_gradient_checkpointing()
        # gradient checkpointing on the text encoder fails with HRA, so it is only enabled when HRA is not used
if args.train_text_encoder and not args.use_hra:
text_encoder.gradient_checkpointing_enable()
# Enable TF32 for faster training on Ampere GPUs,
# cf https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices
if args.allow_tf32:
torch.backends.cuda.matmul.allow_tf32 = True
if args.scale_lr:
args.learning_rate = (
args.learning_rate * args.gradient_accumulation_steps * args.train_batch_size * accelerator.num_processes
)
# Use 8-bit Adam for lower memory usage or to fine-tune the model in 16GB GPUs
if args.use_8bit_adam:
try:
import bitsandbytes as bnb
except ImportError:
raise ImportError(
"To use 8-bit Adam, please install the bitsandbytes library: `pip install bitsandbytes`."
)
optimizer_class = bnb.optim.AdamW8bit
else:
optimizer_class = torch.optim.AdamW
# Optimizer creation
params_to_optimize = [param for param in unet.parameters() if param.requires_grad]
if args.train_text_encoder:
params_to_optimize += [param for param in text_encoder.parameters() if param.requires_grad]
optimizer = optimizer_class(
params_to_optimize,
lr=args.learning_rate,
betas=(args.adam_beta1, args.adam_beta2),
weight_decay=args.adam_weight_decay,
eps=args.adam_epsilon,
)
# Download the official dreambooth dataset from the official repository: https://github.com/google/dreambooth.git
data_path = os.path.join(os.getcwd(), "data", "dreambooth")
if not os.path.exists(data_path):
os.makedirs(os.path.join(os.getcwd(), "data"), exist_ok=True)
os.system(f"git clone https://github.com/google/dreambooth.git '{data_path}'")
# Dataset and DataLoaders creation:
train_dataset = DreamBoothDataset(
instance_data_root=args.instance_data_dir,
instance_prompt=args.instance_prompt,
class_data_root=args.class_data_dir if args.with_prior_preservation else None,
class_prompt=args.class_prompt,
tokenizer=tokenizer,
size=args.resolution,
center_crop=args.center_crop,
)
train_dataloader = torch.utils.data.DataLoader(
train_dataset,
batch_size=args.train_batch_size,
shuffle=True,
collate_fn=lambda examples: collate_fn(examples, args.with_prior_preservation),
num_workers=args.num_dataloader_workers,
)
# Scheduler and math around the number of training steps.
overrode_max_train_steps = False
num_update_steps_per_epoch = math.ceil(len(train_dataloader) / args.gradient_accumulation_steps)
if args.max_train_steps is None:
args.max_train_steps = args.num_train_epochs * num_update_steps_per_epoch
overrode_max_train_steps = True
lr_scheduler = get_scheduler(
args.lr_scheduler,
optimizer=optimizer,
num_warmup_steps=args.lr_warmup_steps * args.gradient_accumulation_steps,
num_training_steps=args.max_train_steps * args.gradient_accumulation_steps,
num_cycles=args.lr_num_cycles,
power=args.lr_power,
)
# Prepare everything with our `accelerator`.
if args.train_text_encoder:
unet, text_encoder, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
unet, text_encoder, optimizer, train_dataloader, lr_scheduler
)
else:
unet, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
unet, optimizer, train_dataloader, lr_scheduler
)
# For mixed precision training we cast the text_encoder and vae weights to half-precision
# as these models are only used for inference, keeping weights in full precision is not required.
weight_dtype = torch.float32
if accelerator.mixed_precision == "fp16":
weight_dtype = torch.float16
elif accelerator.mixed_precision == "bf16":
weight_dtype = torch.bfloat16
# Move vae and text_encoder to device and cast to weight_dtype
vae.to(accelerator.device, dtype=weight_dtype)
if not args.train_text_encoder:
text_encoder.to(accelerator.device, dtype=weight_dtype)
# We need to recalculate our total training steps as the size of the training dataloader may have changed.
num_update_steps_per_epoch = math.ceil(len(train_dataloader) / args.gradient_accumulation_steps)
if overrode_max_train_steps:
args.max_train_steps = args.num_train_epochs * num_update_steps_per_epoch
# Afterwards we recalculate our number of training epochs
args.num_train_epochs = math.ceil(args.max_train_steps / num_update_steps_per_epoch)
# We need to initialize the trackers we use, and also store our configuration.
# The trackers initializes automatically on the main process.
if accelerator.is_main_process:
if args.report_to == "wandb":
accelerator.init_trackers(args.wandb_project_name, config=vars(args), init_kwargs=wandb_init)
else:
accelerator.init_trackers(args.project_name, config=vars(args))
# Train!
total_batch_size = args.train_batch_size * accelerator.num_processes * args.gradient_accumulation_steps
logger.info("***** Running training *****")
logger.info(f" Num examples = {len(train_dataset)}")
logger.info(f" Num batches each epoch = {len(train_dataloader)}")
logger.info(f" Num Epochs = {args.num_train_epochs}")
logger.info(f" Instantaneous batch size per device = {args.train_batch_size}")
logger.info(f" Total train batch size (w. parallel, distributed & accumulation) = {total_batch_size}")
logger.info(f" Gradient Accumulation steps = {args.gradient_accumulation_steps}")
logger.info(f" Total optimization steps = {args.max_train_steps}")
global_step = 0
first_epoch = 0
# Potentially load in the weights and states from a previous save
if args.resume_from_checkpoint:
if args.resume_from_checkpoint != "latest":
path = os.path.basename(args.resume_from_checkpoint)
else:
# Get the most recent checkpoint
dirs = os.listdir(args.output_dir)
dirs = [d for d in dirs if d.startswith("checkpoint")]
dirs = sorted(dirs, key=lambda x: int(x.split("-")[1]))
path = dirs[-1] if len(dirs) > 0 else None
accelerator.print(f"Resuming from checkpoint {path}")
accelerator.load_state(os.path.join(args.output_dir, path))
global_step = int(path.split("-")[1])
resume_global_step = global_step * args.gradient_accumulation_steps
first_epoch = resume_global_step // num_update_steps_per_epoch
resume_step = resume_global_step % num_update_steps_per_epoch
# Only show the progress bar once on each machine.
progress_bar = tqdm(range(global_step, args.max_train_steps), disable=not accelerator.is_local_main_process)
progress_bar.set_description("Steps")
if args.train_text_encoder:
text_encoder.train()
for epoch in range(first_epoch, args.num_train_epochs):
unet.train()
with TorchTracemalloc() if not args.no_tracemalloc else nullcontext() as tracemalloc:
for step, batch in enumerate(train_dataloader):
# Skip steps until we reach the resumed step
if args.resume_from_checkpoint and epoch == first_epoch and step < resume_step:
if step % args.gradient_accumulation_steps == 0:
progress_bar.update(1)
if args.report_to == "wandb":
accelerator.print(progress_bar)
continue
with accelerator.accumulate(unet):
# Convert images to latent space
latents = vae.encode(batch["pixel_values"].to(dtype=weight_dtype)).latent_dist.sample()
latents = latents * vae.config.scaling_factor
# Sample noise that we'll add to the latents
noise = torch.randn_like(latents)
bsz = latents.shape[0]
# Sample a random timestep for each image
timesteps = torch.randint(
0, noise_scheduler.config.num_train_timesteps, (bsz,), device=latents.device
)
timesteps = timesteps.long()
# Add noise to the latents according to the noise magnitude at each timestep
# (this is the forward diffusion process)
noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)
# Get the text embedding for conditioning
encoder_hidden_states = text_encoder(batch["input_ids"])[0]
# Predict the noise residual
model_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
# Get the target for loss depending on the prediction type
if noise_scheduler.config.prediction_type == "epsilon":
target = noise
elif noise_scheduler.config.prediction_type == "v_prediction":
target = noise_scheduler.get_velocity(latents, noise, timesteps)
else:
raise ValueError(f"Unknown prediction type {noise_scheduler.config.prediction_type}")
if args.with_prior_preservation:
# Chunk the noise and model_pred into two parts and compute the loss on each part separately.
model_pred, model_pred_prior = torch.chunk(model_pred, 2, dim=0)
target, target_prior = torch.chunk(target, 2, dim=0)
# Compute instance loss
loss = F.mse_loss(model_pred.float(), target.float(), reduction="mean")
# Compute prior loss
prior_loss = F.mse_loss(model_pred_prior.float(), target_prior.float(), reduction="mean")
# Add the prior loss to the instance loss.
loss = loss + args.prior_loss_weight * prior_loss
else:
loss = F.mse_loss(model_pred.float(), target.float(), reduction="mean")
accelerator.backward(loss)
if accelerator.sync_gradients:
params_to_clip = (
itertools.chain(unet.parameters(), text_encoder.parameters())
if args.train_text_encoder
else unet.parameters()
)
accelerator.clip_grad_norm_(params_to_clip, args.max_grad_norm)
optimizer.step()
lr_scheduler.step()
optimizer.zero_grad()
# Checks if the accelerator has performed an optimization step behind the scenes
if accelerator.sync_gradients:
progress_bar.update(1)
if args.report_to == "wandb":
accelerator.print(progress_bar)
global_step += 1
if global_step % args.checkpointing_steps == 0 and global_step != 0:
if accelerator.is_main_process:
save_adaptor(accelerator, global_step, unet, text_encoder, args)
logs = {"loss": loss.detach().item(), "lr": lr_scheduler.get_last_lr()[0]}
progress_bar.set_postfix(**logs)
accelerator.log(logs, step=global_step)
if (
args.validation_prompt is not None
and (step + num_update_steps_per_epoch * epoch) % args.validation_steps == 0
and global_step > 10
):
unet.eval()
logger.info(
f"Running validation... \n Generating {len(validation_prompts)} images with prompt:"
f" {validation_prompts[0]}, ......"
)
# create pipeline
pipeline = DiffusionPipeline.from_pretrained(
args.pretrained_model_name_or_path,
safety_checker=None,
revision=args.revision,
)
# set `keep_fp32_wrapper` to True because we do not want to remove
# mixed precision hooks while we are still training
pipeline.unet = accelerator.unwrap_model(unet, keep_fp32_wrapper=True)
pipeline.text_encoder = accelerator.unwrap_model(text_encoder, keep_fp32_wrapper=True)
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)
pipeline = pipeline.to(accelerator.device)
pipeline.set_progress_bar_config(disable=True)
# run inference
if args.seed is not None:
generator = torch.Generator(device=accelerator.device).manual_seed(args.seed)
else:
generator = None
images = []
val_img_dir = os.path.join(
args.output_dir,
f"validation/{global_step}",
args.run_name,
)
os.makedirs(val_img_dir, exist_ok=True)
                    for val_prompt in validation_prompts:
                        image = pipeline(val_prompt, num_inference_steps=50, generator=generator).images[0]
                        image.save(os.path.join(val_img_dir, f"{'_'.join(val_prompt.split(' '))}.png"[1:]))
images.append(image)
for tracker in accelerator.trackers:
if tracker.name == "tensorboard":
np_images = np.stack([np.asarray(img) for img in images])
tracker.writer.add_images("validation", np_images, epoch, dataformats="NHWC")
if tracker.name == "wandb":
import wandb
tracker.log(
{
"validation": [
wandb.Image(image, caption=f"{i}: {validation_prompts[i]}")
for i, image in enumerate(images)
]
}
)
del pipeline
torch.cuda.empty_cache()
if global_step >= args.max_train_steps:
break
# Printing the GPU memory usage details such as allocated memory, peak memory, and total memory usage
if not args.no_tracemalloc:
accelerator.print(f"GPU Memory before entering the train : {b2mb(tracemalloc.begin)}")
accelerator.print(f"GPU Memory consumed at the end of the train (end-begin): {tracemalloc.used}")
accelerator.print(f"GPU Peak Memory consumed during the train (max-begin): {tracemalloc.peaked}")
accelerator.print(
f"GPU Total Peak Memory consumed during the train (max): {tracemalloc.peaked + b2mb(tracemalloc.begin)}"
)
accelerator.print(f"CPU Memory before entering the train : {b2mb(tracemalloc.cpu_begin)}")
accelerator.print(f"CPU Memory consumed at the end of the train (end-begin): {tracemalloc.cpu_used}")
accelerator.print(f"CPU Peak Memory consumed during the train (max-begin): {tracemalloc.cpu_peaked}")
accelerator.print(
f"CPU Total Peak Memory consumed during the train (max): {tracemalloc.cpu_peaked + b2mb(tracemalloc.cpu_begin)}"
)
if args.push_to_hub:
repo.push_to_hub(commit_message="End of training", blocking=False, auto_lfs_prune=True)
accelerator.end_training()
if __name__ == "__main__":
args = parse_args()
main(args)

View File

@ -0,0 +1,185 @@
CLASS_IDX=$1
# Define the UNIQUE_TOKEN, CLASS_TOKENs, and SUBJECT_NAMES
UNIQUE_TOKEN="qwe"
SUBJECT_NAMES=(
"backpack" "backpack_dog" "bear_plushie" "berry_bowl" "can"
"candle" "cat" "cat2" "clock" "colorful_sneaker"
"dog" "dog2" "dog3" "dog5" "dog6"
"dog7" "dog8" "duck_toy" "fancy_boot" "grey_sloth_plushie"
"monster_toy" "pink_sunglasses" "poop_emoji" "rc_car" "red_cartoon"
"robot_toy" "shiny_sneaker" "teapot" "vase" "wolf_plushie"
)
CLASS_TOKENs=(
"backpack" "backpack" "stuffed animal" "bowl" "can"
"candle" "cat" "cat" "clock" "sneaker"
"dog" "dog" "dog" "dog" "dog"
"dog" "dog" "toy" "boot" "stuffed animal"
"toy" "glasses" "toy" "toy" "cartoon"
"toy" "sneaker" "teapot" "vase" "stuffed animal"
)
CLASS_TOKEN=${CLASS_TOKENs[$CLASS_IDX]}
SELECTED_SUBJECT=${SUBJECT_NAMES[$CLASS_IDX]}
if [[ $CLASS_IDX =~ ^(0|1|2|3|4|5|8|9|17|18|19|20|21|22|23|24|25|26|27|28|29)$ ]]; then
PROMPT_LIST=(
"a ${UNIQUE_TOKEN} ${CLASS_TOKEN} in the jungle."
"a ${UNIQUE_TOKEN} ${CLASS_TOKEN} in the snow."
"a ${UNIQUE_TOKEN} ${CLASS_TOKEN} on the beach."
"a ${UNIQUE_TOKEN} ${CLASS_TOKEN} on a cobblestone street."
"a ${UNIQUE_TOKEN} ${CLASS_TOKEN} on top of pink fabric."
"a ${UNIQUE_TOKEN} ${CLASS_TOKEN} on top of a wooden floor."
"a ${UNIQUE_TOKEN} ${CLASS_TOKEN} with a city in the background."
"a ${UNIQUE_TOKEN} ${CLASS_TOKEN} with a mountain in the background."
"a ${UNIQUE_TOKEN} ${CLASS_TOKEN} with a blue house in the background."
"a ${UNIQUE_TOKEN} ${CLASS_TOKEN} on top of a purple rug in a forest."
"a ${UNIQUE_TOKEN} ${CLASS_TOKEN} with a wheat field in the background."
"a ${UNIQUE_TOKEN} ${CLASS_TOKEN} with a tree and autumn leaves in the background."
"a ${UNIQUE_TOKEN} ${CLASS_TOKEN} with the Eiffel Tower in the background."
"a ${UNIQUE_TOKEN} ${CLASS_TOKEN} floating on top of water."
"a ${UNIQUE_TOKEN} ${CLASS_TOKEN} floating in an ocean of milk."
"a ${UNIQUE_TOKEN} ${CLASS_TOKEN} on top of green grass with sunflowers around it."
"a ${UNIQUE_TOKEN} ${CLASS_TOKEN} on top of a mirror."
"a ${UNIQUE_TOKEN} ${CLASS_TOKEN} on top of the sidewalk in a crowded street."
"a ${UNIQUE_TOKEN} ${CLASS_TOKEN} on top of a dirt road."
"a ${UNIQUE_TOKEN} ${CLASS_TOKEN} on top of a white rug."
"a red ${UNIQUE_TOKEN} ${CLASS_TOKEN}."
"a purple ${UNIQUE_TOKEN} ${CLASS_TOKEN}."
"a shiny ${UNIQUE_TOKEN} ${CLASS_TOKEN}."
"a wet ${UNIQUE_TOKEN} ${CLASS_TOKEN}."
"a cube shaped ${UNIQUE_TOKEN} ${CLASS_TOKEN}."
)
prompt_test_list=(
"a ${CLASS_TOKEN} in the jungle"
"a ${CLASS_TOKEN} in the snow"
"a ${CLASS_TOKEN} on the beach"
"a ${CLASS_TOKEN} on a cobblestone street"
"a ${CLASS_TOKEN} on top of pink fabric"
"a ${CLASS_TOKEN} on top of a wooden floor"
"a ${CLASS_TOKEN} with a city in the background"
"a ${CLASS_TOKEN} with a mountain in the background"
"a ${CLASS_TOKEN} with a blue house in the background"
"a ${CLASS_TOKEN} on top of a purple rug in a forest"
"a ${CLASS_TOKEN} with a wheat field in the background"
"a ${CLASS_TOKEN} with a tree and autumn leaves in the background"
"a ${CLASS_TOKEN} with the Eiffel Tower in the background"
"a ${CLASS_TOKEN} floating on top of water"
"a ${CLASS_TOKEN} floating in an ocean of milk"
"a ${CLASS_TOKEN} on top of green grass with sunflowers around it"
"a ${CLASS_TOKEN} on top of a mirror"
"a ${CLASS_TOKEN} on top of the sidewalk in a crowded street"
"a ${CLASS_TOKEN} on top of a dirt road"
"a ${CLASS_TOKEN} on top of a white rug"
"a red ${CLASS_TOKEN}"
"a purple ${CLASS_TOKEN}"
"a shiny ${CLASS_TOKEN}"
"a wet ${CLASS_TOKEN}"
"a cube shaped ${CLASS_TOKEN}"
)
else
PROMPT_LIST=(
"a ${UNIQUE_TOKEN} ${CLASS_TOKEN} in the jungle."
"a ${UNIQUE_TOKEN} ${CLASS_TOKEN} in the snow."
"a ${UNIQUE_TOKEN} ${CLASS_TOKEN} on the beach."
"a ${UNIQUE_TOKEN} ${CLASS_TOKEN} on a cobblestone street."
"a ${UNIQUE_TOKEN} ${CLASS_TOKEN} on top of pink fabric."
"a ${UNIQUE_TOKEN} ${CLASS_TOKEN} on top of a wooden floor."
"a ${UNIQUE_TOKEN} ${CLASS_TOKEN} with a city in the background."
"a ${UNIQUE_TOKEN} ${CLASS_TOKEN} with a mountain in the background."
"a ${UNIQUE_TOKEN} ${CLASS_TOKEN} with a blue house in the background."
"a ${UNIQUE_TOKEN} ${CLASS_TOKEN} on top of a purple rug in a forest."
"a ${UNIQUE_TOKEN} ${CLASS_TOKEN} wearing a red hat."
"a ${UNIQUE_TOKEN} ${CLASS_TOKEN} wearing a santa hat."
"a ${UNIQUE_TOKEN} ${CLASS_TOKEN} wearing a rainbow scarf."
"a ${UNIQUE_TOKEN} ${CLASS_TOKEN} wearing a black top hat and a monocle."
"a ${UNIQUE_TOKEN} ${CLASS_TOKEN} in a chef outfit."
"a ${UNIQUE_TOKEN} ${CLASS_TOKEN} in a firefighter outfit."
"a ${UNIQUE_TOKEN} ${CLASS_TOKEN} in a police outfit."
"a ${UNIQUE_TOKEN} ${CLASS_TOKEN} wearing pink glasses."
"a ${UNIQUE_TOKEN} ${CLASS_TOKEN} wearing a yellow shirt."
"a ${UNIQUE_TOKEN} ${CLASS_TOKEN} in a purple wizard outfit."
"a red ${UNIQUE_TOKEN} ${CLASS_TOKEN}."
"a purple ${UNIQUE_TOKEN} ${CLASS_TOKEN}."
"a shiny ${UNIQUE_TOKEN} ${CLASS_TOKEN}."
"a wet ${UNIQUE_TOKEN} ${CLASS_TOKEN}."
"a cube shaped ${UNIQUE_TOKEN} ${CLASS_TOKEN}."
)
prompt_test_list=(
"a ${CLASS_TOKEN} in the jungle"
"a ${CLASS_TOKEN} in the snow"
"a ${CLASS_TOKEN} on the beach"
"a ${CLASS_TOKEN} on a cobblestone street"
"a ${CLASS_TOKEN} on top of pink fabric"
"a ${CLASS_TOKEN} on top of a wooden floor"
"a ${CLASS_TOKEN} with a city in the background"
"a ${CLASS_TOKEN} with a mountain in the background"
"a ${CLASS_TOKEN} with a blue house in the background"
"a ${CLASS_TOKEN} on top of a purple rug in a forest"
"a ${CLASS_TOKEN} wearing a red hat"
"a ${CLASS_TOKEN} wearing a santa hat"
"a ${CLASS_TOKEN} wearing a rainbow scarf"
"a ${CLASS_TOKEN} wearing a black top hat and a monocle"
"a ${CLASS_TOKEN} in a chef outfit"
"a ${CLASS_TOKEN} in a firefighter outfit"
"a ${CLASS_TOKEN} in a police outfit"
"a ${CLASS_TOKEN} wearing pink glasses"
"a ${CLASS_TOKEN} wearing a yellow shirt"
"a ${CLASS_TOKEN} in a purple wizard outfit"
"a red ${CLASS_TOKEN}"
"a purple ${CLASS_TOKEN}"
"a shiny ${CLASS_TOKEN}"
"a wet ${CLASS_TOKEN}"
"a cube shaped ${CLASS_TOKEN}"
)
fi
VALIDATION_PROMPT=${PROMPT_LIST[@]}
INSTANCE_PROMPT="a photo of ${UNIQUE_TOKEN} ${CLASS_TOKEN}"
CLASS_PROMPT="a photo of ${CLASS_TOKEN}"
export MODEL_NAME="stabilityai/stable-diffusion-2-1"
PEFT_TYPE="hra"
HRA_R=8
export PROJECT_NAME="dreambooth_${PEFT_TYPE}"
export RUN_NAME="${SELECTED_SUBJECT}_${PEFT_TYPE}_${HRA_R}"
export INSTANCE_DIR="./data/dreambooth/dataset/${SELECTED_SUBJECT}"
export CLASS_DIR="./data/class_data/${CLASS_TOKEN}"
export OUTPUT_DIR="./data/output/${PEFT_TYPE}"
accelerate launch train_dreambooth.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--class_data_dir="$CLASS_DIR" \
--output_dir=$OUTPUT_DIR \
--project_name=$PROJECT_NAME \
--run_name=$RUN_NAME \
--with_prior_preservation \
--prior_loss_weight=1.0 \
--instance_prompt="$INSTANCE_PROMPT" \
--validation_prompt="$VALIDATION_PROMPT" \
--class_prompt="$CLASS_PROMPT" \
--resolution=512 \
--train_batch_size=1 \
--num_dataloader_workers=2 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--num_class_images=200 \
--use_hra \
--hra_r=$HRA_R \
--hra_bias="hra_only" \
--learning_rate=5e-3 \
--max_train_steps=510 \
--checkpointing_steps=200 \
--validation_steps=200 \
--enable_xformers_memory_efficient_attention \
--report_to="none" \

View File

@ -0,0 +1,377 @@
# adapted from [peft's boft_dreambooth](https://github.com/huggingface/peft/tree/main/examples/boft_dreambooth)
import argparse
import os
import warnings
from typing import Optional
from huggingface_hub import HfFolder, whoami
from transformers import PretrainedConfig
def import_model_class_from_model_name_or_path(pretrained_model_name_or_path: str, revision: str):
text_encoder_config = PretrainedConfig.from_pretrained(
pretrained_model_name_or_path,
subfolder="text_encoder",
revision=revision,
)
model_class = text_encoder_config.architectures[0]
if model_class == "CLIPTextModel":
from transformers import CLIPTextModel
return CLIPTextModel
elif model_class == "RobertaSeriesModelWithTransformation":
from diffusers.pipelines.alt_diffusion.modeling_roberta_series import RobertaSeriesModelWithTransformation
return RobertaSeriesModelWithTransformation
else:
raise ValueError(f"{model_class} is not supported.")
def get_full_repo_name(model_id: str, organization: Optional[str] = None, token: Optional[str] = None):
if token is None:
token = HfFolder.get_token()
if organization is None:
username = whoami(token)["name"]
return f"{username}/{model_id}"
else:
return f"{organization}/{model_id}"
def parse_args(input_args=None):
parser = argparse.ArgumentParser(description="Simple example of a Dreambooth training script.")
parser.add_argument(
"--pretrained_model_name_or_path",
type=str,
default=None,
required=True,
help="Path to pretrained model or model identifier from huggingface.co/models.",
)
parser.add_argument(
"--revision",
type=str,
default=None,
required=False,
help="Revision of pretrained model identifier from huggingface.co/models.",
)
parser.add_argument(
"--tokenizer_name",
type=str,
default=None,
help="Pretrained tokenizer name or path if not the same as model_name",
)
parser.add_argument(
"--instance_data_dir",
type=str,
default=None,
required=True,
help="A folder containing the training data of instance images.",
)
parser.add_argument(
"--class_data_dir",
type=str,
default=None,
required=False,
help="A folder containing the training data of class images.",
)
parser.add_argument(
"--instance_prompt",
type=str,
default=None,
required=True,
help="The prompt with identifier specifying the instance",
)
parser.add_argument(
"--class_prompt",
type=str,
default=None,
help="The prompt to specify images in the same class as provided instance images.",
)
parser.add_argument(
"--with_prior_preservation",
default=False,
action="store_true",
help="Flag to add prior preservation loss.",
)
parser.add_argument("--prior_loss_weight", type=float, default=1.0, help="The weight of prior preservation loss.")
parser.add_argument(
"--num_class_images",
type=int,
default=100,
help=(
"Minimal class images for prior preservation loss. If there are not enough images already present in"
" class_data_dir, additional images will be sampled with class_prompt."
),
)
parser.add_argument(
"--validation_prompt",
nargs="+",
help="A prompt that is used during validation to verify that the model is learning.",
)
parser.add_argument(
"--num_validation_images",
type=int,
default=4,
help="Number of images that should be generated during validation with `validation_prompt`.",
)
parser.add_argument(
"--validation_steps",
type=int,
default=500,
help=(
"Run dreambooth validation every X steps. Dreambooth validation consists of running the prompt"
" `args.validation_prompt` multiple times: `args.num_validation_images`."
),
)
parser.add_argument(
"--output_dir",
type=str,
default="text-inversion-model",
help="The output directory where the model predictions and checkpoints will be written.",
)
parser.add_argument("--seed", type=int, default=None, help="A seed for reproducible training.")
parser.add_argument(
"--resolution",
type=int,
default=512,
help=(
"The resolution for input images, all the images in the train/validation dataset will be resized to this"
" resolution"
),
)
parser.add_argument(
"--center_crop", action="store_true", help="Whether to center crop images before resizing to resolution"
)
parser.add_argument("--train_text_encoder", action="store_true", help="Whether to train the text encoder")
parser.add_argument(
"--set_grads_to_none",
action="store_true",
help=(
"Save more memory by using setting grads to None instead of zero. Be aware, that this changes certain"
" behaviors, so disable this argument if it causes any problems. More info:"
" https://pytorch.org/docs/stable/generated/torch.optim.Optimizer.zero_grad.html"
),
)
# hra args
parser.add_argument("--use_hra", action="store_true", help="Whether to use HRA for parameter efficient tuning.")
parser.add_argument("--hra_r", type=int, default=8, help="The rank of HRA across different layers.")
parser.add_argument(
"--hra_apply_GS", default=False, action="store_true", help="Whether to apply Gram-Schmidt orthogonalization."
)
parser.add_argument(
"--hra_bias",
type=str,
default="none",
help="Bias type for HRA. Can be 'none', 'all' or 'hra_only', only used if use_hra is True.",
)
parser.add_argument(
"--num_dataloader_workers", type=int, default=1, help="Num of workers for the training dataloader."
)
parser.add_argument(
"--no_tracemalloc",
default=False,
action="store_true",
help="Flag to stop memory allocation tracing during training. This could speed up training on Windows.",
)
parser.add_argument(
"--train_batch_size", type=int, default=4, help="Batch size (per device) for the training dataloader."
)
parser.add_argument(
"--sample_batch_size", type=int, default=4, help="Batch size (per device) for sampling images."
)
parser.add_argument("--num_train_epochs", type=int, default=1)
parser.add_argument(
"--max_train_steps",
type=int,
default=None,
help="Total number of training steps to perform. If provided, overrides num_train_epochs.",
)
parser.add_argument(
"--checkpointing_steps",
type=int,
default=500,
help=(
"Save a checkpoint of the training state every X updates. These checkpoints can be used both as final"
" checkpoints in case they are better than the last checkpoint, and are also suitable for resuming"
" training using `--resume_from_checkpoint`."
),
)
parser.add_argument(
"--resume_from_checkpoint",
type=str,
default=None,
help=(
"Whether training should be resumed from a previous checkpoint. Use a path saved by"
' `--checkpointing_steps`, or `"latest"` to automatically select the last available checkpoint.'
),
)
parser.add_argument(
"--gradient_accumulation_steps",
type=int,
default=1,
help="Number of updates steps to accumulate before performing a backward/update pass.",
)
parser.add_argument(
"--gradient_checkpointing",
action="store_true",
help="Whether or not to use gradient checkpointing to save memory at the expense of slower backward pass.",
)
parser.add_argument(
"--learning_rate",
type=float,
default=5e-6,
help="Initial learning rate (after the potential warmup period) to use.",
)
parser.add_argument(
"--scale_lr",
action="store_true",
default=False,
help="Scale the learning rate by the number of GPUs, gradient accumulation steps, and batch size.",
)
parser.add_argument(
"--lr_scheduler",
type=str,
default="constant",
help=(
'The scheduler type to use. Choose between ["linear", "cosine", "cosine_with_restarts", "polynomial",'
' "constant", "constant_with_warmup"]'
),
)
parser.add_argument(
"--lr_warmup_steps", type=int, default=500, help="Number of steps for the warmup in the lr scheduler."
)
parser.add_argument(
"--lr_num_cycles",
type=int,
default=1,
help="Number of hard resets of the lr in cosine_with_restarts scheduler.",
)
parser.add_argument("--lr_power", type=float, default=1.0, help="Power factor of the polynomial scheduler.")
parser.add_argument(
"--use_8bit_adam", action="store_true", help="Whether or not to use 8-bit Adam from bitsandbytes."
)
parser.add_argument("--adam_beta1", type=float, default=0.9, help="The beta1 parameter for the Adam optimizer.")
parser.add_argument("--adam_beta2", type=float, default=0.999, help="The beta2 parameter for the Adam optimizer.")
parser.add_argument("--adam_weight_decay", type=float, default=1e-2, help="Weight decay to use.")
parser.add_argument("--adam_epsilon", type=float, default=1e-08, help="Epsilon value for the Adam optimizer")
parser.add_argument("--max_grad_norm", default=1.0, type=float, help="Max gradient norm.")
parser.add_argument("--push_to_hub", action="store_true", help="Whether or not to push the model to the Hub.")
parser.add_argument("--hub_token", type=str, default=None, help="The token to use to push to the Model Hub.")
parser.add_argument(
"--hub_model_id",
type=str,
default=None,
help="The name of the repository to keep in sync with the local `output_dir`.",
)
parser.add_argument(
"--logging_dir",
type=str,
default="logs",
help=(
"[TensorBoard](https://www.tensorflow.org/tensorboard) log directory. Will default to"
" *output_dir/runs/**CURRENT_DATETIME_HOSTNAME***."
),
)
parser.add_argument(
"--allow_tf32",
action="store_true",
help=(
"Whether or not to allow TF32 on Ampere GPUs. Can be used to speed up training. For more information, see"
" https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices"
),
)
parser.add_argument(
"--project_name",
type=str,
default=None,
help=("The project name for log tracking"),
)
parser.add_argument(
"--run_name",
type=str,
default=None,
help=("The run name for log tracking"),
)
parser.add_argument(
"--report_to",
type=str,
default="wandb",
help=(
'The integration to report the results and logs to. Supported platforms are `"wandb"`'
' (default), `"tensorboard"` and `"comet_ml"`. Use `"all"` to report to all integrations.'
),
)
parser.add_argument(
"--wandb_key",
type=str,
default=None,
help=("If report to option is set to wandb, api-key for wandb used for login to wandb "),
)
parser.add_argument(
"--wandb_project_name",
type=str,
default=None,
help=("If report to option is set to wandb, project name in wandb for log tracking "),
)
parser.add_argument(
"--wandb_run_name",
type=str,
default=None,
help=("If report to option is set to wandb, project name in wandb for log tracking "),
)
parser.add_argument(
"--mixed_precision",
type=str,
default=None,
choices=["no", "fp16", "bf16"],
help=(
"Whether to use mixed precision. Choose between fp16 and bf16 (bfloat16). Bf16 requires PyTorch >="
" 1.10.and an Nvidia Ampere GPU. Default to the value of accelerate config of the current system or the"
" flag passed with the `accelerate.launch` command. Use this argument to override the accelerate config."
),
)
parser.add_argument(
"--prior_generation_precision",
type=str,
default=None,
choices=["no", "fp32", "fp16", "bf16"],
help=(
"Choose prior generation precision between fp32, fp16 and bf16 (bfloat16). Bf16 requires PyTorch >="
" 1.10.and an Nvidia Ampere GPU. Default to fp16 if a GPU is available else fp32."
),
)
parser.add_argument("--local_rank", type=int, default=-1, help="For distributed training: local_rank")
parser.add_argument(
"--enable_xformers_memory_efficient_attention", action="store_true", help="Whether or not to use xformers."
)
if input_args is not None:
args = parser.parse_args(input_args)
else:
args = parser.parse_args()
env_local_rank = int(os.environ.get("LOCAL_RANK", -1))
if env_local_rank != -1 and env_local_rank != args.local_rank:
args.local_rank = env_local_rank
# Sanity checks
# if args.dataset_name is None and args.train_data_dir is None:
# raise ValueError("Need either a dataset name or a training folder.")
if args.with_prior_preservation:
if args.class_data_dir is None:
raise ValueError("You must specify a data directory for class images.")
if args.class_prompt is None:
raise ValueError("You must specify prompt for class images.")
else:
# logger is not available yet
if args.class_data_dir is not None:
warnings.warn("You need not use --class_data_dir without --with_prior_preservation.")
if args.class_prompt is not None:
warnings.warn("You need not use --class_prompt without --with_prior_preservation.")
return args
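# Hedged sketch (not part of this script): the HRA arguments parsed above are typically mapped
# onto a PEFT HRAConfig in the training code; the target modules listed here are illustrative
# assumptions for a Stable Diffusion UNet, not values taken from this repository.
# from peft import HRAConfig, get_peft_model
# config = HRAConfig(
#     r=args.hra_r,
#     apply_GS=args.hra_apply_GS,
#     target_modules=["to_k", "to_q", "to_v", "to_out.0"],
#     bias=args.hra_bias,
# )
# unet = get_peft_model(unet, config)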

View File

@ -0,0 +1,128 @@
# adapted from [peft's boft_dreambooth](https://github.com/huggingface/peft/tree/main/examples/boft_dreambooth)
from pathlib import Path
import torch
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms
class DreamBoothDataset(Dataset):
"""
A dataset to prepare the instance and class images with the prompts for fine-tuning the model.
It pre-processes the images and tokenizes the prompts.
"""
def __init__(
self,
instance_data_root,
instance_prompt,
tokenizer,
class_data_root=None,
class_prompt=None,
size=512,
center_crop=False,
):
self.size = size
self.center_crop = center_crop
self.tokenizer = tokenizer
self.instance_data_root = Path(instance_data_root)
if not self.instance_data_root.exists():
raise ValueError("Instance images root doesn't exists.")
self.instance_images_path = list(Path(instance_data_root).iterdir())
self.num_instance_images = len(self.instance_images_path)
self.instance_prompt = instance_prompt
self._length = self.num_instance_images
if class_data_root is not None:
self.class_data_root = Path(class_data_root)
self.class_data_root.mkdir(parents=True, exist_ok=True)
self.class_images_path = list(self.class_data_root.iterdir())
self.num_class_images = len(self.class_images_path)
self._length = max(self.num_class_images, self.num_instance_images)
self.class_prompt = class_prompt
else:
self.class_data_root = None
self.image_transforms = transforms.Compose(
[
transforms.Resize(size, interpolation=transforms.InterpolationMode.BILINEAR),
transforms.CenterCrop(size) if center_crop else transforms.RandomCrop(size),
transforms.ToTensor(),
transforms.Normalize([0.5], [0.5]),
]
)
def __len__(self):
return self._length
def __getitem__(self, index):
example = {}
instance_image = Image.open(self.instance_images_path[index % self.num_instance_images])
if not instance_image.mode == "RGB":
instance_image = instance_image.convert("RGB")
example["instance_images"] = self.image_transforms(instance_image)
example["instance_prompt_ids"] = self.tokenizer(
self.instance_prompt,
truncation=True,
padding="max_length",
max_length=self.tokenizer.model_max_length,
return_tensors="pt",
).input_ids
if self.class_data_root:
class_image = Image.open(self.class_images_path[index % self.num_class_images])
if not class_image.mode == "RGB":
class_image = class_image.convert("RGB")
example["class_images"] = self.image_transforms(class_image)
example["class_prompt_ids"] = self.tokenizer(
self.class_prompt,
truncation=True,
padding="max_length",
max_length=self.tokenizer.model_max_length,
return_tensors="pt",
).input_ids
return example
def collate_fn(examples, with_prior_preservation=False):
input_ids = [example["instance_prompt_ids"] for example in examples]
pixel_values = [example["instance_images"] for example in examples]
# Concat class and instance examples for prior preservation.
# We do this to avoid doing two forward passes.
if with_prior_preservation:
input_ids += [example["class_prompt_ids"] for example in examples]
pixel_values += [example["class_images"] for example in examples]
pixel_values = torch.stack(pixel_values)
pixel_values = pixel_values.to(memory_format=torch.contiguous_format).float()
input_ids = torch.cat(input_ids, dim=0)
batch = {
"input_ids": input_ids,
"pixel_values": pixel_values,
}
return batch
class PromptDataset(Dataset):
"A simple dataset to prepare the prompts to generate class images on multiple GPUs."
def __init__(self, prompt, num_samples):
self.prompt = prompt
self.num_samples = num_samples
def __len__(self):
return self.num_samples
def __getitem__(self, index):
example = {}
example["prompt"] = self.prompt
example["index"] = index
return example
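# Hedged usage sketch (not part of the original module): how a training script typically wires the
# dataset and collate function together; the paths, prompts, and `tokenizer` below are placeholders.
# train_dataset = DreamBoothDataset(
#     instance_data_root="path/to/instance-images",
#     instance_prompt="a photo of sks dog",
#     tokenizer=tokenizer,
#     class_data_root="path/to/class-images",
#     class_prompt="a photo of a dog",
#     size=512,
# )
# train_dataloader = torch.utils.data.DataLoader(
#     train_dataset,
#     batch_size=4,
#     shuffle=True,
#     collate_fn=lambda examples: collate_fn(examples, with_prior_preservation=True),
# )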

View File

@ -0,0 +1,60 @@
# adapted from [peft's boft_dreambooth](https://github.com/huggingface/peft/tree/main/examples/boft_dreambooth)
import gc
import threading
import psutil
import torch
# Converting Bytes to Megabytes
def b2mb(x):
return int(x / 2**20)
# This context manager is used to track the peak memory usage of the process
class TorchTracemalloc:
def __enter__(self):
gc.collect()
torch.cuda.empty_cache()
torch.cuda.reset_max_memory_allocated() # reset the peak gauge to zero
self.begin = torch.cuda.memory_allocated()
self.process = psutil.Process()
self.cpu_begin = self.cpu_mem_used()
self.peak_monitoring = True
peak_monitor_thread = threading.Thread(target=self.peak_monitor_func)
peak_monitor_thread.daemon = True
peak_monitor_thread.start()
return self
def cpu_mem_used(self):
"""get resident set size memory for the current process"""
return self.process.memory_info().rss
def peak_monitor_func(self):
self.cpu_peak = -1
while True:
self.cpu_peak = max(self.cpu_mem_used(), self.cpu_peak)
# can't sleep or will not catch the peak right (this comment is here on purpose)
# time.sleep(0.001) # 1msec
if not self.peak_monitoring:
break
def __exit__(self, *exc):
self.peak_monitoring = False
gc.collect()
torch.cuda.empty_cache()
self.end = torch.cuda.memory_allocated()
self.peak = torch.cuda.max_memory_allocated()
self.used = b2mb(self.end - self.begin)
self.peaked = b2mb(self.peak - self.begin)
self.cpu_end = self.cpu_mem_used()
self.cpu_used = b2mb(self.cpu_end - self.cpu_begin)
self.cpu_peaked = b2mb(self.cpu_peak - self.cpu_begin)
# print(f"delta used/peak {self.used:4d}/{self.peaked:4d}")

View File

@ -1008,7 +1008,7 @@
"args = TrainingArguments(\n",
" f\"{model_name}-finetuned-lora-food101\",\n",
" remove_unused_columns=False,\n",
" evaluation_strategy=\"epoch\",\n",
" eval_strategy=\"epoch\",\n",
" save_strategy=\"epoch\",\n",
" learning_rate=5e-3,\n",
" per_device_train_batch_size=batch_size,\n",

View File

@ -819,7 +819,7 @@
"\n",
"training_args = TrainingArguments(\n",
" \"temp\",\n",
" evaluation_strategy=\"epoch\",\n",
" eval_strategy=\"epoch\",\n",
" learning_rate=1e-3,\n",
" gradient_accumulation_steps=1,\n",
" auto_find_batch_size=True,\n",

View File

@ -1246,7 +1246,7 @@
" learning_rate=1e-3,\n",
" warmup_steps=50,\n",
" num_train_epochs=3,\n",
" evaluation_strategy=\"epoch\",\n",
" eval_strategy=\"epoch\",\n",
" fp16=True,\n",
" per_device_eval_batch_size=8,\n",
" generation_max_length=128,\n",

View File

@ -0,0 +1,84 @@
# OLoRA: Orthonormal Low Rank Adaptation of Large Language Models
## Introduction
[OLoRA](https://arxiv.org/abs/2406.01775) is a novel approach that leverages orthonormal low-rank adaptation through QR decomposition. Unlike the default LoRA implementation, OLoRA decomposes the original weights into their $\mathbf{Q}$ and $\mathbf{R}$ parts, and then uses the first `rank` rows of $\mathbf{R}$ and the first `rank` columns of $\mathbf{Q}$ to initialize $\mathbf{A}$ and $\mathbf{B}$, respectively. The authors report significantly faster convergence, more stable training, and superior performance compared to the default LoRA initialization.
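The snippet below is a minimal sketch of this initialization idea in plain PyTorch; it is illustrative only (the function name and the residual-weight handling are assumptions, not PEFT's internal implementation):
```python
import torch

def olora_like_init(W: torch.Tensor, rank: int):
    # QR-decompose the original weight: W = Q @ R
    Q, R = torch.linalg.qr(W)
    # B takes the first `rank` columns of Q, A takes the first `rank` rows of R
    B = Q[:, :rank].clone()
    A = R[:rank, :].clone()
    # Adjust the base weight so that W_adjusted + B @ A reproduces W at initialization
    W_adjusted = W - B @ A
    return W_adjusted, A, B
```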
## Quick start
```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoTokenizer, AutoModelForCausalLM
from trl import SFTTrainer
from datasets import load_dataset
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m", torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
dataset = load_dataset("imdb", split="train[:1%]")
lora_config = LoraConfig(
init_lora_weights="olora"
)
peft_model = get_peft_model(model, lora_config)
trainer = SFTTrainer(
model=peft_model,
train_dataset=dataset,
dataset_text_field="text",
max_seq_length=512,
tokenizer=tokenizer,
)
trainer.train()
peft_model.save_pretrained("olora-opt-350m")
```
No change to your standard LoRA procedure is needed other than specifying the `init_lora_weights = "olora"` option in your LoRA configuration.
Additionally, you can refer to the OLoRA fine-tuning script.
Run the script with:
```bash
python3 examples/olora_finetuning/olora_finetuning.py --base_model facebook/opt-350m
```
OLoRA also supports quantization. To use 4-bit quantization try:
```bash
python3 examples/olora_finetuning/olora_finetuning.py --base_model facebook/opt-350m --quantize
```
## Use the model
You can load and use the model like any other 🤗 PEFT model:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
olora_model = PeftModel.from_pretrained(model, "olora-opt-350m")
```
## OLoRA and LoRA
OLoRA differs from LoRA in that it mutates the original weights. To utilize multiple adapters simultaneously, you can leverage the `path_initial_model_for_weight_conversion` option. Below is a simple template illustrating how to convert OLoRA to conventional LoRA:
```python
base_model = AutoModel.from_pretrained("facebook/opt-350m")
olora_config = LoraConfig(
...
init_lora_weights = "olora" # Initialize the model with OLoRA
)
olora_model = get_peft_model(base_model, olora_config)
init_path = <path-to-untrained-olora-model>
olora_model.save_pretrained(init_path) # Save the model *before* performing any training
# Train the model
train(olora_model) # Your training loop
# Save the model after training
olora_model.save_pretrained(output_dir, path_initial_model_for_weight_conversion=init_path)
```
After training completes, you can save and convert your OLoRA model to a conventional LoRA model by setting `path_initial_model_for_weight_conversion` to `init_path`, i.e. the path of your untrained OLoRA model. This conversion enables you to use multiple adapters with your LoRA model. Note that this conversion is not supported if `rslora` is used in combination with `rank_pattern` or `alpha_pattern`.
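Once converted, the adapter saved to `output_dir` behaves like a regular LoRA adapter and can be attached to an unmodified base model. A minimal sketch (the model name and paths are placeholders):
```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
# Load the converted adapter; the base weights stay untouched
lora_model = PeftModel.from_pretrained(base_model, output_dir)
```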
## Citation
```
@misc{büyükakyüz2024olora,
title={OLoRA: Orthonormal Low-Rank Adaptation of Large Language Models},
author={Kerim Büyükakyüz},
year={2024},
eprint={2406.01775},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```

View File

@ -0,0 +1,184 @@
# Copyright 2024-present the HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import List
import torch
import transformers
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import (
LoraConfig,
get_peft_model,
)
def train(
base_model: str = "path/to/model",
data_path: str = "yahma/alpaca-cleaned",
output_dir: str = "olora",
batch_size: int = 16,
num_epochs: int = 1,
learning_rate: float = 3e-4,
cutoff_len: int = 256,
val_set_size: int = 16,
quantize: bool = False,
eval_step: int = 100,
save_step: int = 100,
device_map: str = "auto",
lora_r: int = 32,
lora_alpha: int = 16,
lora_dropout: float = 0.05,
lora_target_modules: List[str] = None,
init_lora_weights="olora",
):
model = AutoModelForCausalLM.from_pretrained(
base_model,
device_map=device_map,
quantization_config=BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_compute_dtype=torch.bfloat16,
bnb_4bit_use_double_quant=True,
bnb_4bit_quant_type="nf4",
)
if quantize
else None,
torch_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
def tokenize(prompt, add_eos_token=True):
result = tokenizer(
prompt,
truncation=True,
max_length=cutoff_len,
padding=False,
return_tensors=None,
)
if (
result["input_ids"][-1] != tokenizer.eos_token_id
and len(result["input_ids"]) < cutoff_len
and add_eos_token
):
result["input_ids"].append(tokenizer.eos_token_id)
result["attention_mask"].append(1)
result["labels"] = result["input_ids"].copy()
return result
def generate_and_tokenize_prompt(example):
full_prompt = generate_prompt(example)
tokenized_full_prompt = tokenize(full_prompt)
return tokenized_full_prompt
config = LoraConfig(
r=lora_r,
lora_alpha=lora_alpha,
target_modules=lora_target_modules,
lora_dropout=lora_dropout,
bias="none",
task_type="CAUSAL_LM",
init_lora_weights=init_lora_weights,
)
model = get_peft_model(model, config)
data = load_dataset(data_path)
train_val = data["train"].train_test_split(test_size=val_set_size, shuffle=True, seed=42)
train_data = train_val["train"].shuffle().map(generate_and_tokenize_prompt)
val_data = train_val["test"].shuffle().map(generate_and_tokenize_prompt)
trainer = transformers.Trainer(
model=model,
train_dataset=train_data,
eval_dataset=val_data,
args=transformers.TrainingArguments(
per_device_train_batch_size=batch_size,
warmup_steps=100,
num_train_epochs=num_epochs,
learning_rate=learning_rate,
fp16=True,
logging_steps=100,
optim="adamw_torch",
evaluation_strategy="steps",
save_strategy="steps",
eval_steps=eval_step,
save_steps=save_step,
output_dir=output_dir,
save_total_limit=3,
load_best_model_at_end=True,
),
data_collator=transformers.DataCollatorForSeq2Seq(
tokenizer, pad_to_multiple_of=8, return_tensors="pt", padding=True
),
)
trainer.train()
model.save_pretrained(output_dir)
def generate_prompt(example):
return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{example["instruction"]}
### Response:
{example["output"]}"""
if __name__ == "__main__":
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("--base_model", type=str, default="path/to/model")
parser.add_argument("--data_path", type=str, default="yahma/alpaca-cleaned")
parser.add_argument("--output_dir", type=str, default="olora")
parser.add_argument("--batch_size", type=int, default=16)
parser.add_argument("--num_epochs", type=int, default=1)
parser.add_argument("--learning_rate", type=float, default=3e-4)
parser.add_argument("--cutoff_len", type=int, default=256)
parser.add_argument("--val_set_size", type=int, default=16)
parser.add_argument("--quantize", action="store_true")
parser.add_argument("--eval_step", type=int, default=100)
parser.add_argument("--save_step", type=int, default=100)
parser.add_argument("--device_map", type=str, default="auto")
parser.add_argument("--lora_r", type=int, default=32)
parser.add_argument("--lora_alpha", type=int, default=16)
parser.add_argument("--lora_dropout", type=float, default=0.05)
parser.add_argument("--lora_target_modules", type=str, default=None)
parser.add_argument("--init_lora_weights", type=str, default="olora")
args = parser.parse_args()
train(
base_model=args.base_model,
data_path=args.data_path,
output_dir=args.output_dir,
batch_size=args.batch_size,
num_epochs=args.num_epochs,
learning_rate=args.learning_rate,
cutoff_len=args.cutoff_len,
val_set_size=args.val_set_size,
quantize=args.quantize,
eval_step=args.eval_step,
save_step=args.save_step,
device_map=args.device_map,
lora_r=args.lora_r,
lora_alpha=args.lora_alpha,
lora_dropout=args.lora_dropout,
lora_target_modules=args.lora_target_modules.split(",") if args.lora_target_modules else None,  # accept a comma-separated list on the CLI
init_lora_weights=args.init_lora_weights,
)

View File

@ -90,7 +90,7 @@ peft_model = PeftModel.from_pretrained(model, "pissa-llama-2-7b-lora")
```
Utilizing the converted LoRA does not require modifying the parameters of the base model. When multiple converted LoRAs are needed simultaneously, each adapter operates independently without interference, allowing for the adapters to be freely deleted or added.
Note that this conversion is not supported if `rslora` is used in combination with `rank_pattern` or `alpha_pattern`.
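As a hedged illustration (the second adapter path and the adapter names below are placeholders, not artifacts of this repository), converted adapters can be attached to one base model and switched independently:
```python
from peft import PeftModel

peft_model = PeftModel.from_pretrained(model, "pissa-llama-2-7b-lora", adapter_name="task_a")
peft_model.load_adapter("path/to/another-converted-lora", adapter_name="task_b")
peft_model.set_adapter("task_b")  # switch the active adapter; the base weights are never modified
```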
### Fine-tune in 4-bit or 8-bit
If quantization fine-tuning is desired, it is necessary to first decompose the original model at full precision and then reload the residual model in either 4-bit or 8-bit configurations.
@ -128,4 +128,4 @@ This approach ensures the preservation of high-frequency, out-of-distribution pa
journal={arXiv preprint arXiv:2404.02948},
year={2024}
}
```
```

View File

@ -21,10 +21,13 @@ from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model
parser = argparse.ArgumentParser(
description="Merge Adapter to Base Model", help="The name or path of the fp32/16 base model."
parser = argparse.ArgumentParser()
parser.add_argument(
"--base_model_name_or_path",
description="Merge Adapter to Base Model",
help="The name or path of the fp32/16 base model.",
)
parser.add_argument("--base_model_name_or_path", type=str, default="bf16")
parser.add_argument("--output_dir", type=str, help="The directory to save the PiSSA model.")
parser.add_argument("--bits", type=str, default="bf16", choices=["bf16", "fp16", "fp32"])
parser.add_argument(
"--init_lora_weights", type=str, default="pissa", help="(`['pissa', 'pissa_niter_[number of iters]']`)"

View File

@ -973,7 +973,7 @@
" per_device_eval_batch_size=batch_size,\n",
" learning_rate=lr,\n",
" num_train_epochs=num_epochs,\n",
" evaluation_strategy=\"epoch\",\n",
" eval_strategy=\"epoch\",\n",
" logging_strategy=\"epoch\",\n",
" save_strategy=\"no\",\n",
" report_to=[],\n",

View File

@ -587,7 +587,7 @@
" per_device_train_batch_size=4,\n",
" per_device_eval_batch_size=2,\n",
" save_total_limit=3,\n",
" evaluation_strategy=\"epoch\",\n",
" eval_strategy=\"epoch\",\n",
" save_strategy=\"epoch\",\n",
" logging_steps=5,\n",
" remove_unused_columns=False,\n",

View File

@ -0,0 +1,556 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "d36e1e93-ae93-4a4e-93c6-68fd868d2882",
"metadata": {},
"source": [
"# Using FourierFT for sequence classification"
]
},
{
"cell_type": "markdown",
"id": "ddfc0610-55f6-4343-a950-125ccf0f45ac",
"metadata": {},
"source": [
"In this example, we fine-tune Roberta (base) on a sequence classification task using FourierFT."
]
},
{
"cell_type": "markdown",
"id": "45addd81-d4f3-4dfd-960d-3920d347f0a6",
"metadata": {},
"source": [
"## Imports"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "a9935ae2",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/zgaoat/anaconda3/envs/pr2/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
" from .autonotebook import tqdm as notebook_tqdm\n"
]
}
],
"source": [
"# To run this notebook, please run `pip install evaluate` to install additional dependencies not covered by PEFT.\n",
"import torch\n",
"from torch.optim import AdamW\n",
"from torch.utils.data import DataLoader\n",
"from peft import (\n",
" get_peft_model,\n",
" FourierFTConfig,\n",
" PeftType,\n",
")\n",
"\n",
"import evaluate\n",
"from datasets import load_dataset\n",
"from transformers import AutoModelForSequenceClassification, AutoTokenizer, get_linear_schedule_with_warmup, set_seed, AutoConfig\n",
"from tqdm import tqdm"
]
},
{
"cell_type": "markdown",
"id": "62c959bf-7cc2-49e0-b97e-4c10ec3b9bf3",
"metadata": {},
"source": [
"## Parameters"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "e3b13308",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<torch._C.Generator at 0x78e2a49744b0>"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"batch_size = 32\n",
"model_name_or_path = \"roberta-base\"\n",
"task = \"mrpc\"\n",
"peft_type = PeftType.FOURIERFT\n",
"device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
"num_epochs = 5 # for better results, increase this number\n",
"n_frequency = 1000 # for better results, increase this number\n",
"scaling = 150.0\n",
"max_length = 512\n",
"torch.manual_seed(0)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "0526f571",
"metadata": {},
"outputs": [],
"source": [
"peft_config = FourierFTConfig(\n",
" task_type=\"SEQ_CLS\", \n",
" n_frequency=n_frequency,\n",
" target_modules=[\"query\", \"value\"],\n",
" scaling = scaling,\n",
")\n",
"head_lr = 6e-3 # the learning rate for the classification head for NLU tasks\n",
"fft_lr = 6e-2 # the learning rate for the parameters other than the classification head (q,v in this case)"
]
},
{
"cell_type": "markdown",
"id": "c075c5d2-a457-4f37-a7f1-94fd0d277972",
"metadata": {},
"source": [
"## Loading data"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "7bb52cb4-d1c3-4b04-8bf0-f39ca88af139",
"metadata": {},
"outputs": [],
"source": [
"if any(k in model_name_or_path for k in (\"gpt\", \"opt\", \"bloom\")):\n",
" padding_side = \"left\"\n",
"else:\n",
" padding_side = \"right\"\n",
"\n",
"tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, padding_side=padding_side)\n",
"if getattr(tokenizer, \"pad_token_id\") is None:\n",
" tokenizer.pad_token_id = tokenizer.eos_token_id"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "e69c5e1f-d27b-4264-a41e-fc9b99d025e6",
"metadata": {},
"outputs": [],
"source": [
"datasets = load_dataset(\"glue\", task)\n",
"metric = evaluate.load(\"glue\", task)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "0209f778-c93b-40eb-a4e0-24c25db03980",
"metadata": {},
"outputs": [],
"source": [
"def tokenize_function(examples):\n",
" # max_length=None => use the model max length (it's actually the default)\n",
" outputs = tokenizer(examples[\"sentence1\"], examples[\"sentence2\"], truncation=True, max_length=max_length)\n",
" return outputs\n",
"\n",
"\n",
"tokenized_datasets = datasets.map(\n",
" tokenize_function,\n",
" batched=True,\n",
" remove_columns=[\"idx\", \"sentence1\", \"sentence2\"],\n",
")\n",
"\n",
"# We also rename the 'label' column to 'labels' which is the expected name for labels by the models of the\n",
"# transformers library\n",
"tokenized_datasets = tokenized_datasets.rename_column(\"label\", \"labels\")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "7453954e-982c-46f0-b09c-589776e6d6cb",
"metadata": {},
"outputs": [],
"source": [
"def collate_fn(examples):\n",
" return tokenizer.pad(examples, padding=\"longest\", return_tensors=\"pt\")\n",
"\n",
"\n",
"# Instantiate dataloaders.\n",
"train_dataloader = DataLoader(tokenized_datasets[\"train\"], shuffle=True, collate_fn=collate_fn, batch_size=batch_size)\n",
"eval_dataloader = DataLoader(\n",
" tokenized_datasets[\"validation\"], shuffle=False, collate_fn=collate_fn, batch_size=batch_size\n",
")"
]
},
{
"cell_type": "markdown",
"id": "f3b9b2e8-f415-4d0f-9fb4-436f1a3585ea",
"metadata": {},
"source": [
"## Preparing the FourierFT model"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "2ed5ac74",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']\n",
"You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"trainable params: 616,130 || all params: 125,263,300 || trainable%: 0.4919\n"
]
}
],
"source": [
"model = AutoModelForSequenceClassification.from_pretrained(model_name_or_path, return_dict=True, max_length=None)\n",
"model = get_peft_model(model, peft_config)\n",
"model.print_trainable_parameters()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "0d2d0381",
"metadata": {},
"outputs": [],
"source": [
"head_param = list(map(id, model.classifier.parameters()))\n",
"\n",
"others_param = filter(lambda p: id(p) not in head_param, model.parameters()) \n",
"\n",
"optimizer = AdamW([\n",
" {\"params\": model.classifier.parameters(), \"lr\": head_lr},\n",
" {\"params\": others_param, \"lr\": fft_lr}\n",
"],weight_decay=0.)\n",
"\n",
"\n",
"# Instantiate scheduler\n",
"lr_scheduler = get_linear_schedule_with_warmup(\n",
" optimizer=optimizer,\n",
" num_warmup_steps=0.06 * (len(train_dataloader) * num_epochs),\n",
" num_training_steps=(len(train_dataloader) * num_epochs),\n",
")"
]
},
{
"cell_type": "markdown",
"id": "c0dd5aa8-977b-4ac0-8b96-884b17bcdd00",
"metadata": {},
"source": [
"## Training"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "fa0e73be",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
" 0%| | 0/115 [00:00<?, ?it/s]You're using a RobertaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.\n",
"100%|██████████| 115/115 [00:06<00:00, 19.03it/s]\n",
"100%|██████████| 13/13 [00:00<00:00, 41.72it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 0: {'accuracy': 0.8161764705882353, 'f1': 0.8709122203098106}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 115/115 [00:05<00:00, 20.61it/s]\n",
"100%|██████████| 13/13 [00:00<00:00, 42.91it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 1: {'accuracy': 0.8480392156862745, 'f1': 0.8966666666666666}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 115/115 [00:05<00:00, 20.63it/s]\n",
"100%|██████████| 13/13 [00:00<00:00, 42.65it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 2: {'accuracy': 0.8676470588235294, 'f1': 0.9075342465753424}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 115/115 [00:05<00:00, 20.56it/s]\n",
"100%|██████████| 13/13 [00:00<00:00, 42.11it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 3: {'accuracy': 0.8504901960784313, 'f1': 0.8988391376451078}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 115/115 [00:05<00:00, 20.50it/s]\n",
"100%|██████████| 13/13 [00:00<00:00, 43.15it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 4: {'accuracy': 0.8725490196078431, 'f1': 0.9103448275862069}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n"
]
}
],
"source": [
"model.to(device)\n",
"for epoch in range(num_epochs):\n",
" model.train()\n",
" for step, batch in enumerate(tqdm(train_dataloader)):\n",
" batch.to(device)\n",
" outputs = model(**batch)\n",
" loss = outputs.loss\n",
" loss.backward()\n",
" optimizer.step()\n",
" lr_scheduler.step()\n",
" optimizer.zero_grad()\n",
"\n",
" model.eval()\n",
" for step, batch in enumerate(tqdm(eval_dataloader)):\n",
" batch.to(device)\n",
" with torch.no_grad():\n",
" outputs = model(**batch)\n",
" predictions = outputs.logits.argmax(dim=-1)\n",
" predictions, references = predictions, batch[\"labels\"]\n",
" metric.add_batch(\n",
" predictions=predictions,\n",
" references=references,\n",
" )\n",
"\n",
" eval_metric = metric.compute()\n",
" print(f\"epoch {epoch}:\", eval_metric)"
]
},
{
"cell_type": "markdown",
"id": "f2b2caca",
"metadata": {},
"source": [
"## Share adapters on the 🤗 Hub"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "7b23af6f-cf6e-486f-9d10-0eada95b631f",
"metadata": {},
"outputs": [],
"source": [
"account_id = ... # your Hugging Face Hub account ID"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "990b3c93",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/zgaoat/anaconda3/envs/pr2/lib/python3.11/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.\n",
" warnings.warn(\n"
]
},
{
"data": {
"text/plain": [
"CommitInfo(commit_url='https://huggingface.co/zgaoat/roberta-base-mrpc-peft-fourierft/commit/064eb35cbb7a1073b4d8fafbeccee43a0a4e37c9', commit_message='Upload model', commit_description='', oid='064eb35cbb7a1073b4d8fafbeccee43a0a4e37c9', pr_url=None, pr_revision=None, pr_num=None)"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model.push_to_hub(f\"{account_id}/roberta-base-mrpc-peft-fourierft\")"
]
},
{
"cell_type": "markdown",
"id": "9d140b26",
"metadata": {},
"source": [
"## Load adapters from the Hub\n",
"\n",
"You can also directly load adapters from the Hub using the commands below:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "c283e028-b349-46b0-a20e-cde0ee5fbd7b",
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"from peft import PeftModel, PeftConfig\n",
"from transformers import AutoTokenizer"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "320b10a0-4ea8-4786-9f3c-4670019c6b18",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']\n",
"You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n"
]
}
],
"source": [
"peft_model_id = f\"{account_id}/roberta-base-mrpc-peft-fourierft\"\n",
"config = PeftConfig.from_pretrained(peft_model_id)\n",
"inference_model = AutoModelForSequenceClassification.from_pretrained(config.base_model_name_or_path)\n",
"tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "b3a94049-bc01-4f2e-8cf9-66daf24a4402",
"metadata": {},
"outputs": [],
"source": [
"# Load the FourierFT model\n",
"inference_model = PeftModel.from_pretrained(inference_model, peft_model_id, config=config)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "bd919fef-4e9a-4dc5-a957-7b879cfc5d38",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
" 0%| | 0/13 [00:00<?, ?it/s]You're using a RobertaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.\n",
"100%|██████████| 13/13 [00:00<00:00, 43.06it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'accuracy': 0.8725490196078431, 'f1': 0.9103448275862069}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n"
]
}
],
"source": [
"inference_model.to(device)\n",
"inference_model.eval()\n",
"for step, batch in enumerate(tqdm(eval_dataloader)):\n",
" batch.to(device)\n",
" with torch.no_grad():\n",
" outputs = inference_model(**batch)\n",
" predictions = outputs.logits.argmax(dim=-1)\n",
" predictions, references = predictions, batch[\"labels\"]\n",
" metric.add_batch(\n",
" predictions=predictions,\n",
" references=references,\n",
" )\n",
"\n",
"eval_metric = metric.compute()\n",
"print(eval_metric)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
},
"vscode": {
"interpreter": {
"hash": "aee8b7b246df8f9039afb4144a1f6fd8d2ca17a180786b69acc140d282b71a49"
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@ -0,0 +1,787 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "d36e1e93-ae93-4a4e-93c6-68fd868d2882",
"metadata": {},
"source": [
"# Using VB-LoRA for sequence classification"
]
},
{
"cell_type": "markdown",
"id": "ddfc0610-55f6-4343-a950-125ccf0f45ac",
"metadata": {},
"source": [
"In this example, we fine-tune Roberta on a sequence classification task using VB-LoRA.\n",
"\n",
"This notebook is adapted from `examples/sequence_classification/VeRA.ipynb`."
]
},
{
"cell_type": "markdown",
"id": "45addd81-d4f3-4dfd-960d-3920d347f0a6",
"metadata": {},
"source": [
"## Imports"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a9935ae2",
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"from torch.optim import AdamW\n",
"from torch.utils.data import DataLoader\n",
"from peft import (\n",
" get_peft_model,\n",
" VBLoRAConfig,\n",
" PeftType,\n",
")\n",
"\n",
"import evaluate\n",
"from datasets import load_dataset\n",
"from transformers import AutoModelForSequenceClassification, AutoTokenizer, get_linear_schedule_with_warmup\n",
"from tqdm import tqdm"
]
},
{
"cell_type": "markdown",
"id": "62c959bf-7cc2-49e0-b97e-4c10ec3b9bf3",
"metadata": {},
"source": [
"## Parameters"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "e3b13308",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<torch._C.Generator at 0x7f4fc7c3c750>"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"batch_size = 32\n",
"model_name_or_path = \"roberta-large\"\n",
"task = \"mrpc\"\n",
"peft_type = PeftType.VBLORA\n",
"device = \"cuda\"\n",
"num_epochs = 20\n",
"rank = 4\n",
"max_length = 128\n",
"num_vectors = 90\n",
"vector_length = 256\n",
"torch.manual_seed(0)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "0526f571",
"metadata": {},
"outputs": [],
"source": [
"peft_config = VBLoRAConfig(\n",
" task_type=\"SEQ_CLS\", \n",
" r=rank,\n",
" topk=2,\n",
" target_modules=['key', 'value', 'query', 'output.dense', 'intermediate.dense'],\n",
" num_vectors=num_vectors,\n",
" vector_length=vector_length,\n",
" save_only_topk_weights=True, # Set to True to reduce storage space. Note that the saved parameters cannot be used to resume training from checkpoints.\n",
" vblora_dropout=0.,\n",
")\n",
"head_lr = 4e-3\n",
"vector_bank_lr = 1e-3\n",
"logits_lr = 1e-2"
]
},
{
"cell_type": "markdown",
"id": "c075c5d2-a457-4f37-a7f1-94fd0d277972",
"metadata": {},
"source": [
"## Loading data"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "7bb52cb4-d1c3-4b04-8bf0-f39ca88af139",
"metadata": {},
"outputs": [],
"source": [
"if any(k in model_name_or_path for k in (\"gpt\", \"opt\", \"bloom\")):\n",
" padding_side = \"left\"\n",
"else:\n",
" padding_side = \"right\"\n",
"\n",
"tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, padding_side=padding_side)\n",
"if getattr(tokenizer, \"pad_token_id\") is None:\n",
" tokenizer.pad_token_id = tokenizer.eos_token_id"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "e69c5e1f-d27b-4264-a41e-fc9b99d025e6",
"metadata": {},
"outputs": [],
"source": [
"datasets = load_dataset(\"glue\", task)\n",
"metric = evaluate.load(\"glue\", task)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "0209f778-c93b-40eb-a4e0-24c25db03980",
"metadata": {},
"outputs": [],
"source": [
"def tokenize_function(examples):\n",
" # max_length=None => use the model max length (it's actually the default)\n",
" outputs = tokenizer(examples[\"sentence1\"], examples[\"sentence2\"], truncation=True, max_length=max_length)\n",
" return outputs\n",
"\n",
"\n",
"tokenized_datasets = datasets.map(\n",
" tokenize_function,\n",
" batched=True,\n",
" remove_columns=[\"idx\", \"sentence1\", \"sentence2\"],\n",
")\n",
"\n",
"# We also rename the 'label' column to 'labels' which is the expected name for labels by the models of the\n",
"# transformers library\n",
"tokenized_datasets = tokenized_datasets.rename_column(\"label\", \"labels\")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "7453954e-982c-46f0-b09c-589776e6d6cb",
"metadata": {},
"outputs": [],
"source": [
"def collate_fn(examples):\n",
" return tokenizer.pad(examples, padding=\"longest\", return_tensors=\"pt\")\n",
"\n",
"\n",
"# Instantiate dataloaders.\n",
"train_dataloader = DataLoader(tokenized_datasets[\"train\"], shuffle=True, collate_fn=collate_fn, batch_size=batch_size)\n",
"eval_dataloader = DataLoader(\n",
" tokenized_datasets[\"validation\"], shuffle=False, collate_fn=collate_fn, batch_size=batch_size\n",
")"
]
},
{
"cell_type": "markdown",
"id": "f3b9b2e8-f415-4d0f-9fb4-436f1a3585ea",
"metadata": {},
"source": [
"## Preparing the VB-LoRA model"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "2ed5ac74",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-large and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']\n",
"You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"trainable params: 1,696,770 || all params: 357,058,564 || trainable%: 0.4752\n",
"VB-LoRA params to-be-saved (float32-equivalent): 33,408 || total params to-be-saved: 1,085,058\n"
]
}
],
"source": [
"model = AutoModelForSequenceClassification.from_pretrained(model_name_or_path, return_dict=True, max_length=None)\n",
"model = get_peft_model(model, peft_config)\n",
"model.print_trainable_parameters()\n",
"model.print_savable_parameters()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "0d2d0381",
"metadata": {},
"outputs": [],
"source": [
"\n",
"from transformers.pytorch_utils import ALL_LAYERNORM_LAYERS\n",
"from transformers.trainer_pt_utils import get_parameter_names\n",
"\n",
"decay_parameters = get_parameter_names(model, ALL_LAYERNORM_LAYERS)\n",
"decay_parameters = [name for name in decay_parameters if \"bias\" not in name]\n",
"vector_bank_parameters = [name for name, _ in model.named_parameters() if \"vector_bank\" in name]\n",
"logits_parameters = [name for name, _ in model.named_parameters() if \"logits\" in name ]\n",
"\n",
"optimizer_grouped_parameters = [\n",
" {\n",
" \"params\": [p for n, p in model.named_parameters() if n in decay_parameters and \\\n",
" n not in logits_parameters and n not in vector_bank_parameters],\n",
" \"weight_decay\": 0.1,\n",
" \"lr\": head_lr,\n",
" },\n",
" {\n",
" \"params\": [p for n, p in model.named_parameters() if n not in decay_parameters and \\\n",
" n not in logits_parameters and n not in vector_bank_parameters],\n",
" \"weight_decay\": 0.0,\n",
" \"lr\": head_lr,\n",
" },\n",
" {\n",
" \"params\": [p for n, p in model.named_parameters() if n in vector_bank_parameters],\n",
" \"lr\": vector_bank_lr,\n",
" \"weight_decay\": 0.0,\n",
" },\n",
" {\n",
" \"params\": [p for n, p in model.named_parameters() if n in logits_parameters],\n",
" \"lr\": logits_lr,\n",
" \"weight_decay\": 0.0,\n",
" },\n",
"]\n",
"\n",
"optimizer = AdamW(optimizer_grouped_parameters)\n",
"lr_scheduler = get_linear_schedule_with_warmup(\n",
" optimizer=optimizer,\n",
" num_warmup_steps=0.06 * (len(train_dataloader) * num_epochs),\n",
" num_training_steps=(len(train_dataloader) * num_epochs),\n",
")"
]
},
{
"cell_type": "markdown",
"id": "c0dd5aa8-977b-4ac0-8b96-884b17bcdd00",
"metadata": {},
"source": [
"## Training"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "fa0e73be",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
" 0%| | 0/115 [00:00<?, ?it/s]You're using a RobertaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.\n",
"100%|██████████| 115/115 [00:34<00:00, 3.33it/s]\n",
"100%|██████████| 13/13 [00:01<00:00, 7.84it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 0: {'accuracy': 0.6691176470588235, 'f1': 0.786053882725832}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 115/115 [00:34<00:00, 3.37it/s]\n",
"100%|██████████| 13/13 [00:01<00:00, 7.83it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 1: {'accuracy': 0.5833333333333334, 'f1': 0.6136363636363636}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 115/115 [00:34<00:00, 3.34it/s]\n",
"100%|██████████| 13/13 [00:01<00:00, 7.82it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 2: {'accuracy': 0.7107843137254902, 'f1': 0.8238805970149253}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 115/115 [00:34<00:00, 3.34it/s]\n",
"100%|██████████| 13/13 [00:01<00:00, 7.80it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 3: {'accuracy': 0.8284313725490197, 'f1': 0.8833333333333333}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 115/115 [00:34<00:00, 3.34it/s]\n",
"100%|██████████| 13/13 [00:01<00:00, 7.79it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 4: {'accuracy': 0.8480392156862745, 'f1': 0.8847583643122676}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 115/115 [00:34<00:00, 3.30it/s]\n",
"100%|██████████| 13/13 [00:01<00:00, 7.78it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 5: {'accuracy': 0.8676470588235294, 'f1': 0.898876404494382}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 115/115 [00:34<00:00, 3.31it/s]\n",
"100%|██████████| 13/13 [00:01<00:00, 7.76it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 6: {'accuracy': 0.8602941176470589, 'f1': 0.9035532994923858}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 115/115 [00:34<00:00, 3.32it/s]\n",
"100%|██████████| 13/13 [00:01<00:00, 7.76it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 7: {'accuracy': 0.8774509803921569, 'f1': 0.911660777385159}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 115/115 [00:34<00:00, 3.33it/s]\n",
"100%|██████████| 13/13 [00:01<00:00, 7.79it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 8: {'accuracy': 0.8872549019607843, 'f1': 0.9172661870503597}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 115/115 [00:34<00:00, 3.32it/s]\n",
"100%|██████████| 13/13 [00:01<00:00, 7.78it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 9: {'accuracy': 0.875, 'f1': 0.9113043478260869}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 115/115 [00:34<00:00, 3.32it/s]\n",
"100%|██████████| 13/13 [00:01<00:00, 7.76it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 10: {'accuracy': 0.8823529411764706, 'f1': 0.9166666666666666}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 115/115 [00:34<00:00, 3.33it/s]\n",
"100%|██████████| 13/13 [00:01<00:00, 7.76it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 11: {'accuracy': 0.8970588235294118, 'f1': 0.9252669039145908}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 115/115 [00:34<00:00, 3.32it/s]\n",
"100%|██████████| 13/13 [00:01<00:00, 7.75it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 12: {'accuracy': 0.8946078431372549, 'f1': 0.9246935201401051}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 115/115 [00:34<00:00, 3.33it/s]\n",
"100%|██████████| 13/13 [00:01<00:00, 7.76it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 13: {'accuracy': 0.9068627450980392, 'f1': 0.9316546762589928}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 115/115 [00:34<00:00, 3.33it/s]\n",
"100%|██████████| 13/13 [00:01<00:00, 7.76it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 14: {'accuracy': 0.8946078431372549, 'f1': 0.9225225225225225}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 115/115 [00:34<00:00, 3.33it/s]\n",
"100%|██████████| 13/13 [00:01<00:00, 7.76it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 15: {'accuracy': 0.8995098039215687, 'f1': 0.926391382405745}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 115/115 [00:34<00:00, 3.30it/s]\n",
"100%|██████████| 13/13 [00:01<00:00, 7.76it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 16: {'accuracy': 0.9068627450980392, 'f1': 0.9316546762589928}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 115/115 [00:34<00:00, 3.31it/s]\n",
"100%|██████████| 13/13 [00:01<00:00, 7.77it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 17: {'accuracy': 0.8921568627450981, 'f1': 0.9217081850533808}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 115/115 [00:34<00:00, 3.33it/s]\n",
"100%|██████████| 13/13 [00:01<00:00, 7.77it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 18: {'accuracy': 0.8995098039215687, 'f1': 0.9266547406082289}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 115/115 [00:34<00:00, 3.33it/s]\n",
"100%|██████████| 13/13 [00:01<00:00, 7.77it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 19: {'accuracy': 0.9044117647058824, 'f1': 0.9297297297297298}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n"
]
}
],
"source": [
"model.to(device)\n",
"\n",
"for epoch in range(num_epochs):\n",
" model.train()\n",
" for step, batch in enumerate(tqdm(train_dataloader)):\n",
" batch.to(device)\n",
" outputs = model(**batch)\n",
" loss = outputs.loss\n",
" loss.backward()\n",
" optimizer.step()\n",
" lr_scheduler.step()\n",
" optimizer.zero_grad()\n",
"\n",
" model.eval()\n",
" for step, batch in enumerate(tqdm(eval_dataloader)):\n",
" batch.to(device)\n",
" with torch.no_grad():\n",
" outputs = model(**batch)\n",
" predictions = outputs.logits.argmax(dim=-1)\n",
" predictions, references = predictions, batch[\"labels\"]\n",
" metric.add_batch(\n",
" predictions=predictions,\n",
" references=references,\n",
" )\n",
"\n",
" eval_metric = metric.compute()\n",
" print(f\"epoch {epoch}:\", eval_metric)"
]
},
{
"cell_type": "markdown",
"id": "f2b2caca",
"metadata": {},
"source": [
"## Share adapters on the 🤗 Hub"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "7b23af6f-cf6e-486f-9d10-0eada95b631f",
"metadata": {},
"outputs": [],
"source": [
"account_id = ... # your Hugging Face Hub account ID"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "990b3c93",
"metadata": {},
"outputs": [],
"source": [
"model.push_to_hub(f\"{account_id}/roberta-large-peft-vblora\")"
]
},
{
"cell_type": "markdown",
"id": "9d140b26",
"metadata": {},
"source": [
"## Load adapters from the Hub\n",
"\n",
"You can also directly load adapters from the Hub using the commands below:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "c283e028-b349-46b0-a20e-cde0ee5fbd7b",
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"from peft import PeftModel, PeftConfig\n",
"from transformers import AutoTokenizer"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "320b10a0-4ea8-4786-9f3c-4670019c6b18",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-large and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']\n",
"You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n"
]
}
],
"source": [
"peft_model_id = f\"{account_id}/roberta-large-peft-vblora\"\n",
"config = PeftConfig.from_pretrained(peft_model_id)\n",
"inference_model = AutoModelForSequenceClassification.from_pretrained(config.base_model_name_or_path)\n",
"tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "b3a94049-bc01-4f2e-8cf9-66daf24a4402",
"metadata": {},
"outputs": [],
"source": [
"# Load the model\n",
"inference_model = PeftModel.from_pretrained(inference_model, peft_model_id)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "bd919fef-4e9a-4dc5-a957-7b879cfc5d38",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
" 0%| | 0/13 [00:00<?, ?it/s]You're using a RobertaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.\n",
"100%|██████████| 13/13 [00:01<00:00, 7.81it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'accuracy': 0.9044117647058824, 'f1': 0.9297297297297298}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n"
]
}
],
"source": [
"inference_model.to(device)\n",
"inference_model.eval()\n",
"for step, batch in enumerate(tqdm(eval_dataloader)):\n",
" batch.to(device)\n",
" with torch.no_grad():\n",
" outputs = inference_model(**batch)\n",
" predictions = outputs.logits.argmax(dim=-1)\n",
" predictions, references = predictions, batch[\"labels\"]\n",
" metric.add_batch(\n",
" predictions=predictions,\n",
" references=references,\n",
" )\n",
"\n",
"eval_metric = metric.compute()\n",
"print(eval_metric)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.14"
},
"vscode": {
"interpreter": {
"hash": "aee8b7b246df8f9039afb4144a1f6fd8d2ca17a180786b69acc140d282b71a49"
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@ -94,7 +94,7 @@
" task_type=\"SEQ_CLS\", \n",
" r=rank,\n",
" d_initial=0.1,\n",
" target_modules=[\"query\", \"value\"],\n",
" target_modules=[\"query\", \"value\", \"intermediate.dense\"],\n",
" save_projection=True,\n",
")\n",
"head_lr = 1e-2\n",
@ -205,7 +205,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"trainable params: 610,754 || all params: 125,257,924 || trainable%: 0.48759709605278145\n"
"trainable params: 647,714 || all params: 125,294,884 || trainable%: 0.5170\n"
]
}
],
@ -255,76 +255,76 @@
"name": "stderr",
"output_type": "stream",
"text": [
" 0%| | 0/29 [00:00<?, ?it/s]You're using a RobertaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.\n",
"100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:23<00:00, 1.24it/s]\n",
"100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:01<00:00, 2.33it/s]\n"
" 0%| | 0/29 [00:00<?, ?it/s]You're using a RobertaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.\n",
"100%|██████████| 29/29 [00:18<00:00, 1.58it/s]\n",
"100%|██████████| 4/4 [00:01<00:00, 3.52it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 0: {'accuracy': 0.7132352941176471, 'f1': 0.823529411764706}\n"
"epoch 0: {'accuracy': 0.7475490196078431, 'f1': 0.8367670364500792}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:23<00:00, 1.26it/s]\n",
"100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:01<00:00, 2.30it/s]\n"
"100%|██████████| 29/29 [00:17<00:00, 1.68it/s]\n",
"100%|██████████| 4/4 [00:01<00:00, 3.37it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 1: {'accuracy': 0.7671568627450981, 'f1': 0.8484848484848485}\n"
"epoch 1: {'accuracy': 0.7671568627450981, 'f1': 0.8536209553158706}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:23<00:00, 1.24it/s]\n",
"100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:01<00:00, 2.30it/s]\n"
"100%|██████████| 29/29 [00:17<00:00, 1.66it/s]\n",
"100%|██████████| 4/4 [00:01<00:00, 3.33it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 2: {'accuracy': 0.8259803921568627, 'f1': 0.8738898756660745}\n"
"epoch 2: {'accuracy': 0.8553921568627451, 'f1': 0.8959435626102292}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:23<00:00, 1.25it/s]\n",
"100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:01<00:00, 2.41it/s]\n"
"100%|██████████| 29/29 [00:17<00:00, 1.64it/s]\n",
"100%|██████████| 4/4 [00:01<00:00, 3.35it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 3: {'accuracy': 0.8431372549019608, 'f1': 0.891156462585034}\n"
"epoch 3: {'accuracy': 0.8823529411764706, 'f1': 0.9133574007220215}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:23<00:00, 1.25it/s]\n",
"100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:01<00:00, 2.35it/s]"
"100%|██████████| 29/29 [00:17<00:00, 1.63it/s]\n",
"100%|██████████| 4/4 [00:01<00:00, 3.17it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 4: {'accuracy': 0.8480392156862745, 'f1': 0.8938356164383561}\n"
"epoch 4: {'accuracy': 0.8897058823529411, 'f1': 0.9183303085299456}\n"
]
},
{
@ -520,18 +520,6 @@
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
},
"vscode": {
"interpreter": {
"hash": "aee8b7b246df8f9039afb4144a1f6fd8d2ca17a180786b69acc140d282b71a49"

View File

@ -11,7 +11,7 @@ python train.py \
--logging_steps 5 \
--log_level "info" \
--logging_strategy "steps" \
--evaluation_strategy "epoch" \
--eval_strategy "epoch" \
--save_strategy "epoch" \
--push_to_hub \
--hub_private_repo True \

View File

@ -11,7 +11,7 @@ accelerate launch --config_file "configs/deepspeed_config.yaml" train.py \
--logging_steps 5 \
--log_level "info" \
--logging_strategy "steps" \
--evaluation_strategy "epoch" \
--eval_strategy "epoch" \
--save_strategy "epoch" \
--push_to_hub \
--hub_private_repo True \

View File

@ -11,7 +11,7 @@ accelerate launch --config_file "configs/fsdp_config.yaml" train.py \
--logging_steps 5 \
--log_level "info" \
--logging_strategy "steps" \
--evaluation_strategy "epoch" \
--eval_strategy "epoch" \
--save_strategy "epoch" \
--push_to_hub \
--hub_private_repo True \

View File

@ -11,7 +11,7 @@ torchrun --nproc_per_node 8 --nnodes 1 train.py \
--logging_steps 5 \
--log_level "info" \
--logging_strategy "steps" \
--evaluation_strategy "epoch" \
--eval_strategy "epoch" \
--save_strategy "epoch" \
--push_to_hub \
--hub_private_repo True \

View File

@ -11,7 +11,7 @@ accelerate launch --config_file "configs/deepspeed_config_z3_qlora.yaml" train.
--logging_steps 5 \
--log_level "info" \
--logging_strategy "steps" \
--evaluation_strategy "epoch" \
--eval_strategy "epoch" \
--save_strategy "epoch" \
--push_to_hub \
--hub_private_repo True \

View File

@ -11,7 +11,7 @@ accelerate launch --config_file "configs/fsdp_config_qlora.yaml" train.py \
--logging_steps 5 \
--log_level "info" \
--logging_strategy "steps" \
--evaluation_strategy "epoch" \
--eval_strategy "epoch" \
--save_strategy "epoch" \
--push_to_hub \
--hub_private_repo True \

View File

@ -11,7 +11,7 @@ python train.py \
--logging_steps 5 \
--log_level "info" \
--logging_strategy "steps" \
--evaluation_strategy "epoch" \
--eval_strategy "epoch" \
--save_strategy "epoch" \
--push_to_hub \
--hub_private_repo True \

View File

@ -137,7 +137,8 @@ def main(model_args, data_args, training_args):
max_seq_length=data_args.max_seq_length,
)
trainer.accelerator.print(f"{trainer.model}")
trainer.model.print_trainable_parameters()
if hasattr(trainer.model, "print_trainable_parameters"):
trainer.model.print_trainable_parameters()
# train
checkpoint = None

examples/xlora/README.md Normal file
View File

@ -0,0 +1,15 @@
# X-LoRA examples
## `xlora_inference_mistralrs.py`
Perform inference with an X-LoRA model using the mistral.rs inference engine.
Mistral.rs supports many base models besides Mistral, and can load models directly from saved LoRA checkpoints. Check out [adapter model docs](https://github.com/EricLBuehler/mistral.rs/blob/master/docs/ADAPTER_MODELS.md) and the [models support matrix](https://github.com/EricLBuehler/mistral.rs?tab=readme-ov-file#support-matrix).
Mistral.rs features X-LoRA support and incorporates techniques such as a dual KV cache, continuous batching, Paged Attention, and optional non-granular scalings, which allow vastly improved throughput.
Links:
- Installation: https://github.com/EricLBuehler/mistral.rs/blob/master/mistralrs-pyo3/README.md
- Runnable example: https://github.com/EricLBuehler/mistral.rs/blob/master/examples/python/xlora_zephyr.py
- Adapter model docs and making the ordering file: https://github.com/EricLBuehler/mistral.rs/blob/master/docs/ADAPTER_MODELS.md

View File

@ -0,0 +1,25 @@
from mistralrs import ChatCompletionRequest, Runner, Which
runner = Runner(
which=Which.XLora(
tok_model_id=None, # Automatically determine from ordering file
model_id=..., # Model ID of the base model (local path or HF model ID)
xlora_model_id=..., # X-LoRA adapter model ID (local path or HF model ID)
order=..., # Ordering file to ensure compatibility with PEFT
tgt_non_granular_index=3, # Only generate scalings for the first 3 decoding tokens, and then use the last generated one
)
)
res = runner.send_chat_completion_request(
ChatCompletionRequest(
model="mistral",
messages=[{"role": "user", "content": "Tell me a story about 2 low rank matrices."}],
max_tokens=256,
presence_penalty=1.0,
top_p=0.1,
temperature=0.5,
)
)
print(res.choices[0].message.content)
print(res.usage)

View File

@ -6,6 +6,7 @@ target-version = ['py38']
[tool.ruff]
target-version = "py38"
line-length = 119
extend-exclude = ["*.ipynb"]
[tool.ruff.lint]
extend-select = [

View File

@ -15,13 +15,13 @@
from setuptools import find_packages, setup
VERSION = "0.11.0"
VERSION = "0.13.1"
extras = {}
extras["quality"] = [
"black", # doc-builder has an implicit dependency on Black, see huggingface/doc-builder#434
"hf-doc-builder",
"ruff~=0.2.1",
"ruff~=0.6.1",
]
extras["docs_specific"] = [
"black", # doc-builder has an implicit dependency on Black, see huggingface/doc-builder#434
@ -48,7 +48,7 @@ setup(
keywords="deep learning",
license="Apache",
author="The HuggingFace team",
author_email="sourab@huggingface.co",
author_email="benjamin@huggingface.co",
url="https://github.com/huggingface/peft",
package_dir={"": "src"},
packages=find_packages("src"),

View File

@ -17,7 +17,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
__version__ = "0.11.0"
__version__ = "0.13.1"
from .auto import (
AutoPeftModel,
@ -51,6 +51,7 @@ from .tuners import (
AdaptionPromptConfig,
AdaptionPromptModel,
LoraConfig,
LoraRuntimeConfig,
LoftQConfig,
LoraModel,
LoHaConfig,
@ -79,8 +80,17 @@ from .tuners import (
PolyModel,
LNTuningConfig,
LNTuningModel,
VBLoRAConfig,
VBLoRAModel,
VeraConfig,
VeraModel,
FourierFTConfig,
FourierFTModel,
XLoraConfig,
XLoraModel,
HRAConfig,
HRAModel,
VBLoRAConfig,
)
from .utils import (
TRANSFORMERS_MODELS_TO_PREFIX_TUNING_POSTPROCESS_MAPPING,

View File

@ -62,6 +62,7 @@ class _BaseAutoPeftModel:
adapter_name: str = "default",
is_trainable: bool = False,
config: Optional[PeftConfig] = None,
revision: Optional[str] = None,
**kwargs,
):
r"""
@ -69,8 +70,9 @@ class _BaseAutoPeftModel:
are passed along to `PeftConfig` that automatically takes care of filtering the kwargs of the Hub methods and
the config object init.
"""
peft_config = PeftConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)
peft_config = PeftConfig.from_pretrained(pretrained_model_name_or_path, revision=revision, **kwargs)
base_model_path = peft_config.base_model_name_or_path
base_model_revision = peft_config.revision
task_type = getattr(peft_config, "task_type", None)
@ -101,7 +103,7 @@ class _BaseAutoPeftModel:
"Cannot infer the auto class from the config, please make sure that you are loading the correct model for your task type."
)
base_model = target_class.from_pretrained(base_model_path, **kwargs)
base_model = target_class.from_pretrained(base_model_path, revision=base_model_revision, **kwargs)
tokenizer_exists = False
if os.path.exists(os.path.join(pretrained_model_name_or_path, TOKENIZER_CONFIG_NAME)):
@ -114,7 +116,7 @@ class _BaseAutoPeftModel:
tokenizer_exists = check_file_exists_on_hf_hub(
repo_id=pretrained_model_name_or_path,
filename=TOKENIZER_CONFIG_NAME,
revision=kwargs.get("revision", None),
revision=revision,
repo_type=kwargs.get("repo_type", None),
token=token,
)
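
The explicit `revision` argument threaded through here can be passed straight to the auto classes. A hedged sketch (the adapter repo id is a placeholder): the revision pins the adapter repository, while the base model is loaded at the revision recorded in the adapter's PEFT config.

```python
from peft import AutoPeftModelForCausalLM

# Placeholder adapter repo id; `revision` pins the adapter repository revision.
model = AutoPeftModelForCausalLM.from_pretrained(
    "some-user/some-lora-adapter",
    revision="main",
)
```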

View File

@ -14,6 +14,7 @@
import inspect
import json
import os
import warnings
from dataclasses import asdict, dataclass, field
from typing import Dict, Optional, Union
@ -63,7 +64,7 @@ class PeftConfigMixin(PushToHubMixin):
os.makedirs(save_directory, exist_ok=True)
auto_mapping_dict = kwargs.pop("auto_mapping_dict", None)
output_dict = asdict(self)
output_dict = self.to_dict()
# converting set type to list
for key, value in output_dict.items():
if isinstance(value, set):
@ -97,7 +98,7 @@ class PeftConfigMixin(PushToHubMixin):
# TODO: this hack is needed to fix the following issue (on commit 702f937):
# if someone saves a default config and loads it back with `PeftConfig` class it yields to
# not loading the correct config class.
#
# from peft import AdaLoraConfig, PeftConfig
# peft_config = AdaLoraConfig()
# print(peft_config)
@ -162,6 +163,13 @@ class PeftConfigMixin(PushToHubMixin):
with open(path_json_file) as file:
json_object = json.load(file)
# Sanity check that config does not contain a runtime_config
if "runtime_config" in json_object:
warnings.warn(
"The configuration file contains a `runtime_config` key. This is ignored. Runtime configurations are only valid at runtime."
)
del json_object["runtime_config"]
return json_object
@classmethod
@ -232,7 +240,7 @@ class PeftConfig(PeftConfigMixin):
base_model_name_or_path: Optional[str] = field(
default=None, metadata={"help": "The name of the base model to use."}
)
revision: Optional[str] = field(default=None, metadata={"help": "The specific model version to use."})
revision: Optional[str] = field(default=None, metadata={"help": "The specific base model version to use."})
peft_type: Optional[Union[str, PeftType]] = field(default=None, metadata={"help": "Peft type"})
task_type: Optional[Union[str, TaskType]] = field(default=None, metadata={"help": "Task type"})
inference_mode: bool = field(default=False, metadata={"help": "Whether to use inference mode"})
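
For reference, a hedged sketch of how a runtime-only option such as ephemeral GPU offloading is set (assuming the `runtime_config` field on `LoraConfig` and the `LoraRuntimeConfig` class exported above); per the handling in this file, it is never written to the saved config and is dropped with a warning if found in one:

```python
from peft import LoraConfig, LoraRuntimeConfig

config = LoraConfig(
    target_modules=["q_proj", "v_proj"],  # illustrative target modules
    # Runtime-only setting: not serialized by save_pretrained, ignored (with a warning) on load.
    runtime_config=LoraRuntimeConfig(ephemeral_gpu_offload=True),
)
```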

View File

@ -13,11 +13,13 @@
# limitations under the License.
import inspect
from contextlib import contextmanager
from copy import deepcopy
from functools import update_wrapper
from types import MethodType
from .peft_model import PeftConfig, PeftModel
from .tuners.lora.layer import LoraLayer
def update_forward_signature(model: PeftModel) -> None:
@ -146,3 +148,63 @@ def check_if_peft_model(model_name_or_path: str) -> bool:
is_peft_model = False
return is_peft_model
@contextmanager
def rescale_adapter_scale(model, multiplier):
"""
Context manager to temporarily rescale the scaling of the LoRA adapter in a model.
The original scaling values are restored when the context manager exits. This context manager works with the
transformers and diffusers models that have directly loaded LoRA adapters.
For LoRA, applying this context manager with multiplier in [0, 1] is strictly equivalent to applying
[wise-ft](https://arxiv.org/abs/2109.01903) (see [#1940](https://github.com/huggingface/peft/issues/1940) for
details). It can improve the performance of the model if there is a distribution shift between the training data
used for fine-tuning and the test data used during inference.
Warning: It has been reported that when using Apple's MPS backend for PyTorch, it is necessary to add a short sleep
time after exiting the context before the scales are fully restored.
Args:
model: The model containing `LoraLayer` modules whose scaling is to be adjusted.
multiplier (float or int): The multiplier that rescales the `scaling` attribute. Must be of type float or int.
Raises:
ValueError: If the model does not contain any `LoraLayer`
instances, indicating that the model does not support scaling.
Example:
```python
>>> model = ModelWithLoraLayer()
>>> multiplier = 0.5
>>> with rescale_adapter_scale(model, multiplier):
... outputs = model(**inputs) # Perform operations with the scaled model
>>> outputs = model(**inputs) # The original scaling values are restored here
```
"""
# check if multiplier has a valid data type
if not isinstance(multiplier, (float, int)):
raise TypeError(f"Argument multiplier should be of type float, got {type(multiplier)}")
# iterate on the model's modules and grab the original scaling attribute
# from the lora layers if present
original_scaling = {}
for module in model.modules():
if isinstance(module, LoraLayer):
original_scaling[module] = module.scaling.copy()
module.scaling = {k: v * multiplier for k, v in module.scaling.items()}
# check whether scaling is prohibited on model
# the original scaling dictionary should be empty
# if there were no lora layers
if not original_scaling:
raise ValueError("scaling is only supported for models with `LoraLayer`s")
try:
yield
finally:
# restore original scaling values after exiting the context
for module, scaling in original_scaling.items():
module.scaling = scaling
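
A concrete, hedged usage sketch of this context manager (the model id and target modules are illustrative; only the presence of LoRA layers matters):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model
from peft.helpers import rescale_adapter_scale

model_id = "facebook/opt-125m"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_id)
base = AutoModelForCausalLM.from_pretrained(model_id)
model = get_peft_model(base, LoraConfig(target_modules=["q_proj", "v_proj"]))

inputs = tokenizer("Hello", return_tensors="pt")
with rescale_adapter_scale(model, multiplier=0.5):
    outputs_scaled = model(**inputs)  # LoRA contributions are halved inside the block
outputs = model(**inputs)  # the original scaling values are restored here
```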

View File

@ -14,10 +14,13 @@
from __future__ import annotations
from typing import TYPE_CHECKING, Any
import warnings
from typing import TYPE_CHECKING, Any, Optional
import torch
from peft.tuners.xlora.model import XLoraModel
from .config import PeftConfig
from .mixed_model import PeftMixedModel
from .peft_model import (
@ -35,6 +38,10 @@ from .tuners import (
AdaptionPromptConfig,
BOFTConfig,
BOFTModel,
FourierFTConfig,
FourierFTModel,
HRAConfig,
HRAModel,
IA3Config,
IA3Model,
LNTuningConfig,
@ -53,10 +60,13 @@ from .tuners import (
PrefixTuningConfig,
PromptEncoderConfig,
PromptTuningConfig,
VBLoRAConfig,
VBLoRAModel,
VeraConfig,
VeraModel,
XLoraConfig,
)
from .tuners.tuners_utils import BaseTuner as _BaseTuner
from .tuners.tuners_utils import BaseTuner
from .utils import _prepare_prompt_learning_config
@ -80,6 +90,7 @@ PEFT_TYPE_TO_CONFIG_MAPPING: dict[str, type[PeftConfig]] = {
"P_TUNING": PromptEncoderConfig,
"LORA": LoraConfig,
"LOHA": LoHaConfig,
"LORAPLUS": LoraConfig,
"LOKR": LoKrConfig,
"ADALORA": AdaLoraConfig,
"BOFT": BOFTConfig,
@ -89,9 +100,13 @@ PEFT_TYPE_TO_CONFIG_MAPPING: dict[str, type[PeftConfig]] = {
"POLY": PolyConfig,
"LN_TUNING": LNTuningConfig,
"VERA": VeraConfig,
"FOURIERFT": FourierFTConfig,
"XLORA": XLoraConfig,
"HRA": HRAConfig,
"VBLORA": VBLoRAConfig,
}
PEFT_TYPE_TO_TUNER_MAPPING: dict[str, type[_BaseTuner]] = {
PEFT_TYPE_TO_TUNER_MAPPING: dict[str, type[BaseTuner]] = {
"LORA": LoraModel,
"LOHA": LoHaModel,
"LOKR": LoKrModel,
@ -102,6 +117,10 @@ PEFT_TYPE_TO_TUNER_MAPPING: dict[str, type[_BaseTuner]] = {
"POLY": PolyModel,
"LN_TUNING": LNTuningModel,
"VERA": VeraModel,
"FOURIERFT": FourierFTModel,
"XLORA": XLoraModel,
"HRA": HRAModel,
"VBLORA": VBLoRAModel,
}
@ -117,7 +136,12 @@ def get_peft_config(config_dict: dict[str, Any]) -> PeftConfig:
def get_peft_model(
model: PreTrainedModel, peft_config: PeftConfig, adapter_name: str = "default", mixed: bool = False
model: PreTrainedModel,
peft_config: PeftConfig,
adapter_name: str = "default",
mixed: bool = False,
autocast_adapter_dtype: bool = True,
revision: Optional[str] = None,
) -> PeftModel | PeftMixedModel:
"""
Returns a Peft model object from a model and a config.
@ -131,26 +155,48 @@ def get_peft_model(
The name of the adapter to be injected, if not provided, the default adapter name is used ("default").
mixed (`bool`, `optional`, defaults to `False`):
Whether to allow mixing different (compatible) adapter types.
autocast_adapter_dtype (`bool`, *optional*):
Whether to autocast the adapter dtype. Defaults to `True`. Right now, this will only cast adapter weights
using float16 or bfloat16 to float32, as this is typically required for stable training, and only affect
select PEFT tuners.
revision (`str`, `optional`, defaults to `main`):
The revision of the base model. If this isn't set, the saved peft model will load the `main` revision for
the base model
"""
model_config = getattr(model, "config", {"model_type": "custom"})
if hasattr(model_config, "to_dict"):
model_config = model_config.to_dict()
model_config = BaseTuner.get_model_config(model)
old_name = peft_config.base_model_name_or_path
new_name = model.__dict__.get("name_or_path", None)
peft_config.base_model_name_or_path = new_name
peft_config.base_model_name_or_path = model.__dict__.get("name_or_path", None)
if (old_name is not None) and (old_name != new_name):
warnings.warn(
f"The PEFT config's `base_model_name_or_path` was renamed from '{old_name}' to '{new_name}'. "
"Please ensure that the correct base model is loaded when loading this checkpoint."
)
if revision is not None:
if peft_config.revision is not None and peft_config.revision != revision:
warnings.warn(
f"peft config has already set base model revision to {peft_config.revision}, overwriting with revision {revision}"
)
peft_config.revision = revision
if mixed:
# note: PeftMixedModel does not support autocast_adapter_dtype, so don't pass it
return PeftMixedModel(model, peft_config, adapter_name=adapter_name)
if peft_config.task_type not in MODEL_TYPE_TO_PEFT_MODEL_MAPPING.keys() and not peft_config.is_prompt_learning:
return PeftModel(model, peft_config, adapter_name=adapter_name)
return PeftModel(model, peft_config, adapter_name=adapter_name, autocast_adapter_dtype=autocast_adapter_dtype)
if peft_config.is_prompt_learning:
peft_config = _prepare_prompt_learning_config(peft_config, model_config)
return MODEL_TYPE_TO_PEFT_MODEL_MAPPING[peft_config.task_type](model, peft_config, adapter_name=adapter_name)
return MODEL_TYPE_TO_PEFT_MODEL_MAPPING[peft_config.task_type](
model, peft_config, adapter_name=adapter_name, autocast_adapter_dtype=autocast_adapter_dtype
)
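A hedged sketch of the new `revision` argument to `get_peft_model` (model id and revision are placeholders); the value ends up on the PEFT config and is written out by `save_pretrained`:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m", revision="main")  # placeholders
peft_model = get_peft_model(base, LoraConfig(target_modules=["q_proj", "v_proj"]), revision="main")
print(peft_model.peft_config["default"].revision)  # -> "main"
```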
def inject_adapter_in_model(
peft_config: PeftConfig, model: torch.nn.Module, adapter_name: str = "default"
peft_config: PeftConfig, model: torch.nn.Module, adapter_name: str = "default", low_cpu_mem_usage: bool = False
) -> torch.nn.Module:
r"""
A simple API to create and inject adapter in-place into a model. Currently the API does not support prompt learning
@ -164,6 +210,8 @@ def inject_adapter_in_model(
The input model where the adapter will be injected.
adapter_name (`str`, `optional`, defaults to `"default"`):
The name of the adapter to be injected, if not provided, the default adapter name is used ("default").
low_cpu_mem_usage (`bool`, `optional`, defaults to `False`):
Create empty adapter weights on meta device. Useful to speed up the loading process.
"""
if peft_config.is_prompt_learning or peft_config.is_adaption_prompt:
raise ValueError("`create_and_replace` does not support prompt learning and adaption prompt yet.")
@ -176,6 +224,6 @@ def inject_adapter_in_model(
tuner_cls = PEFT_TYPE_TO_TUNER_MAPPING[peft_config.peft_type]
# By instantiating a peft model we are injecting randomly initialized LoRA layers into the model's modules.
peft_model = tuner_cls(model, peft_config, adapter_name=adapter_name)
peft_model = tuner_cls(model, peft_config, adapter_name=adapter_name, low_cpu_mem_usage=low_cpu_mem_usage)
return peft_model.model
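
A minimal, hedged sketch of `inject_adapter_in_model` with the new `low_cpu_mem_usage` flag (the toy module and config are illustrative):

```python
import torch
from peft import LoraConfig, inject_adapter_in_model


class MLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 16)

    def forward(self, x):
        return self.linear(x)


config = LoraConfig(target_modules=["linear"])
# With low_cpu_mem_usage=True the LoRA weights are created on the meta device;
# they are expected to be filled in later, e.g. from a trained adapter state dict.
model = inject_adapter_in_model(config, MLP(), low_cpu_mem_usage=True)
```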

View File

@ -23,7 +23,7 @@ from accelerate.hooks import remove_hook_from_submodules
from torch import nn
from transformers.utils import PushToHubMixin
from peft.tuners.mixed import COMPATIBLE_TUNER_TYPES
from peft.utils.constants import DUMMY_MODEL_CONFIG
from .config import PeftConfig
from .peft_model import PeftModel
@ -36,6 +36,7 @@ from .tuners import (
MixedModel,
OFTModel,
)
from .tuners.mixed import COMPATIBLE_TUNER_TYPES
from .utils import PeftType, _set_adapter, _set_trainable
@ -97,8 +98,6 @@ class PeftMixedModel(PushToHubMixin, torch.nn.Module):
Example:
```py
>>> from peft import get_peft_model
>>> base_model = ... # load the base model, e.g. from transformers
>>> peft_model = PeftMixedModel.from_pretrained(base_model, path_to_adapter1, "adapter1").eval()
>>> peft_model.load_adapter(path_to_adapter2, "adapter2")
@ -113,6 +112,8 @@ class PeftMixedModel(PushToHubMixin, torch.nn.Module):
The config of the model to be tuned. The adapter type must be compatible.
adapter_name (`str`, `optional`, defaults to `"default"`):
The name of the first adapter.
low_cpu_mem_usage (`bool`, `optional`, defaults to `False`):
Create empty adapter weights on meta device. Useful to speed up the loading process.
"""
def __init__(self, model: nn.Module, peft_config: PeftConfig, adapter_name: str = "default") -> None:
@ -123,7 +124,7 @@ class PeftMixedModel(PushToHubMixin, torch.nn.Module):
self.base_model = MixedModel(model, {adapter_name: peft_config}, adapter_name)
self.set_modules_to_save(peft_config, adapter_name)
self.config = getattr(model, "config", {"model_type": "custom"})
self.config = getattr(model, "config", DUMMY_MODEL_CONFIG)
# the `pretraining_tp` is set for some models to simulate Tensor Parallelism during inference to avoid
# numerical differences, https://github.com/pytorch/pytorch/issues/76232 - to avoid any unexpected
@ -193,6 +194,8 @@ class PeftMixedModel(PushToHubMixin, torch.nn.Module):
try:
return super().__getattr__(name) # defer to nn.Module's logic
except AttributeError:
if name == "base_model": # see #1892: prevent infinite recursion if class is not initialized
raise
return getattr(self.base_model, name)
def forward(self, *args: Any, **kwargs: Any):
@ -218,12 +221,38 @@ class PeftMixedModel(PushToHubMixin, torch.nn.Module):
finally:
self.base_model.enable_adapter_layers()
def add_adapter(self, adapter_name: str, peft_config: PeftConfig):
def add_adapter(self, adapter_name: str, peft_config: PeftConfig, low_cpu_mem_usage: bool = False) -> None:
"""
Add an adapter to the model based on the passed configuration.
This adapter is not trained. To load a trained adapter, check out [`PeftModel.load_adapter`].
The name for the new adapter should be unique.
The new adapter is not automatically set as the active adapter. Use [`PeftModel.set_adapter`] to set the active
adapter.
Args:
adapter_name (`str`):
The name of the adapter to be added.
peft_config ([`PeftConfig`]):
The configuration of the adapter to be added.
low_cpu_mem_usage (`bool`, `optional`, defaults to `False`):
Create empty adapter weights on meta device. Useful to speed up the process when loading saved
adapters.
<Tip>
Don't use `low_cpu_mem_usage=True` when creating a new PEFT adapter for training (training is untested
and discouraged for PeftMixedModel in general).
</Tip>
"""
_check_config_compatible(peft_config)
try:
self.peft_config[adapter_name] = peft_config
self.base_model.inject_adapter(self, adapter_name)
self.base_model.inject_adapter(self, adapter_name, low_cpu_mem_usage=low_cpu_mem_usage)
except Exception: # something went wrong, roll back
if adapter_name in self.peft_config:
del self.peft_config[adapter_name]
@ -322,6 +351,37 @@ class PeftMixedModel(PushToHubMixin, torch.nn.Module):
return PeftModel._split_kwargs(kwargs)
def load_adapter(self, model_id: str, adapter_name: str, *args: Any, **kwargs: Any):
"""
Load a trained adapter into the model.
The name for the new adapter should be unique.
The new adapter is not automatically set as the active adapter. Use [`PeftModel.set_adapter`] to set the active
adapter.
Args:
model_id (`str` or `os.PathLike`):
The name of the PEFT configuration to use. Can be either the model id of a PEFT configuration hosted on the
Hugging Face Hub or a path to a directory containing a PEFT configuration saved with `save_pretrained`.
adapter_name (`str`):
The name of the adapter to be added.
is_trainable (`bool`, *optional*, defaults to `False`):
Whether the adapter should be trainable or not. If `False`, the adapter will be frozen and can only be
used for inference.
torch_device (`str`, *optional*, defaults to None):
The device to load the adapter on. If `None`, the device will be inferred.
autocast_adapter_dtype (`bool`, *optional*, defaults to `True`):
Whether to autocast the adapter dtype. Defaults to `True`. Right now, this will only cast adapter
weights using float16 and bfloat16 to float32, as this is typically required for stable training, and
only affect select PEFT tuners.
ephemeral_gpu_offload (`bool`, *optional*, defaults to `False`):
Whether to use ephemeral GPU offloading for partially loaded modules. Defaults to `False`.
low_cpu_mem_usage (`bool`, `optional`, defaults to `False`):
Create empty adapter weights on meta device before loading the saved weights. Useful to speed up the
process.
kwargs: (`optional`):
Additional arguments to modify the way the adapter is loaded, e.g. the token for Hugging Face Hub.
"""
# the low_cpu_mem_usage option is handled through kwargs
output = PeftModel.load_adapter(self, model_id, adapter_name, *args, **kwargs)
# TODO: not quite clear why this is necessary but tests fail without it
self.set_adapter(self.active_adapters)
@ -372,6 +432,9 @@ class PeftMixedModel(PushToHubMixin, torch.nn.Module):
The configuration object to use instead of an automatically loaded configuration. This configuration
object is mutually exclusive with `model_id` and `kwargs`. This is useful when configuration is already
loaded before calling `from_pretrained`.
low_cpu_mem_usage (`bool`, `optional`, defaults to `False`):
Create empty adapter weights on meta device before loading the saved weights. Useful to speed up the
process.
kwargs: (`optional`):
Additional keyword arguments passed along to the specific PEFT configuration class.
"""
@ -411,5 +474,6 @@ class PeftMixedModel(PushToHubMixin, torch.nn.Module):
# note: this is different from PeftModel.from_pretrained, we always return a PeftMixedModel
model = cls(model, config, adapter_name)
# the low_cpu_mem_usage option is handled through kwargs
model.load_adapter(model_id, adapter_name, is_trainable=is_trainable, **kwargs)
return model
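
A hedged sketch combining `from_pretrained` and `load_adapter` on a mixed model with the new `low_cpu_mem_usage` option (the base model id and adapter paths are placeholders):

```python
from transformers import AutoModelForCausalLM
from peft import PeftMixedModel

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")  # placeholder base model
model = PeftMixedModel.from_pretrained(base, "path/to/adapter1", "adapter1", low_cpu_mem_usage=True).eval()
model.load_adapter("path/to/adapter2", "adapter2", low_cpu_mem_usage=True)
model.set_adapter(["adapter1", "adapter2"])  # activate both (compatible) adapters
```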

View File

@ -0,0 +1,18 @@
# Copyright 2024-present the HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .loraplus import create_loraplus_optimizer
__all__ = ["create_loraplus_optimizer"]

View File

@ -0,0 +1,121 @@
# Copyright 2024-present the HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This module contains the implementation of the LoraPlus optimizer.
"""
from __future__ import annotations
from operator import attrgetter
import torch.nn as nn
from torch.optim import Optimizer
from transformers.pytorch_utils import ALL_LAYERNORM_LAYERS
from transformers.trainer_pt_utils import get_parameter_names
from ..peft_model import PeftModel
from ..tuners.lora.layer import Embedding
def create_loraplus_optimizer(
model: PeftModel, optimizer_cls: type[Optimizer], *, lr: float, loraplus_lr_ratio: float, **kwargs
) -> Optimizer:
"""
Creates a LoraPlus optimizer.
Efficient Low Rank Adaptation of Large Models: https://arxiv.org/abs/2402.12354
Reference: https://github.com/nikhil-ghosh-berkeley/loraplus/
Args:
model (`torch.nn.Module`): The model to be optimized.
optimizer_cls (`torch.optim.Optimizer`): The optimizer class to be used.
lr (`float`): The learning rate to be used for the optimizer.
loraplus_lr_ratio (`float`):
The ratio of learning rates ηB/ηA, where ηA (lr) is passed in as the optimizer learning rate. Should be ≥1. Should
be set in tandem with the optimizer learning rate (lr); should be larger when the task is more difficult
and the model needs to update its features to learn well. In this case, it helps to make the learning rate
slightly smaller (e.g., by a factor of 2) than typical vanilla LoRA learning rates.
loraplus_lr_embedding (optional `float`):
If LoRA modules are added to embedding layers you can specify a different learning rate for them. Default
value 1e-6.
kwargs (`dict`): Additional keyword arguments to be passed to the optimizer.
Returns:
`torch.optim.Optimizer`: An instance of the specified optimizer class configured with the model's parameters
organized into groups with custom learning rates.
"""
decay_parameters = get_parameter_names(model, ALL_LAYERNORM_LAYERS)
decay_parameters = [name for name in decay_parameters if "bias" not in name]
param_groups = {
"groupA": {},
"groupB": {},
"groupB_no_decay": {},
"embedding": {},
}
for name, param in model.named_parameters():
if not param.requires_grad:
continue
module = attrgetter(name)(model)
if isinstance(module, Embedding):
param_groups["embedding"][name] = param
elif "lora_B" in name or param.ndim == 1:
if name in decay_parameters:
param_groups["groupB"][name] = param
else:
param_groups["groupB_no_decay"][name] = param
else:
param_groups["groupA"][name] = param
kwargs["lr"] = lr
loraplus_weight_decay = kwargs.pop("loraplus_weight_decay", 0.0)
loraplus_lr_embedding = kwargs.pop("loraplus_lr_embedding", 1e-6)
optimizer_grouped_parameters = [
{
"params": list(param_groups["groupA"].values()),
"weight_decay": loraplus_weight_decay,
"lr": lr,
},
{
"params": list(param_groups["embedding"].values()),
"weight_decay": loraplus_weight_decay,
"lr": loraplus_lr_embedding,
},
{
"params": list(param_groups["groupB"].values()),
"weight_decay": loraplus_weight_decay,
"lr": lr * loraplus_lr_ratio,
},
{
"params": list(param_groups["groupB_no_decay"].values()),
"weight_decay": 0.0,
"lr": lr * loraplus_lr_ratio,
},
]
optimizer = optimizer_cls(optimizer_grouped_parameters, **kwargs)
eight_bit_names = ["Adam8bit", "AdamW8bit", "PagedAdam8bit", "PagedAdamW8bit"]
if optimizer_cls.__name__ in eight_bit_names:
import bitsandbytes
manager = bitsandbytes.optim.GlobalOptimManager.get_instance()
for module in model.modules():
if isinstance(module, nn.Embedding):
manager.register_module_override(module, "weight", {"optim_bits": 32})
return optimizer
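
A hedged usage sketch (model id and hyperparameters are placeholders; the import path assumes these files live in a new `peft.optimizers` subpackage):

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model
from peft.optimizers import create_loraplus_optimizer

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")  # placeholder base model
peft_model = get_peft_model(base, LoraConfig(target_modules=["q_proj", "v_proj"]))

optimizer = create_loraplus_optimizer(
    model=peft_model,
    optimizer_cls=torch.optim.AdamW,
    lr=2e-5,
    loraplus_lr_ratio=16,        # lora_B parameters are trained with lr * 16
    loraplus_weight_decay=0.0,   # optional, consumed from kwargs
)
```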

View File

@ -15,10 +15,11 @@
from __future__ import annotations
import collections
import copy
import inspect
import os
import warnings
from contextlib import contextmanager
from contextlib import contextmanager, nullcontext
from copy import deepcopy
from dataclasses import dataclass
from typing import Any, Literal, Optional, Union
@ -26,10 +27,10 @@ from typing import Any, Literal, Optional, Union
import packaging.version
import torch
import transformers
from accelerate import dispatch_model, infer_auto_device_map
from accelerate import dispatch_model, infer_auto_device_map, init_empty_weights
from accelerate.hooks import AlignDevicesHook, add_hook_to_module, remove_hook_from_submodules
from accelerate.utils import get_balanced_memory, named_module_tensors
from huggingface_hub import ModelCard, ModelCardData, hf_hub_download
from huggingface_hub import HfFileSystem, ModelCard, ModelCardData, hf_hub_download
from safetensors import safe_open
from safetensors.torch import save_file as safe_save_file
from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
@ -37,12 +38,16 @@ from transformers import PreTrainedModel
from transformers.modeling_outputs import QuestionAnsweringModelOutput, SequenceClassifierOutput, TokenClassifierOutput
from transformers.utils import PushToHubMixin
from peft.utils.constants import DUMMY_MODEL_CONFIG
from . import __version__
from .config import PeftConfig
from .tuners import (
AdaLoraModel,
AdaptionPromptModel,
BOFTModel,
FourierFTModel,
HRAModel,
IA3Model,
LNTuningModel,
LoHaModel,
@ -54,7 +59,10 @@ from .tuners import (
PrefixEncoder,
PromptEmbedding,
PromptEncoder,
VBLoRAModel,
VeraModel,
XLoraConfig,
XLoraModel,
)
from .tuners.tuners_utils import BaseTuner, BaseTunerLayer
from .utils import (
@ -91,6 +99,10 @@ PEFT_TYPE_TO_MODEL_MAPPING = {
PeftType.POLY: PolyModel,
PeftType.LN_TUNING: LNTuningModel,
PeftType.VERA: VeraModel,
PeftType.FOURIERFT: FourierFTModel,
PeftType.XLORA: XLoraModel,
PeftType.HRA: HRAModel,
PeftType.VBLORA: VBLoRAModel,
}
@ -102,6 +114,18 @@ class PeftModel(PushToHubMixin, torch.nn.Module):
model ([`~transformers.PreTrainedModel`]): The base transformer model used for Peft.
peft_config ([`PeftConfig`]): The configuration of the Peft model.
adapter_name (`str`, *optional*): The name of the adapter, defaults to `"default"`.
autocast_adapter_dtype (`bool`, *optional*):
Whether to autocast the adapter dtype. Defaults to `True`. Right now, this will only cast adapter weights
using float16 and bfloat16 to float32, as this is typically required for stable training, and only affect
select PEFT tuners.
low_cpu_mem_usage (`bool`, `optional`, defaults to `False`):
Create empty adapter weights on meta device. Useful to speed up the loading process.
<Tip>
Don't use `low_cpu_mem_usage=True` when creating a new PEFT adapter for training.
</Tip>
**Attributes**:
- **base_model** ([`torch.nn.Module`]) -- The base transformer model used for Peft.
@ -118,7 +142,14 @@ class PeftModel(PushToHubMixin, torch.nn.Module):
in the base model if using [`PromptLearningConfig`].
"""
def __init__(self, model: PreTrainedModel, peft_config: PeftConfig, adapter_name: str = "default") -> None:
def __init__(
self,
model: PreTrainedModel,
peft_config: PeftConfig,
adapter_name: str = "default",
autocast_adapter_dtype: bool = True,
low_cpu_mem_usage: bool = False,
) -> None:
super().__init__()
self.modules_to_save = None
self.active_adapter = adapter_name
@ -131,13 +162,20 @@ class PeftModel(PushToHubMixin, torch.nn.Module):
if self._is_prompt_learning:
self._peft_config = {adapter_name: peft_config}
self.base_model = model
self.add_adapter(adapter_name, peft_config)
self.add_adapter(adapter_name, peft_config, low_cpu_mem_usage=low_cpu_mem_usage)
else:
self._peft_config = None
cls = PEFT_TYPE_TO_MODEL_MAPPING[peft_config.peft_type]
self.base_model = cls(model, {adapter_name: peft_config}, adapter_name)
ctx = init_empty_weights if low_cpu_mem_usage else nullcontext
with ctx():
self.base_model = cls(model, {adapter_name: peft_config}, adapter_name)
self.set_additional_trainable_modules(peft_config, adapter_name)
if hasattr(self.base_model, "_cast_adapter_dtype"):
self.base_model._cast_adapter_dtype(
adapter_name=adapter_name, autocast_adapter_dtype=autocast_adapter_dtype
)
if getattr(model, "is_gradient_checkpointing", True):
model = self._prepare_model_for_gradient_checkpointing(model)
@ -157,6 +195,15 @@ class PeftModel(PushToHubMixin, torch.nn.Module):
def active_adapters(self) -> list[str]:
try:
adapters = self.base_model.active_adapters
if not isinstance(adapters, list):
# Base model is probably a transformers model, see:
# https://github.com/huggingface/transformers/pull/30790#issuecomment-2253808249
# Unfortunately, transformers models also have an active_adapters method but it's 1) not a property and
# 2) calling it fails because the base model (usually) has no loaded adapter. The base model can be a
# transformers model for prompt learning, where the base model is not wrapped in a LoraModel or similar.
adapters = self.active_adapter
if isinstance(adapters, str):
adapters = [adapters]
except AttributeError:
adapters = self.active_adapter
if isinstance(adapters, str):
@ -178,6 +225,7 @@ class PeftModel(PushToHubMixin, torch.nn.Module):
save_embedding_layers: Union[str, bool] = "auto",
is_main_process: bool = True,
convert_pissa_to_lora: Optional[str] = None,
path_initial_model_for_weight_conversion: Optional[str] = None,
**kwargs: Any,
) -> None:
r"""
@ -200,15 +248,19 @@ class PeftModel(PushToHubMixin, torch.nn.Module):
is_main_process (`bool`, *optional*):
Whether the process calling this is the main process or not. Will default to `True`. Will not save the
checkpoint if not on the main process, which is important for multi device setups (e.g. DDP).
convert_pissa_to_lora (`str`):
The path to the initialized PiSSA adapter, which is obtained after initializing the model with PiSSA
and before performing any training. When `convert_pissa_to_lora` is not None, the difference in PISSA
before and after fine-tuning is calculated. This difference can be represented as the parameters of
a standard LoRA adapter. Using this converted adapter does not require changes to the base model,
thus conveniently allowing the use of multiple PISSA and LoRA adapters, and the activation or
deactivation of any adapters.
convert_pissa_to_lora (`str, *optional*`):
Deprecated. Use `path_initial_model_for_weight_conversion` instead.
path_initial_model_for_weight_conversion (`str, *optional*`):
The path to the initialized adapter, which is obtained after initializing the model with PiSSA or OLoRA
and before performing any training. When `path_initial_model_for_weight_conversion` is not None, the
difference in adapter before and after fine-tuning is calculated. This difference can be represented as
the parameters of a standard LoRA adapter. Using this converted adapter does not require changes to the
base model, thus conveniently allowing the use of multiple PiSSA or OLoRA adapters with LoRA adapters,
and the activation or deactivation of any adapters. Note that this conversion is not supported if
`rslora` is used in combination with `rank_pattern` or `alpha_pattern`.
kwargs (additional keyword arguments, *optional*):
Additional keyword arguments passed along to the `push_to_hub` method.
"""
if os.path.isfile(save_directory):
raise ValueError(f"Provided path ({save_directory}) should be a directory, not a file")
@ -224,21 +276,49 @@ class PeftModel(PushToHubMixin, torch.nn.Module):
f"You passed an invalid `selected_adapters` arguments, current supported adapter names are"
f" {list(self.peft_config.keys())} - got {selected_adapters}."
)
def save_pissa_as_lora(peft_config, convert_pissa_to_lora, output_state_dict, kwargs):
if not str(peft_config.init_lora_weights).startswith("pissa"):
warnings.warn("`convert_pissa_to_lora` only works for converting a PiSSA adapter to a LoRA adapter")
initial_adapter = os.path.basename(convert_pissa_to_lora)
self.load_adapter(
os.path.dirname(convert_pissa_to_lora), subfolder=initial_adapter, adapter_name=initial_adapter
# TODO: remove deprecated parameter in PEFT v0.14.0
if convert_pissa_to_lora is not None:
warnings.warn(
"`convert_pissa_to_lora` is deprecated and will be removed in a future version. "
"Use `path_initial_model_for_weight_conversion` instead."
)
if str(self.peft_config[initial_adapter].init_lora_weights).startswith("pissa"):
raise ValueError(
"The `init_lora_weights` parameter of the initial PiSSA adapter should be set to `True`. "
"Otherwise, `self.load_adapter` will subtract the principal singular value and vector again based on the residual model."
path_initial_model_for_weight_conversion = convert_pissa_to_lora
def save_mutated_as_lora(peft_config, path_initial_model_for_weight_conversion, output_state_dict, kwargs):
if peft_config.use_rslora and (peft_config.rank_pattern or peft_config.alpha_pattern):
msg = (
"Passing `path_initial_model_for_weight_conversion` to `save_pretrained` is not supported when "
"using `rank_pattern` or `alpha_pattern` at the same time as `use_rslora=True`."
)
output_state_dict = self.base_model.subtract_pissa_init(output_state_dict, initial_adapter, kwargs)
self.delete_adapter(adapter_name)
raise ValueError(msg)
if not any(
str(peft_config.init_lora_weights).lower().startswith(prefix) for prefix in ["pissa", "olora", "true"]
):
warnings.warn(
"`path_initial_model_for_weight_conversion` only works for converting a PiSSA or OLoRA adapter to "
"a LoRA adapter"
)
initial_adapter_name = os.path.basename(path_initial_model_for_weight_conversion)
try:
self.load_adapter(
os.path.dirname(path_initial_model_for_weight_conversion),
subfolder=initial_adapter_name,
adapter_name=initial_adapter_name,
)
is_pissa = str(self.peft_config[initial_adapter_name].init_lora_weights).lower().startswith("pissa")
is_olora = str(self.peft_config[initial_adapter_name].init_lora_weights).lower() == "olora"
if is_pissa or is_olora:
raise ValueError(
"The `init_lora_weights` parameter of the initial adapter should be set to `True`. "
"Otherwise, `self.load_adapter` will subtract the decomposed values again based on the "
"residual model."
)
output_state_dict = self.base_model.subtract_mutated_init(
output_state_dict, initial_adapter_name, kwargs
)
finally:
self.delete_adapter(initial_adapter_name)
return output_state_dict
if is_main_process:
@ -279,9 +359,12 @@ class PeftModel(PushToHubMixin, torch.nn.Module):
# not supported in safetensors.
for shared_tensor_name in names[1:]:
output_state_dict[shared_tensor_name] = output_state_dict[shared_tensor_name].clone()
if convert_pissa_to_lora is not None:
output_state_dict = save_pissa_as_lora(
peft_config, convert_pissa_to_lora, output_state_dict, kwargs
if path_initial_model_for_weight_conversion is not None:
peft_config = copy.deepcopy(peft_config)
peft_config.init_lora_weights = True
peft_config.save_pretrained(path_initial_model_for_weight_conversion)
output_state_dict = save_mutated_as_lora(
peft_config, path_initial_model_for_weight_conversion, output_state_dict, kwargs
)
safe_save_file(
output_state_dict,
@ -289,9 +372,12 @@ class PeftModel(PushToHubMixin, torch.nn.Module):
metadata={"format": "pt"},
)
elif is_main_process:
if convert_pissa_to_lora is not None:
output_state_dict = save_pissa_as_lora(
peft_config, convert_pissa_to_lora, output_state_dict, kwargs
if path_initial_model_for_weight_conversion is not None:
peft_config = copy.deepcopy(peft_config)
peft_config.init_lora_weights = True
peft_config.save_pretrained(path_initial_model_for_weight_conversion)
output_state_dict = save_mutated_as_lora(
peft_config, path_initial_model_for_weight_conversion, output_state_dict, kwargs
)
torch.save(output_state_dict, os.path.join(output_dir, WEIGHTS_NAME))
@ -320,10 +406,20 @@ class PeftModel(PushToHubMixin, torch.nn.Module):
auto_mapping_dict = None
if is_main_process:
if convert_pissa_to_lora is not None:
if path_initial_model_for_weight_conversion is not None:
peft_config.init_lora_weights = True
peft_config.r *= 2
peft_config.lora_alpha *= 2
if not peft_config.use_rslora:
peft_config.lora_alpha *= 2
else:
# with rslora, we have scaling = alpha / sqrt(r), we thus adjust alpha to keep the same scaling
peft_config.lora_alpha *= 2**0.5
if peft_config.rank_pattern:
peft_config.rank_pattern = {key: 2 * val for key, val in peft_config.rank_pattern.items()}
if peft_config.alpha_pattern:
peft_config.alpha_pattern = {key: 2 * val for key, val in peft_config.alpha_pattern.items()}
peft_config.save_pretrained(output_dir, auto_mapping_dict=auto_mapping_dict)
peft_config.inference_mode = inference_mode
@ -335,6 +431,9 @@ class PeftModel(PushToHubMixin, torch.nn.Module):
adapter_name: str = "default",
is_trainable: bool = False,
config: Optional[PeftConfig] = None,
autocast_adapter_dtype: bool = True,
ephemeral_gpu_offload: bool = False,
low_cpu_mem_usage: bool = False,
**kwargs: Any,
) -> PeftModel:
r"""
@ -361,6 +460,19 @@ class PeftModel(PushToHubMixin, torch.nn.Module):
The configuration object to use instead of an automatically loaded configuration. This configuration
object is mutually exclusive with `model_id` and `kwargs`. This is useful when configuration is already
loaded before calling `from_pretrained`.
autocast_adapter_dtype (`bool`, *optional*):
Whether to autocast the adapter dtype. Defaults to `True`. Only relevant for specific adapter types.
ephemeral_gpu_offload (`bool`, *optional*):
Whether to use ephemeral GPU offloading for partially loaded modules. Defaults to `False`. This is
useful when parts of the model and/or components (such as adapters) are kept in CPU memory until they
are needed. Rather than perform expensive operations on small data, the data is transferred to the GPU
on-demand, the operation(s) performed, and the results moved back to CPU memory. This brings a slight
momentary VRAM overhead but gives orders of magnitude speedup in certain cases.
low_cpu_mem_usage (`bool`, `optional`, defaults to `False`):
Create empty adapter weights on meta device before loading the saved weights. Useful to speed up the
process.
torch_device (`str`, *optional*, defaults to None):
The device to load the adapter on. If `None`, the device will be inferred.
kwargs: (`optional`):
Additional keyword arguments passed along to the specific PEFT configuration class.
"""
@ -383,6 +495,13 @@ class PeftModel(PushToHubMixin, torch.nn.Module):
else:
raise ValueError(f"The input config must be a PeftConfig, got {config.__class__}")
# Runtime configuration, if supported
if hasattr(config, "runtime_config"):
config.runtime_config.ephemeral_gpu_offload = ephemeral_gpu_offload
else:
if ephemeral_gpu_offload:
warnings.warn("Ephemeral GPU offloading is not supported for this model. Ignoring.")
if hasattr(model, "hf_device_map"):
weight_map = dict(named_module_tensors(model, recurse=True))
@ -422,12 +541,57 @@ class PeftModel(PushToHubMixin, torch.nn.Module):
raise ValueError("Cannot set a prompt learning adapter to trainable when loading pretrained adapter.")
else:
config.inference_mode = not is_trainable
if isinstance(getattr(model, "base_model", None), XLoraModel):
if not isinstance(config, XLoraConfig):
raise TypeError(f"Expected 'XLoraConfig', got '{type(config)}' instead.")
if "adapters" in kwargs:
config.adapters = kwargs["adapters"]
else:
# If the path is on HF hub, then we get the adapter names to create a subfolders list which tells
# `load_adapter` where the adapters are.
if not os.path.exists(model_id):
s = HfFileSystem()
# The names of the adapters which must be in folders
adapter_names = [
file["name"][len(model_id) + 1 :] for file in s.ls(model_id) if file["type"] == "directory"
]
# Prepare a dict of adapter paths, which really just point to the hf id; we will use the subfolders
adapter_paths = {}
for adapter_name in adapter_names:
adapter_paths[adapter_name] = os.path.join(model_id, model_id)
config.adapters = adapter_paths
config._subfolders = adapter_names
else:
if "adapters" not in kwargs:
raise ValueError("If model_id is a local path, then `adapters` must be passed in kwargs.")
if config.task_type not in MODEL_TYPE_TO_PEFT_MODEL_MAPPING.keys():
model = cls(model, config, adapter_name)
model = cls(
model,
config,
adapter_name,
autocast_adapter_dtype=autocast_adapter_dtype,
low_cpu_mem_usage=low_cpu_mem_usage,
)
else:
model = MODEL_TYPE_TO_PEFT_MODEL_MAPPING[config.task_type](model, config, adapter_name)
model.load_adapter(model_id, adapter_name, is_trainable=is_trainable, **kwargs)
model = MODEL_TYPE_TO_PEFT_MODEL_MAPPING[config.task_type](
model,
config,
adapter_name,
autocast_adapter_dtype=autocast_adapter_dtype,
low_cpu_mem_usage=low_cpu_mem_usage,
)
model.load_adapter(
model_id,
adapter_name,
is_trainable=is_trainable,
autocast_adapter_dtype=autocast_adapter_dtype,
low_cpu_mem_usage=low_cpu_mem_usage,
**kwargs,
)
return model
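
A hedged sketch of the new loading options on `from_pretrained` (base model and adapter ids are placeholders):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")  # placeholder base model
model = PeftModel.from_pretrained(
    base,
    "some-user/opt-125m-lora",    # placeholder adapter repo id or local path
    low_cpu_mem_usage=True,       # create adapter weights on the meta device, then fill them from the checkpoint
    autocast_adapter_dtype=True,  # upcast fp16/bf16 adapter weights to fp32
)
```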
def _setup_prompt_encoder(self, adapter_name: str):
@ -561,9 +725,13 @@ class PeftModel(PushToHubMixin, torch.nn.Module):
prompts = prompt_encoder(prompt_tokens, task_ids)
else:
if peft_config.inference_mode:
prompts = prompt_encoder.embedding.weight.repeat(batch_size, 1, 1)
prompts = prompt_encoder.embedding.weight
else:
# Take only one prompt token sample and expand the output instead of expanding the input, see:
# https://github.com/huggingface/peft/issues/2043#issuecomment-2321522577
prompt_tokens = prompt_tokens[:1]
prompts = prompt_encoder(prompt_tokens)
prompts = prompts.repeat(batch_size, 1, 1)
return prompts
def get_nb_trainable_parameters(self) -> tuple[int, int]:
@ -618,6 +786,8 @@ class PeftModel(PushToHubMixin, torch.nn.Module):
try:
return super().__getattr__(name) # defer to nn.Module's logic
except AttributeError:
if name == "base_model": # see #1892: prevent infinite recursion if class is not initialized
raise
return getattr(self.base_model, name)
@contextmanager
@ -712,7 +882,7 @@ class PeftModel(PushToHubMixin, torch.nn.Module):
else self.base_model.model
)
def add_adapter(self, adapter_name: str, peft_config: PeftConfig) -> None:
def add_adapter(self, adapter_name: str, peft_config: PeftConfig, low_cpu_mem_usage: bool = False) -> None:
"""
Add an adapter to the model based on the passed configuration.
@ -728,6 +898,10 @@ class PeftModel(PushToHubMixin, torch.nn.Module):
The name of the adapter to be added.
peft_config ([`PeftConfig`]):
The configuration of the adapter to be added.
low_cpu_mem_usage (`bool`, `optional`, defaults to `False`):
Create empty adapter weights on meta device. Useful to speed up the process when loading saved
adapters. Don't use this option when creating a new PEFT adapter for training.
"""
if peft_config.peft_type != self.peft_type:
raise ValueError(
@ -749,7 +923,9 @@ class PeftModel(PushToHubMixin, torch.nn.Module):
self.base_model.add_adapter(adapter_name, peft_config)
else:
self.peft_config[adapter_name] = peft_config
self.base_model.inject_adapter(self.base_model.model, adapter_name)
self.base_model.inject_adapter(
self.base_model.model, adapter_name, low_cpu_mem_usage=low_cpu_mem_usage
)
except Exception: # something went wrong, roll back
if adapter_name in self.peft_config:
del self.peft_config[adapter_name]
@ -931,10 +1107,13 @@ class PeftModel(PushToHubMixin, torch.nn.Module):
def load_adapter(
self,
model_id: str,
model_id: Union[str, os.PathLike],
adapter_name: str,
is_trainable: bool = False,
torch_device: Optional[str] = None,
autocast_adapter_dtype: bool = True,
ephemeral_gpu_offload: bool = False,
low_cpu_mem_usage: bool = False,
**kwargs: Any,
):
"""
@ -946,15 +1125,28 @@ class PeftModel(PushToHubMixin, torch.nn.Module):
adapter.
Args:
model_id (`str` or `os.PathLike`):
The name of the PEFT configuration to use. Can be either:
- A string, the `model id` of a PEFT configuration hosted inside a model repo on the Hugging Face
Hub.
- A path to a directory containing a PEFT configuration file saved using the `save_pretrained`
method (`./my_peft_config_directory/`).
adapter_name (`str`):
The name of the adapter to be added.
is_trainable (`bool`, *optional*, defaults to `False`):
Whether the adapter should be trainable or not. If `False`, the adapter will be frozen and can only be
used for inference.
torch_device (`str`, *optional*, defaults to None):
The device to load the adapter on. If `None`, the device will be inferred.
autocast_adapter_dtype (`bool`, *optional*, defaults to `True`):
Whether to autocast the adapter dtype. Defaults to `True`. Right now, this will only cast adapter
weights using float16 and bfloat16 to float32, as this is typically required for stable training, and
only affect select PEFT tuners.
ephemeral_gpu_offload (`bool`, *optional*, defaults to `False`):
Whether to use ephemeral GPU offloading for partially loaded modules. Defaults to `False`.
low_cpu_mem_usage (`bool`, `optional`, defaults to `False`):
Create empty adapter weights on meta device before loading the saved weights. Useful to speed up the
process.
kwargs: (`optional`):
Additional arguments to modify the way the adapter is loaded, e.g. the token for Hugging Face Hub.
"""
@ -973,20 +1165,25 @@ class PeftModel(PushToHubMixin, torch.nn.Module):
)
].from_pretrained(
model_id,
ephemeral_gpu_offload=ephemeral_gpu_offload,
**hf_hub_download_kwargs,
)
if peft_config.is_prompt_learning and is_trainable:
raise ValueError("Cannot set a prompt learning adapter to trainable when loading pretrained adapter.")
else:
peft_config.inference_mode = not is_trainable
self.add_adapter(adapter_name, peft_config)
self.add_adapter(adapter_name, peft_config, low_cpu_mem_usage=low_cpu_mem_usage)
adapters_weights = load_peft_weights(model_id, device=torch_device, **hf_hub_download_kwargs)
# load the weights into the model
ignore_mismatched_sizes = kwargs.get("ignore_mismatched_sizes", False)
load_result = set_peft_model_state_dict(
self, adapters_weights, adapter_name=adapter_name, ignore_mismatched_sizes=ignore_mismatched_sizes
self,
adapters_weights,
adapter_name=adapter_name,
ignore_mismatched_sizes=ignore_mismatched_sizes,
low_cpu_mem_usage=low_cpu_mem_usage,
)
if (
(getattr(self, "hf_device_map", None) is not None)
@ -1034,6 +1231,11 @@ class PeftModel(PushToHubMixin, torch.nn.Module):
remove_hook_from_submodules(self.prompt_encoder)
add_hook_to_module(self.get_base_model(), hook)
if hasattr(self.base_model, "_cast_adapter_dtype"):
self.base_model._cast_adapter_dtype(
adapter_name=adapter_name, autocast_adapter_dtype=autocast_adapter_dtype
)
# Set model in evaluation mode to deactivate Dropout modules by default
if not is_trainable:
self.eval()
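As a usage illustration (not part of the diff), here is a minimal sketch of how the new `low_cpu_mem_usage` flag on `load_adapter`/`add_adapter` could be used; the model and adapter ids are placeholders.

```python
# Hedged sketch: ids below are placeholders, not taken from this changeset.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("some-base-model-id")   # placeholder id
model = PeftModel.from_pretrained(base, "some-adapter-id")          # placeholder id

# The weights for "other" are created empty on the meta device and then overwritten
# by the saved adapter weights, skipping the usual random initialization.
model.load_adapter("another-adapter-id", adapter_name="other", low_cpu_mem_usage=True)
```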
@ -1088,9 +1290,8 @@ class PeftModel(PushToHubMixin, torch.nn.Module):
card.data["library_name"] = "peft"
model_config = getattr(self, "config", None)
if hasattr(model_config, "to_dict"):
model_config = model_config.to_dict()
model_config = BaseTuner.get_model_config(self)
model_config = None if model_config == DUMMY_MODEL_CONFIG else model_config
if model_config is not None and "_name_or_path" in model_config:
card.data["base_model"] = model_config["_name_or_path"]
@ -1133,6 +1334,11 @@ class PeftModelForSequenceClassification(PeftModel):
Args:
model ([`~transformers.PreTrainedModel`]): Base transformer model.
peft_config ([`PeftConfig`]): Peft config.
adapter_name (`str`, *optional*): The name of the adapter, defaults to `"default"`.
autocast_adapter_dtype (`bool`, *optional*):
Whether to autocast the adapter dtype. Defaults to `True`. Right now, this will only cast adapter weights
using float16 and bfloat16 to float32, as this is typically required for stable training, and only affect
select PEFT tuners.
**Attributes**:
- **config** ([`~transformers.PretrainedConfig`]) -- The configuration object of the base model.
@ -1166,8 +1372,10 @@ class PeftModelForSequenceClassification(PeftModel):
```
"""
def __init__(self, model: torch.nn.Module, peft_config: PeftConfig, adapter_name: str = "default") -> None:
super().__init__(model, peft_config, adapter_name)
def __init__(
self, model: torch.nn.Module, peft_config: PeftConfig, adapter_name: str = "default", **kwargs
) -> None:
super().__init__(model, peft_config, adapter_name, **kwargs)
classifier_module_names = ["classifier", "score"]
if self.modules_to_save is None:
@ -1361,7 +1569,11 @@ class PeftModelForCausalLM(PeftModel):
Args:
model ([`~transformers.PreTrainedModel`]): Base transformer model.
peft_config ([`PeftConfig`]): Peft config.
adapter_name (`str`, *optional*): The name of the adapter, defaults to `"default"`.
autocast_adapter_dtype (`bool`, *optional*):
Whether to autocast the adapter dtype. Defaults to `True`. Right now, this will only cast adapter weights
using float16 and bfloat16 to float32, as this is typically required for stable training, and only affect
select PEFT tuners.
Example:
@ -1391,8 +1603,10 @@ class PeftModelForCausalLM(PeftModel):
```
"""
def __init__(self, model: torch.nn.Module, peft_config: PeftConfig, adapter_name: str = "default") -> None:
super().__init__(model, peft_config, adapter_name)
def __init__(
self, model: torch.nn.Module, peft_config: PeftConfig, adapter_name: str = "default", **kwargs
) -> None:
super().__init__(model, peft_config, adapter_name, **kwargs)
self.base_model_prepare_inputs_for_generation = self.base_model.prepare_inputs_for_generation
def forward(
@ -1461,10 +1675,9 @@ class PeftModelForCausalLM(PeftModel):
)
if peft_config.peft_type == PeftType.PREFIX_TUNING:
past_key_values = self.get_prompt(batch_size)
return self.base_model(
input_ids=input_ids, inputs_embeds=inputs_embeds, past_key_values=past_key_values, **kwargs
)
# overwrite past_kv in kwargs
kwargs["past_key_values"] = self.get_prompt(batch_size)
return self.base_model(input_ids=input_ids, inputs_embeds=inputs_embeds, **kwargs)
else:
if inputs_embeds is None:
inputs_embeds = self.word_embeddings(input_ids)
@ -1508,6 +1721,10 @@ class PeftModelForCausalLM(PeftModel):
uses_transformers_4_38 = packaging.version.parse(transformers.__version__) >= packaging.version.parse("4.38.0")
uses_transformers_4_36 = packaging.version.parse(transformers.__version__) >= packaging.version.parse("4.36.0")
transformers_new_cache_archs = ["llama", "mistral", "persimmon", "phi"]
if packaging.version.parse(transformers.__version__) > packaging.version.parse("4.43.3"):
# https://github.com/huggingface/transformers/pull/31445
transformers_new_cache_archs.append("bloom")
uses_cache = uses_transformers_4_38 or (
uses_transformers_4_36 and self.base_model.config.model_type in transformers_new_cache_archs
)
@ -1519,7 +1736,12 @@ class PeftModelForCausalLM(PeftModel):
# change in the logic of `prepare_inputs_for_generation` makes the below code necessary
# In prompt learning methods, past key values are longer when compared to the `input_ids`.
# As such, only consider the last input ids in the autoregressive generation phase.
if model_kwargs["past_key_values"][0][0].shape[-2] >= model_kwargs["input_ids"].shape[1]:
past_key_values = model_kwargs["past_key_values"]
if isinstance(past_key_values, (tuple, list)):
seq_len = past_key_values[0][0].shape[-2]
else: # using transformers kv cache
seq_len = past_key_values.get_seq_length()
if seq_len >= model_kwargs["input_ids"].shape[1]:
model_kwargs["input_ids"] = model_kwargs["input_ids"][:, -1:]
if model_kwargs.get("attention_mask", None) is not None:
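To make the tuple-vs-Cache branch above concrete, a small standalone sketch (assuming transformers>=4.36, which provides `DynamicCache`):

```python
# Sketch only: how the sequence length is read from either kv-cache representation.
import torch
from transformers import DynamicCache

# legacy format: tuple of (key, value) per layer, shape (batch, heads, seq_len, head_dim)
legacy_cache = ((torch.zeros(1, 2, 5, 4), torch.zeros(1, 2, 5, 4)),)
assert legacy_cache[0][0].shape[-2] == 5

# new format: a transformers Cache object
cache = DynamicCache()
cache.update(torch.zeros(1, 2, 5, 4), torch.zeros(1, 2, 5, 4), layer_idx=0)
assert cache.get_seq_length() == 5
```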
@ -1539,16 +1761,20 @@ class PeftModelForCausalLM(PeftModel):
)
kwargs["token_type_ids"] = None
if model_kwargs["past_key_values"] is None and peft_config.peft_type == PeftType.PREFIX_TUNING:
past_key_values = self.get_prompt(batch_size=model_kwargs["input_ids"].shape[0])
model_kwargs["past_key_values"] = past_key_values
else:
if model_kwargs["past_key_values"] is None:
inputs_embeds = self.word_embeddings(model_kwargs["input_ids"])
prompts = self.get_prompt(batch_size=model_kwargs["input_ids"].shape[0], task_ids=task_ids)
prompts = prompts.to(inputs_embeds.dtype)
model_kwargs["inputs_embeds"] = torch.cat((prompts, inputs_embeds), dim=1)
model_kwargs["input_ids"] = None
# no past_key_values or past_key_values empty cache
requires_prompt_injection = (model_kwargs["past_key_values"] is None) or (
isinstance(model_kwargs["past_key_values"], transformers.Cache) and not model_kwargs["past_key_values"]
)
if requires_prompt_injection and peft_config.peft_type == PeftType.PREFIX_TUNING:
new_past_key_values = self.get_prompt(batch_size=model_kwargs["input_ids"].shape[0])
model_kwargs["past_key_values"] = new_past_key_values
elif requires_prompt_injection:
inputs_embeds = self.word_embeddings(model_kwargs["input_ids"])
prompts = self.get_prompt(batch_size=model_kwargs["input_ids"].shape[0], task_ids=task_ids)
prompts = prompts.to(inputs_embeds.dtype)
model_kwargs["inputs_embeds"] = torch.cat((prompts, inputs_embeds), dim=1)
model_kwargs["input_ids"] = None
# For transformers>=4.38.0 - for some architectures such as Llama, `cache_position` is
# passed in the forward pass to keep track of the position ids of the cache. We have to
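The prompt-injection branch above (used for non-prefix prompt learning methods) reduces to a concatenation along the sequence dimension; a shape-only sketch:

```python
# Shape sketch: virtual prompt embeddings are prepended to the token embeddings
# before the forward pass.
import torch

batch_size, num_virtual_tokens, seq_len, hidden = 2, 8, 5, 16
prompts = torch.randn(batch_size, num_virtual_tokens, hidden)
inputs_embeds = torch.randn(batch_size, seq_len, hidden)
combined = torch.cat((prompts, inputs_embeds), dim=1)
assert combined.shape == (batch_size, num_virtual_tokens + seq_len, hidden)
```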
@ -1566,7 +1792,11 @@ class PeftModelForSeq2SeqLM(PeftModel):
Args:
model ([`~transformers.PreTrainedModel`]): Base transformer model.
peft_config ([`PeftConfig`]): Peft config.
adapter_name (`str`, *optional*): The name of the adapter, defaults to `"default"`.
autocast_adapter_dtype (`bool`, *optional*):
Whether to autocast the adapter dtype. Defaults to `True`. Right now, this will only cast adapter weights
using float16 and bfloat16 to float32, as this is typically required for stable training, and only affect
select PEFT tuners.
Example:
@ -1595,8 +1825,10 @@ class PeftModelForSeq2SeqLM(PeftModel):
```
"""
def __init__(self, model: torch.nn.Module, peft_config: PeftConfig, adapter_name: str = "default") -> None:
super().__init__(model, peft_config, adapter_name)
def __init__(
self, model: torch.nn.Module, peft_config: PeftConfig, adapter_name: str = "default", **kwargs
) -> None:
super().__init__(model, peft_config, adapter_name, **kwargs)
self.base_model_prepare_inputs_for_generation = self.base_model.prepare_inputs_for_generation
self.base_model_prepare_encoder_decoder_kwargs_for_generation = (
self.base_model._prepare_encoder_decoder_kwargs_for_generation
@ -1665,12 +1897,12 @@ class PeftModelForSeq2SeqLM(PeftModel):
)
if peft_config.peft_type == PeftType.PREFIX_TUNING:
past_key_values = self.get_prompt(batch_size)
# overwrite past_kv in kwargs
kwargs["past_key_values"] = self.get_prompt(batch_size)
return self.base_model(
input_ids=input_ids,
decoder_input_ids=decoder_input_ids,
decoder_inputs_embeds=decoder_inputs_embeds,
past_key_values=past_key_values,
**kwargs,
)
elif peft_config.peft_type in [PeftType.PROMPT_TUNING, PeftType.P_TUNING]:
@ -1820,6 +2052,11 @@ class PeftModelForTokenClassification(PeftModel):
Args:
model ([`~transformers.PreTrainedModel`]): Base transformer model.
peft_config ([`PeftConfig`]): Peft config.
adapter_name (`str`, *optional*): The name of the adapter, defaults to `"default"`.
autocast_adapter_dtype (`bool`, *optional*):
Whether to autocast the adapter dtype. Defaults to `True`. Right now, this will only cast adapter weights
using float16 and bfloat16 to float32, as this is typically required for stable training, and only affect
select PEFT tuners.
**Attributes**:
- **config** ([`~transformers.PretrainedConfig`]) -- The configuration object of the base model.
@ -1853,8 +2090,10 @@ class PeftModelForTokenClassification(PeftModel):
```
"""
def __init__(self, model: torch.nn.Module, peft_config: PeftConfig = None, adapter_name: str = "default") -> None:
super().__init__(model, peft_config, adapter_name)
def __init__(
self, model: torch.nn.Module, peft_config: PeftConfig = None, adapter_name: str = "default", **kwargs
) -> None:
super().__init__(model, peft_config, adapter_name, **kwargs)
classifier_module_names = ["classifier", "score"]
if self.modules_to_save is None:
@ -2032,6 +2271,11 @@ class PeftModelForQuestionAnswering(PeftModel):
Args:
model ([`~transformers.PreTrainedModel`]): Base transformer model.
peft_config ([`PeftConfig`]): Peft config.
adapter_name (`str`, *optional*): The name of the adapter, defaults to `"default"`.
autocast_adapter_dtype (`bool`, *optional*):
Whether to autocast the adapter dtype. Defaults to `True`. Right now, this will only cast adapter weights
using float16 and bfloat16 to float32, as this is typically required for stable training, and only affect
select PEFT tuners.
**Attributes**:
- **config** ([`~transformers.PretrainedConfig`]) -- The configuration object of the base model.
@ -2063,8 +2307,10 @@ class PeftModelForQuestionAnswering(PeftModel):
```
"""
def __init__(self, model: torch.nn.Module, peft_config: PeftConfig, adapter_name: str = "default") -> None:
super().__init__(model, peft_config, adapter_name)
def __init__(
self, model: torch.nn.Module, peft_config: PeftConfig, adapter_name: str = "default", **kwargs
) -> None:
super().__init__(model, peft_config, adapter_name, **kwargs)
qa_module_names = ["qa_outputs"]
if self.modules_to_save is None:
@ -2265,6 +2511,11 @@ class PeftModelForFeatureExtraction(PeftModel):
Args:
model ([`~transformers.PreTrainedModel`]): Base transformer model.
peft_config ([`PeftConfig`]): Peft config.
adapter_name (`str`, *optional*): The name of the adapter, defaults to `"default"`.
autocast_adapter_dtype (`bool`, *optional*):
Whether to autocast the adapter dtype. Defaults to `True`. Right now, this will only cast adapter weights
using float16 and bfloat16 to float32, as this is typically required for stable training, and only affect
select PEFT tuners.
**Attributes**:
- **config** ([`~transformers.PretrainedConfig`]) -- The configuration object of the base model.
@ -2293,8 +2544,8 @@ class PeftModelForFeatureExtraction(PeftModel):
```
"""
def __init__(self, model: torch.nn.Module, peft_config: PeftConfig, adapter_name: str = "default"):
super().__init__(model, peft_config, adapter_name)
def __init__(self, model: torch.nn.Module, peft_config: PeftConfig, adapter_name: str = "default", **kwargs):
super().__init__(model, peft_config, adapter_name, **kwargs)
def forward(
self,
@ -2346,8 +2597,9 @@ class PeftModelForFeatureExtraction(PeftModel):
)
if peft_config.peft_type == PeftType.PREFIX_TUNING:
past_key_values = self.get_prompt(batch_size)
return self.base_model(input_ids=input_ids, past_key_values=past_key_values, **kwargs)
# overwrite past_kv in kwargs
kwargs["past_key_values"] = self.get_prompt(batch_size)
return self.base_model(input_ids=input_ids, **kwargs)
else:
if inputs_embeds is None:
inputs_embeds = self.word_embeddings(input_ids)
@ -2366,6 +2618,7 @@ class TunerLayerStatus:
merged_adapters: list[str]
requires_grad: dict[str, bool | Literal["irregular"]]
available_adapters: list[str]
devices: dict[str, list[str]]
def get_layer_status(model: torch.nn.Module) -> list[TunerLayerStatus]:
@ -2390,6 +2643,8 @@ def get_layer_status(model: torch.nn.Module) -> list[TunerLayerStatus]:
`"irregular"`.
- `available_adapters` (`list[str]`):
The names of the available adapters, e.g. `["default"]`.
- `devices` (`dict[str, list[str]]`):
The devices where the parameters of the given adapter are stored, e.g. `["cuda"]`.
Args:
model ([Union[`~PeftModel`, `~transformers.PreTrainedModel`, `nn.Module`]]):
@ -2439,6 +2694,19 @@ def get_layer_status(model: torch.nn.Module) -> list[TunerLayerStatus]:
requires_grad = {key: check_irrgular(vals) for key, vals in mapping_requires_grad_list.items()}
devices_dd = collections.defaultdict(list)
for adapter_module_name in module.adapter_layer_names + module.other_param_names:
adapter_module = getattr(module, adapter_module_name)
if isinstance(adapter_module, torch.nn.ModuleDict):
for key, submodule in adapter_module.items():
devices_dd[key].extend([param.device.type for param in submodule.parameters()])
elif isinstance(adapter_module, torch.nn.ParameterDict) or (
adapter_module.__class__.__name__ == "BufferDict"
): # VeRA
for key, param in adapter_module.items():
devices_dd[key].append(param.device.type)
devices = {key: sorted(set(val)) for key, val in devices_dd.items()}
status = TunerLayerStatus(
name=name,
module_type=repr(module).partition("(")[0],
@ -2447,6 +2715,7 @@ def get_layer_status(model: torch.nn.Module) -> list[TunerLayerStatus]:
merged_adapters=module.merged_adapters,
requires_grad=requires_grad,
available_adapters=sorted(module._get_available_adapters()),
devices=devices,
)
layer_status.append(status)
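A hedged usage sketch for the extended status reporting; the model id and target module are illustrative, and `get_layer_status`/`get_model_status` are assumed to be importable from `peft` as in earlier releases:

```python
# Sketch: inspect the new per-adapter `devices` entry at layer and model level.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, get_layer_status, get_model_status

base = AutoModelForCausalLM.from_pretrained("gpt2")                  # small demo model
model = get_peft_model(base, LoraConfig(target_modules=["c_attn"]))

for status in get_layer_status(model):
    print(status.name, status.devices)        # e.g. {'default': ['cpu']}

print(get_model_status(model).devices)        # aggregated over all layers
```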
@ -2472,6 +2741,7 @@ class TunerModelStatus:
merged_adapters: list[str] | Literal["irregular"]
requires_grad: dict[str, bool | Literal["irregular"]]
available_adapters: list[str]
devices: dict[str, list[str]]
def get_model_status(model: torch.nn.Module) -> TunerModelStatus:
@ -2506,6 +2776,8 @@ def get_model_status(model: torch.nn.Module) -> TunerModelStatus:
work as expected.
- `available_adapters` (`list[str]`):
The names of the available adapters, e.g. `["default"]`.
- `devices` (`dict[str, list[str]]`):
The devices where the parameters of the given adapter are stored, e.g. `["cuda"]`.
Args:
model ([Union[`~PeftModel`, `~transformers.PreTrainedModel`, `nn.Module`]]):
@ -2597,6 +2869,12 @@ def get_model_status(model: torch.nn.Module) -> TunerModelStatus:
requires_grad = {key: check_irrgular(vals) for key, vals in requires_grad_all.items()}
devices_dd = collections.defaultdict(list)
for status in layer_status:
for key, val in status.devices.items():
devices_dd[key].extend(val)
devices = {key: sorted(set(val)) for key, val in devices_dd.items()}
adapter_model_status = TunerModelStatus(
base_model_type=base_model_type,
adapter_model_type=adapter_model_type,
@ -2609,5 +2887,6 @@ def get_model_status(model: torch.nn.Module) -> TunerModelStatus:
merged_adapters=merged_adapters,
requires_grad=requires_grad,
available_adapters=available_adapters,
devices=devices,
)
return adapter_model_status

View File

@ -18,7 +18,7 @@
# limitations under the License.
from .adaption_prompt import AdaptionPromptConfig, AdaptionPromptModel
from .lora import LoraConfig, LoraModel, LoftQConfig
from .lora import LoraConfig, LoraModel, LoftQConfig, LoraRuntimeConfig
from .loha import LoHaConfig, LoHaModel
from .lokr import LoKrConfig, LoKrModel
from .ia3 import IA3Config, IA3Model
@ -33,3 +33,7 @@ from .mixed import MixedModel
from .poly import PolyConfig, PolyModel
from .ln_tuning import LNTuningConfig, LNTuningModel
from .vera import VeraConfig, VeraModel
from .fourierft import FourierFTConfig, FourierFTModel
from .xlora import XLoraConfig, XLoraModel
from .hra import HRAConfig, HRAModel
from .vblora import VBLoRAConfig, VBLoRAModel

View File

@ -12,6 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import warnings
from dataclasses import dataclass, field
from typing import Optional
@ -67,3 +68,10 @@ class AdaLoraConfig(LoraConfig):
# if target_modules is a regex expression, then layers_pattern should be None
if isinstance(self.target_modules, str) and self.layers_pattern is not None:
raise ValueError("`layers_pattern` cannot be used when `target_modules` is a str.")
# Check if 'r' has been set to a non-default value
if self.r != 8: # 8 is the default value for 'r' in LoraConfig
warnings.warn(
"Note that `r` is not used in AdaLora and will be ignored."
"If you intended to set the initial rank, use `init_r` instead."
)
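A hedged sketch of what the new check means for users (assuming `AdaLoraConfig` keeps its `init_r` field):

```python
# Sketch: setting `r` on AdaLoraConfig now warns, because AdaLora reads the initial
# rank from `init_r` instead.
import warnings
from peft import AdaLoraConfig

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    AdaLoraConfig(r=16)                       # `r` is ignored by AdaLora
    print([str(w.message) for w in caught])   # contains the warning added above

AdaLoraConfig(init_r=16)                      # the intended way to set the initial rank
```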

View File

@ -35,7 +35,8 @@ class AdaLoraLayer(LoraLayer):
# List all names of layers that may contain adapter weights
# Note: ranknum doesn't need to be included as it is not an nn.Module
adapter_layer_names = ("lora_A", "lora_B", "lora_E", "lora_embedding_A", "lora_embedding_B")
# other_param_names is defined in LoraLayer
# All names of other parameters that may contain adapter-related parameters
other_param_names = ("r", "lora_alpha", "scaling", "lora_dropout", "ranknum")
def __init__(self, base_layer: nn.Module) -> None:
super().__init__(base_layer)
@ -72,16 +73,12 @@ class AdaLoraLayer(LoraLayer):
if init_lora_weights:
self.reset_lora_parameters(adapter_name)
if hasattr(self.get_base_layer(), "qweight"):
# QuantLinear
self.to(self.get_base_layer().qweight.device)
else:
self.to(self.get_base_layer().weight.device)
self._move_adapter_to_device_of_base_layer(adapter_name)
self.set_adapter(self.active_adapters)
def reset_lora_parameters(self, adapter_name):
if adapter_name in self.lora_A.keys():
nn.init.normal_(self.lora_E[adapter_name], mean=0.0, std=0.02)
nn.init.zeros_(self.lora_E[adapter_name])
nn.init.normal_(self.lora_A[adapter_name], mean=0.0, std=0.02)
nn.init.normal_(self.lora_B[adapter_name], mean=0.0, std=0.02)

View File

@ -42,15 +42,17 @@ class AdaLoraModel(LoraModel):
model ([`transformers.PreTrainedModel`]): The model to be adapted.
config ([`AdaLoraConfig`]): The configuration of the AdaLora model.
adapter_name (`str`): The name of the adapter, defaults to `"default"`.
low_cpu_mem_usage (`bool`, `optional`, defaults to `False`):
Create empty adapter weights on meta device. Useful to speed up the loading process.
Returns:
`torch.nn.Module`: The AdaLora model.
Example::
>>> from transformers import AutoModelForSeq2SeqLM, LoraConfig >>> from peft import AdaLoraModel, AdaLoraConfig
>>> from transformers import AutoModelForSeq2SeqLM >>> from peft import LoraConfig, AdaLoraModel, AdaLoraConfig
>>> config = AdaLoraConfig(
peft_type="ADALORA", task_type="SEQ_2_SEQ_LM", r=8, lora_alpha=32, target_modules=["q", "v"],
peft_type="ADALORA", task_type="SEQ_2_SEQ_LM", init_r=12, lora_alpha=32, target_modules=["q", "v"],
lora_dropout=0.01,
)
>>> model = AutoModelForSeq2SeqLM.from_pretrained("t5-base") >>> model = AdaLoraModel(model, config, "default")
@ -229,6 +231,8 @@ class AdaLoraModel(LoraModel):
try:
return super().__getattr__(name) # defer to nn.Module's logic
except AttributeError:
if name == "model": # see #1892: prevent infinite recursion if class is not initialized
raise
return getattr(self.model, name)
def forward(self, *args, **kwargs):

View File

@ -158,4 +158,6 @@ class AdaptionPromptModel(nn.Module):
except AttributeError:
# This is necessary as e.g. causal models have various methods that we
# don't want to re-implement here.
if name == "model": # see #1892: prevent infinite recursion if class is not initialized
raise
return getattr(self.model, name)
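The same recursion guard is added to several `__getattr__` implementations in this changeset; a standalone sketch of the pattern (a generic wrapper, not PEFT API):

```python
# Sketch of the guard: if __getattr__ runs before `self.model` is assigned (e.g. on a
# half-initialized instance), looking up "model" would recurse forever, so that name
# re-raises the AttributeError immediately instead of delegating.
import torch.nn as nn

class Wrapper(nn.Module):
    def __init__(self, model: nn.Module) -> None:
        super().__init__()
        self.model = model

    def __getattr__(self, name: str):
        try:
            return super().__getattr__(name)  # defer to nn.Module's logic
        except AttributeError:
            if name == "model":  # prevent infinite recursion if not initialized
                raise
            return getattr(self.model, name)
```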

View File

@ -20,41 +20,77 @@ from __future__ import annotations
import math
import os
import warnings
from contextlib import contextmanager
from typing import Any, Optional, Union
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Function
from torch.utils.cpp_extension import load
from peft.tuners.tuners_utils import BaseTunerLayer, check_adapters_to_merge
os.environ["CC"] = "gcc"
os.environ["CXX"] = "gcc"
curr_dir = os.path.dirname(__file__)
_FBD_CUDA = None
# this function is a 1:1 copy from accelerate
@contextmanager
def patch_environment(**kwargs):
"""
A context manager that will add each keyword argument passed to `os.environ` and remove them when exiting.
Will convert the values in `kwargs` to strings and upper-case all the keys.
Example:
```python
>>> import os
>>> from accelerate.utils import patch_environment
>>> with patch_environment(FOO="bar"):
... print(os.environ["FOO"]) # prints "bar"
>>> print(os.environ["FOO"]) # raises KeyError
```
"""
existing_vars = {}
for key, value in kwargs.items():
key = key.upper()
if key in os.environ:
existing_vars[key] = os.environ[key]
os.environ[key] = str(value)
yield
for key in kwargs:
key = key.upper()
if key in existing_vars:
# restore previous value
os.environ[key] = existing_vars[key]
else:
os.environ.pop(key, None)
def get_fbd_cuda():
global _FBD_CUDA
if _FBD_CUDA is not None:
return _FBD_CUDA
# This import initializes cuda context and should thus be local, see issue 1877
from torch.utils.cpp_extension import load
curr_dir = os.path.dirname(__file__)
# need ninja to build the extension
try:
fbd_cuda = load(
name="fbd_cuda",
sources=[f"{curr_dir}/fbd/fbd_cuda.cpp", f"{curr_dir}/fbd/fbd_cuda_kernel.cu"],
verbose=True,
# build_directory='/tmp/' # for debugging
)
# extra_cuda_cflags = ['-std=c++14', '-ccbin=$$(which gcc-7)']) # cuda10.2 is not compatible with gcc9. Specify gcc 7
import fbd_cuda
with patch_environment(CC="gcc", CXX="gcc"):
fbd_cuda = load(
name="fbd_cuda",
sources=[f"{curr_dir}/fbd/fbd_cuda.cpp", f"{curr_dir}/fbd/fbd_cuda_kernel.cu"],
verbose=True,
# build_directory='/tmp/' # for debugging
)
# extra_cuda_cflags = ['-std=c++14', '-ccbin=$$(which gcc-7)']) # cuda10.2 is not compatible with gcc9. Specify gcc 7
except Exception as e:
warnings.warn(f"Failed to load the CUDA extension: {e}, check if ninja is available.")
warnings.warn("Setting boft_n_butterfly_factor to 1 to speed up the finetuning process.")
@ -228,6 +264,14 @@ class BOFTLayer(BaseTunerLayer):
"""
Update the linear layer with trainable BOFT weights. Override for other layer types.
"""
# Attempt to load the CUDA extension during model initialization
if not get_fbd_cuda():
self.fbd_cuda_available = False
# If the CUDA extension is not available, set the butterfly factor to 1 to speed up the finetuning process
boft_n_butterfly_factor = 1
else:
self.fbd_cuda_available = True
# to be consistent with the paper notation
boft_n_butterfly_factor = boft_n_butterfly_factor - 1
if boft_n_butterfly_factor < 0:
@ -301,7 +345,7 @@ class BOFTLayer(BaseTunerLayer):
perm_mat = self.perm2mat(perm)
P[i] = perm_mat
self.register_buffer("boft_P", P)
self.register_buffer("boft_P", P, persistent=False)
self.boft_R[adapter_name] = nn.Parameter(
torch.zeros(boft_n_butterfly_factor + 1, boft_block_num, boft_block_size, boft_block_size)
@ -310,18 +354,11 @@ class BOFTLayer(BaseTunerLayer):
self.reset_boft_parameters(adapter_name, init_weights)
weight = getattr(self, "weight", None)
if weight is not None:
# the layer is already completely initialized, this is an update
if weight.dtype.is_floating_point or weight.dtype.is_complex:
self.to(weight.device, dtype=weight.dtype)
else:
self.to(weight.device)
# set the boft block size and number
self.boft_block_size[adapter_name] = boft_block_size
self.boft_block_num[adapter_name] = boft_block_num
self._move_adapter_to_device_of_base_layer(adapter_name)
self.set_adapter(self.active_adapters)
def reset_boft_parameters(self, adapter_name, init_weights):
@ -441,14 +478,6 @@ class Linear(nn.Module, BOFTLayer):
self._active_adapter = adapter_name
# Attempt to load the CUDA extension during model initialization
if not get_fbd_cuda():
self.fbd_cuda_available = False
# If the CUDA extension is not available, set the butterfly factor to 1 to speed up the finetuning process
boft_n_butterfly_factor = 1
else:
self.fbd_cuda_available = True
self.update_layer(
adapter_name, boft_block_size, boft_block_num, boft_n_butterfly_factor, boft_dropout, init_weights
)
@ -490,7 +519,7 @@ class Linear(nn.Module, BOFTLayer):
f"NaNs detected in the merged weights. The adapter {active_adapter} seems to be broken"
)
self.base_layer.weight.data = orig_weight
self.base_layer.weight.data = orig_weight.contiguous()
else:
butterfly_oft_mat, boft_s = self.get_delta_weight(active_adapter)
orig_weight = base_layer.weight.data.clone()
@ -499,7 +528,7 @@ class Linear(nn.Module, BOFTLayer):
orig_weight = torch.transpose(orig_weight, 0, 1)
orig_weight = orig_weight * boft_s
self.base_layer.weight.data = orig_weight
self.base_layer.weight.data = orig_weight.contiguous()
self.merged_adapters.append(active_adapter)
@ -544,8 +573,9 @@ class Linear(nn.Module, BOFTLayer):
block_diagonal_butterfly = torch.block_diag(*torch.unbind(orth_rotate_butterfly))
block_diagonal_butterfly = block_diagonal_butterfly.unsqueeze(0)
butterfly_oft_mat_batch = torch.bmm(block_diagonal_butterfly, self.boft_P.permute(0, 2, 1))
butterfly_oft_mat_batch = torch.bmm(self.boft_P, butterfly_oft_mat_batch)
boft_P = self.boft_P.to(block_diagonal_butterfly.device)
butterfly_oft_mat_batch = torch.bmm(block_diagonal_butterfly, boft_P.permute(0, 2, 1))
butterfly_oft_mat_batch = torch.bmm(boft_P, butterfly_oft_mat_batch)
butterfly_oft_mat = butterfly_oft_mat_batch[0]
for i in range(1, butterfly_oft_mat_batch.shape[0]):
@ -563,8 +593,8 @@ class Linear(nn.Module, BOFTLayer):
elif self.merged:
result = self.base_layer(x, *args, **kwargs)
else:
boft_rotation = torch.eye(self.in_features, device=x.device)
boft_scale = torch.ones((int(self.out_features), 1), device=x.device)
boft_rotation = torch.eye(self.in_features, device=x.device, dtype=previous_dtype)
boft_scale = torch.ones((int(self.out_features), 1), device=x.device, dtype=previous_dtype)
for active_adapter in self.active_adapters:
if active_adapter not in self.boft_R.keys():
@ -585,8 +615,11 @@ class Linear(nn.Module, BOFTLayer):
block_diagonal_butterfly = torch.block_diag(*torch.unbind(orth_rotate_butterfly))
block_diagonal_butterfly = block_diagonal_butterfly.unsqueeze(0)
butterfly_oft_mat_batch = torch.bmm(block_diagonal_butterfly, self.boft_P.permute(0, 2, 1))
butterfly_oft_mat_batch = torch.bmm(self.boft_P, butterfly_oft_mat_batch)
# The BOFT author's cayley_batch, dropout and FastBlockDiag ONLY return fp32 outputs.
boft_P = self.boft_P.to(x)
block_diagonal_butterfly = block_diagonal_butterfly.to(x)
butterfly_oft_mat_batch = torch.bmm(block_diagonal_butterfly, boft_P.permute(0, 2, 1))
butterfly_oft_mat_batch = torch.bmm(boft_P, butterfly_oft_mat_batch)
butterfly_oft_mat = butterfly_oft_mat_batch[0]
for i in range(1, butterfly_oft_mat_batch.shape[0]):
@ -599,11 +632,16 @@ class Linear(nn.Module, BOFTLayer):
orig_weight = self.get_base_layer().weight.data
orig_weight = torch.transpose(orig_weight, 0, 1)
boft_rotation = boft_rotation.to(previous_dtype)
orig_weight = orig_weight.to(previous_dtype)
rotated_weight = torch.mm(boft_rotation, orig_weight)
rotated_weight = torch.transpose(rotated_weight, 0, 1)
scaled_rotated_weight = rotated_weight * boft_scale
scaled_rotated_weight = scaled_rotated_weight.to(previous_dtype)
if self.base_layer.bias is not None:
self.base_layer.bias = self.base_layer.bias.to(previous_dtype)
result = F.linear(input=x, weight=scaled_rotated_weight, bias=self.base_layer.bias)
result = result.to(previous_dtype)
@ -634,15 +672,6 @@ class Conv2d(nn.Module, BOFTLayer):
BOFTLayer.__init__(self, base_layer)
self._active_adapter = adapter_name
# Attempt to load the CUDA extension during model initialization
if not get_fbd_cuda():
self.fbd_cuda_available = False
# If the CUDA extension is not available, set the butterfly factor to 1 to speed up the finetuning process
boft_n_butterfly_factor = 1
else:
self.fbd_cuda_available = True
self.update_layer(
adapter_name, boft_block_size, boft_block_num, boft_n_butterfly_factor, boft_dropout, init_weights
)
@ -653,6 +682,15 @@ class Conv2d(nn.Module, BOFTLayer):
"""
Update the conv2d layer with trainable BOFT weights.
"""
# Attempt to load the CUDA extension during model initialization
if not get_fbd_cuda():
self.fbd_cuda_available = False
# If the CUDA extension is not available, set the butterfly factor to 1 to speed up the finetuning process
boft_n_butterfly_factor = 1
else:
self.fbd_cuda_available = True
# to be consistent with the paper notation
boft_n_butterfly_factor = boft_n_butterfly_factor - 1
if boft_n_butterfly_factor < 0:
@ -733,7 +771,7 @@ class Conv2d(nn.Module, BOFTLayer):
perm_mat = self.perm2mat(perm)
P[i] = perm_mat
self.register_buffer("boft_P", P)
self.register_buffer("boft_P", P, persistent=False)
self.boft_R[adapter_name] = nn.Parameter(
torch.zeros(boft_n_butterfly_factor + 1, boft_block_num, boft_block_size, boft_block_size)
@ -742,19 +780,13 @@ class Conv2d(nn.Module, BOFTLayer):
self.reset_boft_parameters(adapter_name, init_weights)
weight = getattr(self, "weight", None)
if weight is not None:
# the layer is already completely initialized, this is an update
if weight.dtype.is_floating_point or weight.dtype.is_complex:
self.to(weight.device, dtype=weight.dtype)
else:
self.to(weight.device)
self.set_adapter(self.active_adapters)
# set the boft block size and number
self.boft_block_size[adapter_name] = boft_block_size
self.boft_block_num[adapter_name] = boft_block_num
self._move_adapter_to_device_of_base_layer(adapter_name)
self.set_adapter(self.active_adapters)
def merge(self, safe_merge: bool = False, adapter_names: Optional[list[str]] = None) -> None:
"""
Merge the active adapter weights into the base weights
@ -791,7 +823,7 @@ class Conv2d(nn.Module, BOFTLayer):
self.out_features, self.in_features, base_layer.kernel_size[0], base_layer.kernel_size[0]
)
self.base_layer.weight.data = orig_weight
self.base_layer.weight.data = orig_weight.contiguous()
else:
butterfly_oft_mat, boft_s = self.get_delta_weight(active_adapter)
@ -805,7 +837,7 @@ class Conv2d(nn.Module, BOFTLayer):
self.out_features, self.in_features, base_layer.kernel_size[0], base_layer.kernel_size[0]
)
self.base_layer.weight.data = orig_weight
self.base_layer.weight.data = orig_weight.contiguous()
self.merged_adapters.append(active_adapter)
@ -860,8 +892,9 @@ class Conv2d(nn.Module, BOFTLayer):
block_diagonal_butterfly = torch.block_diag(*torch.unbind(orth_rotate_butterfly))
block_diagonal_butterfly = block_diagonal_butterfly.unsqueeze(0)
butterfly_oft_mat_batch = torch.bmm(block_diagonal_butterfly, self.boft_P.permute(0, 2, 1))
butterfly_oft_mat_batch = torch.bmm(self.boft_P, butterfly_oft_mat_batch)
boft_P = self.boft_P.to(block_diagonal_butterfly.device)
butterfly_oft_mat_batch = torch.bmm(block_diagonal_butterfly, boft_P.permute(0, 2, 1))
butterfly_oft_mat_batch = torch.bmm(boft_P, butterfly_oft_mat_batch)
butterfly_oft_mat = butterfly_oft_mat_batch[0]
for i in range(1, butterfly_oft_mat_batch.shape[0]):
@ -880,9 +913,11 @@ class Conv2d(nn.Module, BOFTLayer):
result = self.base_layer(x, *args, **kwargs)
else:
boft_rotation = torch.eye(
self.in_features * self.base_layer.kernel_size[0] * self.base_layer.kernel_size[0], device=x.device
self.in_features * self.base_layer.kernel_size[0] * self.base_layer.kernel_size[0],
device=x.device,
dtype=x.dtype,
)
boft_scale = torch.ones((1, int(self.out_features)), device=x.device)
boft_scale = torch.ones((1, int(self.out_features)), device=x.device, dtype=x.dtype)
for active_adapter in self.active_adapters:
if active_adapter not in self.boft_R.keys():
@ -903,8 +938,10 @@ class Conv2d(nn.Module, BOFTLayer):
block_diagonal_butterfly = torch.block_diag(*torch.unbind(orth_rotate_butterfly))
block_diagonal_butterfly = block_diagonal_butterfly.unsqueeze(0)
butterfly_oft_mat_batch = torch.bmm(block_diagonal_butterfly, self.boft_P.permute(0, 2, 1))
butterfly_oft_mat_batch = torch.bmm(self.boft_P, butterfly_oft_mat_batch)
boft_P = self.boft_P.to(x)
block_diagonal_butterfly = block_diagonal_butterfly.to(x)
butterfly_oft_mat_batch = torch.bmm(block_diagonal_butterfly, boft_P.permute(0, 2, 1))
butterfly_oft_mat_batch = torch.bmm(boft_P, butterfly_oft_mat_batch)
butterfly_oft_mat = butterfly_oft_mat_batch[0]
for i in range(1, butterfly_oft_mat_batch.shape[0]):

View File

@ -24,7 +24,12 @@ import torch
from torch import nn
from tqdm import tqdm
from peft.tuners.tuners_utils import BaseTuner, BaseTunerLayer, check_target_module_exists
from peft.tuners.tuners_utils import (
BaseTuner,
BaseTunerLayer,
check_target_module_exists,
onload_layer,
)
from peft.utils import (
TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING,
ModulesToSaveWrapper,
@ -44,6 +49,8 @@ class BOFTModel(BaseTuner):
model ([`transformers.PreTrainedModel`]): The model to be adapted.
config ([`BOFTConfig`]): The configuration of the BOFT model.
adapter_name (`str`): The name of the adapter, defaults to `"default"`.
low_cpu_mem_usage (`bool`, `optional`, defaults to `False`):
Create empty adapter weights on meta device. Useful to speed up the loading process.
Returns:
`torch.nn.Module`: The BOFT model.
@ -67,8 +74,8 @@ class BOFTModel(BaseTuner):
prefix: str = "boft_"
def __init__(self, model, config, adapter_name) -> None:
super().__init__(model, config, adapter_name)
def __init__(self, model, config, adapter_name, low_cpu_mem_usage: bool = False) -> None:
super().__init__(model, config, adapter_name, low_cpu_mem_usage=low_cpu_mem_usage)
def _check_new_adapter_config(self, config: BOFTConfig) -> None:
"""
@ -151,10 +158,12 @@ class BOFTModel(BaseTuner):
new_module.state = child.state
new_module.to(child.weight.device)
meta = torch.device("meta")
# dispatch to correct device
for name, module in new_module.named_modules():
if self.prefix in name:
module.to(child.weight.device)
if not any(p.device == meta for p in module.parameters()):
module.to(child.weight.device)
def _mark_only_adapters_as_trainable(self, model: nn.Module) -> None:
for n, p in model.named_parameters():
@ -207,6 +216,8 @@ class BOFTModel(BaseTuner):
try:
return super().__getattr__(name) # defer to nn.Module's logic
except AttributeError:
if name == "model": # see #1892: prevent infinite recursion if class is not initialized
raise
return getattr(self.model, name)
def get_peft_config_as_dict(self, inference: bool = False):
@ -263,7 +274,9 @@ class BOFTModel(BaseTuner):
safe_merge: bool = False,
adapter_names: Optional[List[str]] = None,
):
self._unloading_checks(adapter_names)
if merge:
self._check_merge_allowed()
key_list = [key for key, _ in self.model.named_modules() if self.prefix not in key]
desc = "Unloading " + ("and merging " if merge else "") + "model"
for key in tqdm(key_list, disable=not progressbar, desc=desc):
@ -271,14 +284,20 @@ class BOFTModel(BaseTuner):
parent, target, target_name = _get_submodules(self.model, key)
except AttributeError:
continue
if hasattr(target, "base_layer"):
if merge:
target.merge(safe_merge=safe_merge, adapter_names=adapter_names)
self._replace_module(parent, target_name, target.get_base_layer(), target)
elif isinstance(target, ModulesToSaveWrapper):
# save any additional trainable modules part of `modules_to_save`
setattr(parent, target_name, target.modules_to_save[target.active_adapter])
with onload_layer(target):
if hasattr(target, "base_layer"):
if merge:
target.merge(safe_merge=safe_merge, adapter_names=adapter_names)
self._replace_module(parent, target_name, target.get_base_layer(), target)
elif isinstance(target, ModulesToSaveWrapper):
# save any additional trainable modules part of `modules_to_save`
new_module = target.modules_to_save[target.active_adapter]
if hasattr(new_module, "base_layer"):
# check if the module is itself a tuner layer
if merge:
new_module.merge(safe_merge=safe_merge, adapter_names=adapter_names)
new_module = new_module.get_base_layer()
setattr(parent, target_name, new_module)
return self.model
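This unload/merge path is typically reached through `merge_and_unload()`; a hedged end-to-end sketch, with a placeholder model id and target modules, assuming the default BOFT block sizes divide the layer dimensions:

```python
# Sketch: attach BOFT adapters, then fold their weights back into the base model.
from transformers import AutoModelForCausalLM
from peft import BOFTConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")   # placeholder id
config = BOFTConfig(target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, config)

# ... training would happen here ...

merged = model.merge_and_unload()   # merges adapters and returns the plain base model
```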

View File

@ -0,0 +1,20 @@
# Copyright 2024-present the HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .config import FourierFTConfig
from .layer import FourierFTLayer, FourierFTLinear
from .model import FourierFTModel
__all__ = ["FourierFTConfig", "FourierFTLayer", "FourierFTLinear", "FourierFTModel"]

View File

@ -0,0 +1,188 @@
# Copyright 2024-present the HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Optional, Union
from peft.config import PeftConfig
from peft.utils import PeftType
@dataclass
class FourierFTConfig(PeftConfig):
"""
This is the configuration class to store the configuration of a [`FourierFTModel`].
Args:
n_frequency (`int`):
Num of learnable frequencies for the Discrete Fourier Transform. 'n_frequency' is an integer that is
greater than 0 and less than or equal to d^2 (assuming the weight W has dimensions of d by d).
Additionally, it is the number of trainable parameters required to update each delta W weight.
'n_frequency' will affect the performance and efficiency for PEFT. Specifically, it has little impact on
training speed, but higher values of it (typically) result in larger GPU memory costs and better accuracy.
With the same `target_modules`, the number of parameters of LoRA is (2*d*r/n_frequency) times that of
FourierFT. The following examples of settings regarding 'n_frequency' can be used as reference for users.
For NLU tasks with the RoBERTa-large model, adopting 'n_frequency': 1000 can almost achieve similar results
as 'r': 8 in LoRA. At this time, the number of parameters of LoRA is about 16 times that of FourierFT. For
image classification tasks with ViT-large models, adopting 'n_frequency': 3000 can almost achieve similar
results as 'r': 16 in LoRA, where the number of parameters of LoRA is about 11 times that of FourierFT.
scaling (`float`):
The scaling value for the delta W matrix. This is an important hyperparameter used for scaling, similar to
the 'lora_alpha' parameter in the LoRA method. 'scaling' can be determined during the hyperparameter search
process. However, if users want to skip this process, one can refer to the settings in the following
scenarios. This parameter can be set to 100.0 or 150.0 for both RoBERTa-base and RoBERTa-large models
across all NLU (GLUE) tasks. This parameter can be set to 300.0 for both LLaMA family models for all
instruction tuning. This parameter can be set to 300.0 for both ViT-base and ViT-large models across all
image classification tasks.
random_loc_seed (`int`):
Seed for the random location of the frequencies, i.e., the spectral entry matrix.
target_modules (`Union[list[str],str]`):
List of module names or regex expression of the module names to replace with FourierFT. For example, ['q',
'v'] or '.*decoder.*(SelfAttention|EncDecAttention).*(q|v)$'. Only linear layers are supported.
fan_in_fan_out (`bool`):
Set this to True if the layer to replace stores weight like (fan_in, fan_out).
bias (`str`):
Bias type for FourierFT. Can be 'none', 'all' or 'fourier_only'.
modules_to_save (`list[str]`):
List of modules apart from FourierFT layers to be set as trainable and saved in the final checkpoint. For
example, in Sequence Classification or Token Classification tasks, the final layer `classifier/score` are
randomly initialized and as such need to be trainable and saved.
layers_to_transform (`Union[list[int],int]`):
The layer indexes to transform. If this argument is specified, PEFT will transform only the layer indexes
that are specified inside this list. If a single integer is passed, PEFT will transform only the layer at
this index.
layers_pattern (`str`):
The layer pattern name, used only if `layers_to_transform` is different to None and if the layer pattern is
not in the common layers pattern.
n_frequency_pattern (`dict`):
The mapping from layer names or regexp expression to n_frequency which are different from the default
specified. For example, `{model.decoder.layers.0.encoder_attn.k_proj: 1000}`.
init_weights (`bool`):
The initialization of the Fourier weights. Set this to False if the spectrum should be initialized to a standard
normal distribution. Set this to True if the spectrum should be initialized to zeros.
"""
n_frequency: int = field(
default=1000,
metadata={
"help": (
"Num of learnable frequencies for the Discrete Fourier Transform. 'n_frequency' is an integer that is"
"greater than 0 and less than or equal to d^2 (assuming the weight W has dimensions of d by d)."
"Additionally, it is the number of trainable parameters required to update each delta W weight."
"'n_frequency' will affect the performance and efficiency for PEFT. Specifically, it has little impact on"
"training speed, but higher values of it (typically) result in larger GPU memory costs and better accuracy."
"With the same `target_modules`, the number of parameters of LoRA is (2*d*r/n_frequency) times that of FourierFT."
"The following examples of settings regarding 'n_frequency' can be used as reference for users. For NLU"
"tasks with the RoBERTa-large model, adopting 'n_frequency': 1000 can almost achieve similar results as"
"'r': 8 in LoRA. At this time, the number of parameters of LoRA is about 16 times that of FourierFT."
"For image classification tasks with Vit-large models, adopting 'n_frequency': 3000 can almost achieve"
"similar results as 'r': 16 in LoRA, where the number of parameters of LoRA is about 11 times that of FourierFT."
)
},
)
scaling: float = field(
default=150.0,
metadata={
"help": (
"The scaling value for the delta W matrix. This is an important hyperparameter used for scaling, similar to the"
"'lora_alpha' parameter in the LoRA method. 'scaling' can be determined during the hyperparameter search process."
"However, if users want to skip this process, one can refer to the settings in the following scenarios."
"This parameter can be set to 100.0 or 150.0 for both RoBERTa-base and RoBERTa-large models across all NLU (GLUE) tasks."
"This parameter can be set to 300.0 for both LLaMA family models for all instruction tuning."
"This parameter can be set to 300.0 for both ViT-base and ViT-large models across all image classification tasks."
)
},
)
random_loc_seed: Optional[int] = field(
default=777, metadata={"help": "Seed for the random location of the frequencies."}
)
fan_in_fan_out: bool = field(
default=False,
metadata={"help": "Set this to True if the layer to replace stores weight like (fan_in, fan_out)"},
)
target_modules: Optional[Union[list[str], str]] = field(
default=None,
metadata={
"help": (
"List of module names or regex expression of the module names to replace with FourierFT."
"For example, ['q', 'v'] or '.*decoder.*(SelfAttention|EncDecAttention).*(q|v)$'. "
"Only linear layers are supported."
)
},
)
bias: str = field(
default="none", metadata={"help": "Bias type for FourierFT. Can be 'none', 'all' or 'fourier_only'."}
)
modules_to_save: Optional[list[str]] = field(
default=None,
metadata={
"help": (
"List of modules apart from FourierFT layers to be set as trainable and saved in the final checkpoint. For"
" example, in Sequence Classification or Token Classification tasks, the final layer"
" `classifier/score` are randomly initialized and as such need to be trainable and saved."
)
},
)
layers_to_transform: Optional[Union[list[int], int]] = field(
default=None,
metadata={
"help": (
"The layer indexes to transform, is this argument is specified, PEFT will transform only the layers"
" indexes that are specified inside this list. If a single integer is passed, PEFT will transform only"
" the layer at this index."
)
},
)
layers_pattern: Optional[str] = field(
default=None,
metadata={
"help": (
"The layer pattern name, used only if `layers_to_transform` is different to None and if the layer"
" pattern is not in the common layers pattern."
)
},
)
n_frequency_pattern: Optional[dict] = field(
default_factory=dict,
metadata={
"help": (
"The mapping from layer names or regexp expression to n_frequency which are different from the default specified."
"For example, `{model.decoder.layers.0.encoder_attn.k_proj: 500`}."
)
},
)
init_weights: bool = field(
default=False,
metadata={
"help": (
"The initialization of the Fourier weights. Set this to False if the spectrum should be initialized to a standard normal distribution."
"Set this to True if the spectrum should be initialized to zeros."
)
},
)
def __post_init__(self):
self.peft_type = PeftType.FOURIERFT
self.target_modules = (
set(self.target_modules) if isinstance(self.target_modules, list) else self.target_modules
)
# if target_modules is a regex expression, then layers_to_transform should be None
if isinstance(self.target_modules, str) and self.layers_to_transform is not None:
raise ValueError("`layers_to_transform` cannot be used when `target_modules` is a str.")
# if target_modules is a regex expression, then layers_pattern should be None
if isinstance(self.target_modules, str) and self.layers_pattern is not None:
raise ValueError("`layers_pattern` cannot be used when `target_modules` is a str.")

View File

@ -0,0 +1,190 @@
# Copyright 2024-present the HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import warnings
from typing import Any, List, Optional, Union
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers.pytorch_utils import Conv1D
from peft.tuners.tuners_utils import BaseTunerLayer, check_adapters_to_merge
class FourierFTLayer(BaseTunerLayer):
# All names of layers that may contain (trainable) adapter weights
adapter_layer_names = ("fourierft_spectrum",)
# All names of other parameters that may contain adapter-related parameters
other_param_names = ("fourierft_n_frequency", "fourierft_scaling", "fourierft_random_loc_seed")
def __init__(self, base_layer: nn.Module, **kwargs) -> None:
self.base_layer = base_layer
self.fourierft_n_frequency = {}
self.fourierft_scaling = {}
self.fourierft_spectrum = nn.ParameterDict({})
self.indices = {}
self.fourierft_random_loc_seed = {}
# Mark the weight as unmerged
self._disable_adapters = False
self.merged_adapters = []
self.kwargs = kwargs
base_layer = self.get_base_layer()
if isinstance(base_layer, nn.Linear):
self.in_features, self.out_features = base_layer.in_features, base_layer.out_features
elif isinstance(base_layer, Conv1D):
self.in_features, self.out_features = (
base_layer.weight.ds_shape if hasattr(base_layer.weight, "ds_shape") else base_layer.weight.shape
)
else:
raise ValueError(f"Unsupported layer type {type(base_layer)}")
def update_layer(self, adapter_name, n_frequency, scaling, init_weights, random_loc_seed):
if n_frequency <= 0:
raise ValueError(f"`n_frequency` should be a positive integer value but the value passed is {n_frequency}")
if n_frequency > self.in_features * self.out_features:
raise ValueError(
f"`n_frequency` should be less than or equal to the product of the input and output dimensions "
f"but the value passed is {n_frequency} and the product is {self.in_features * self.out_features}"
)
self.fourierft_n_frequency[adapter_name] = n_frequency
self.fourierft_random_loc_seed[adapter_name] = random_loc_seed
self.indices[adapter_name] = torch.randperm(
self.out_features * self.in_features,
generator=torch.Generator().manual_seed(self.fourierft_random_loc_seed[adapter_name]),
)[:n_frequency]
self.indices[adapter_name] = torch.stack(
[self.indices[adapter_name] // self.in_features, self.indices[adapter_name] % self.in_features], dim=0
)
self.fourierft_scaling[adapter_name] = scaling
# Actual trainable parameters
self.fourierft_spectrum[adapter_name] = nn.Parameter(torch.randn(n_frequency), requires_grad=True)
if init_weights:
self.reset_fourier_parameters(adapter_name)
self._move_adapter_to_device_of_base_layer(adapter_name)
self.set_adapter(self.active_adapters)
@torch.no_grad()
def reset_fourier_parameters(self, adapter_name):
if adapter_name in self.fourierft_spectrum.keys():
nn.init.zeros_(self.fourierft_spectrum[adapter_name])
def get_delta_weight(self, adapter) -> torch.Tensor:
spectrum = self.fourierft_spectrum[adapter]
indices = self.indices[adapter].to(spectrum.device)
dense_spectrum = torch.zeros(self.out_features, self.in_features, device=spectrum.device, dtype=spectrum.dtype)
dense_spectrum[indices[0, :], indices[1, :]] = spectrum
delta_weight = torch.fft.ifft2(dense_spectrum).real * self.fourierft_scaling[adapter]
return delta_weight
class FourierFTLinear(nn.Module, FourierFTLayer):
# FourierFT implemented in a dense layer
def __init__(
self,
base_layer,
adapter_name: str,
n_frequency: int = 1000,
scaling: float = 150.0,
fan_in_fan_out: bool = False, # Set this to True if the layer to replace stores weight like (fan_in, fan_out)
init_weights: Union[bool, str] = False,
random_loc_seed: int = 777,
**kwargs,
) -> None:
super().__init__()
FourierFTLayer.__init__(self, base_layer, **kwargs)
self.fan_in_fan_out = fan_in_fan_out
self._active_adapter = adapter_name
self.update_layer(adapter_name, n_frequency, scaling, init_weights, random_loc_seed)
def merge(self, safe_merge: bool = False, adapter_names: Optional[List[str]] = None) -> None:
"""
Merge the active adapter weights into the base weights
Args:
safe_merge (`bool`, *optional*):
If True, the merge operation will be performed in a copy of the original weights and check for NaNs
before merging the weights. This is useful if you want to check if the merge operation will produce
NaNs. Defaults to `False`.
adapter_names (`List[str]`, *optional*):
The list of adapter names that should be merged. If None, all active adapters will be merged. Defaults
to `None`.
"""
adapter_names = check_adapters_to_merge(self, adapter_names)
if not adapter_names:
# no adapter to merge
return
for active_adapter in adapter_names:
if active_adapter in self.fourierft_spectrum.keys():
base_layer = self.get_base_layer()
if safe_merge:
# Note that safe_merge will be slower than the normal merge
# because of the copy operation.
orig_weights = base_layer.weight.data.clone()
orig_weights += self.get_delta_weight(active_adapter)
if not torch.isfinite(orig_weights).all():
raise ValueError(
f"NaNs detected in the merged weights. The adapter {active_adapter} seems to be broken"
)
base_layer.weight.data = orig_weights
else:
base_layer.weight.data += self.get_delta_weight(active_adapter)
self.merged_adapters.append(active_adapter)
def unmerge(self) -> None:
"""
This method unmerges all merged adapter layers from the base weights.
"""
if not self.merged:
warnings.warn("Already unmerged. Nothing to do.")
return
while len(self.merged_adapters) > 0:
active_adapter = self.merged_adapters.pop()
if active_adapter in self.fourierft_spectrum.keys():
self.get_base_layer().weight.data -= self.get_delta_weight(active_adapter)
def get_delta_weight(self, adapter) -> torch.Tensor:
return super().get_delta_weight(adapter)
def forward(self, x: torch.Tensor, *args: Any, **kwargs: Any) -> torch.Tensor:
previous_dtype = x.dtype
if self.disable_adapters:
if self.merged:
self.unmerge()
result = self.base_layer(x, *args, **kwargs)
elif self.merged:
result = self.base_layer(x, *args, **kwargs)
else:
result = self.base_layer(x, *args, **kwargs)
for active_adapter in self.active_adapters:
if active_adapter not in self.fourierft_spectrum.keys():
continue
delta_w = self.get_delta_weight(active_adapter)
x = x.to(delta_w.dtype)
result = result + F.linear(x, delta_w)
result = result.to(previous_dtype)
return result
def __repr__(self) -> str:
rep = super().__repr__()
return "fourierft." + rep

View File

@ -0,0 +1,350 @@
# Copyright 2024-present the HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import annotations
import re
import warnings
from dataclasses import asdict
from enum import Enum
from itertools import chain
from typing import Optional
import torch
from tqdm import tqdm
from transformers.pytorch_utils import Conv1D
from peft.tuners.tuners_utils import BaseTuner, BaseTunerLayer, check_target_module_exists
from peft.utils import (
TRANSFORMERS_MODELS_TO_FOURIERFT_TARGET_MODULES_MAPPING,
ModulesToSaveWrapper,
_get_submodules,
)
from .config import FourierFTConfig
from .layer import FourierFTLayer, FourierFTLinear
class FourierFTModel(BaseTuner):
"""
Creates FourierFT model from a pretrained transformers model.
The method is described in detail in https://arxiv.org/abs/2405.03003.
Args:
model ([`torch.nn.Module`]): The model to be adapted.
config ([`FourierFTConfig`]): The configuration of the FourierFT model.
adapter_name (`str`): The name of the adapter, defaults to `"default"`.
low_cpu_mem_usage (`bool`, `optional`, defaults to `False`):
Create empty adapter weights on meta device. Useful to speed up the loading process.
Returns:
`torch.nn.Module`: The FourierFT model.
**Attributes**:
- **model** ([`~transformers.PreTrainedModel`]) -- The model to be adapted.
- **peft_config** ([`FourierFTConfig`]): The configuration of the Fourier model.
"""
prefix: str = "fourierft_"
def __init__(self, model, config, adapter_name, low_cpu_mem_usage: bool = False) -> None:
super().__init__(model, config, adapter_name, low_cpu_mem_usage=low_cpu_mem_usage)
def _check_new_adapter_config(self, config: FourierFTConfig) -> None:
"""
A helper method to check the config when a new adapter is being added.
Raise a ValueError if there is something wrong with the config or if it conflicts with existing adapters.
"""
# TODO: there should be a check if any of the existing adapters actually has bias != "none", or else the check
# does not fully correspond to the error message.
if (len(self.peft_config) > 1) and (config.bias != "none"):
raise ValueError(
f"{self.__class__.__name__} supports only 1 adapter with bias. When using multiple adapters, "
"set bias to 'none' for all adapters."
)
@staticmethod
def _check_target_module_exists(fourierft_config, key):
return check_target_module_exists(fourierft_config, key)
def _create_and_replace(
self,
fourierft_config,
adapter_name,
target,
target_name,
parent,
current_key,
**optional_kwargs,
):
if current_key is None:
raise ValueError("Current Key shouldn't be `None`")
# Regexp matching - Find key which matches current target_name in patterns provided
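        # For example (hypothetical values): with n_frequency_pattern={"q_proj": 500}, any module key
        # ending in ".q_proj" uses n_frequency=500, while all other targets fall back to the default
        # n_frequency from the config.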
pattern_keys = list(chain(fourierft_config.n_frequency_pattern.keys()))
target_name_key = next(filter(lambda key: re.match(rf".*\.{key}$", current_key), pattern_keys), current_key)
n_frequency = fourierft_config.n_frequency_pattern.get(target_name_key, fourierft_config.n_frequency)
scaling = fourierft_config.scaling
random_loc_seed = fourierft_config.random_loc_seed
bias = hasattr(target, "bias") and target.bias is not None
kwargs = {
"n_frequency": n_frequency,
"scaling": scaling,
"fan_in_fan_out": fourierft_config.fan_in_fan_out,
"init_weights": fourierft_config.init_weights,
"random_loc_seed": fourierft_config.random_loc_seed,
}
kwargs["bias"] = bias
if isinstance(target, FourierFTLayer):
target.update_layer(
adapter_name,
n_frequency,
scaling,
fourierft_config.init_weights,
random_loc_seed,
)
else:
new_module = self._create_new_module(fourierft_config, adapter_name, target, **kwargs)
if adapter_name != self.active_adapter:
# adding an additional adapter: it is not automatically trainable
new_module.requires_grad_(False)
self._replace_module(parent, target_name, new_module, target)
def _replace_module(self, parent, child_name, new_module, child):
setattr(parent, child_name, new_module)
# It's not necessary to set requires_grad here, as that is handled by
# _mark_only_adapters_as_trainable
# child layer wraps the original module, unpack it
if hasattr(child, "base_layer"):
child = child.base_layer
if not hasattr(new_module, "base_layer"):
new_module.weight = child.weight
if hasattr(child, "bias"):
new_module.bias = child.bias
if getattr(child, "state", None) is not None:
if hasattr(new_module, "base_layer"):
new_module.base_layer.state = child.state
else:
new_module.state = child.state
new_module.to(child.weight.device)
meta = torch.device("meta")
# dispatch to correct device
for name, module in new_module.named_modules():
if "fourierft_" in name:
if not any(p.device == meta for p in module.parameters()):
module.to(child.weight.device)
def _mark_only_adapters_as_trainable(self, model: torch.nn.Module) -> None:
for n, p in model.named_parameters():
if self.prefix not in n:
p.requires_grad = False
for active_adapter in self.active_adapters:
bias = self.peft_config[active_adapter].bias
if bias == "none":
continue
if bias == "all":
for n, p in model.named_parameters():
if "bias" in n:
p.requires_grad = True
elif bias == "fourier_only":
for m in model.modules():
if isinstance(m, FourierFTLayer) and hasattr(m, "bias") and m.bias is not None:
m.bias.requires_grad = True
else:
raise NotImplementedError(f"Requested bias: {bias}, is not implemented.")
@staticmethod
def _create_new_module(fourierft_config, adapter_name, target, **kwargs):
if isinstance(target, BaseTunerLayer):
target_base_layer = target.get_base_layer()
else:
target_base_layer = target
if isinstance(target_base_layer, torch.nn.Linear):
if kwargs["fan_in_fan_out"]:
warnings.warn(
"fan_in_fan_out is set to True but the target module is `torch.nn.Linear`. "
"Setting fan_in_fan_out to False."
)
kwargs["fan_in_fan_out"] = fourierft_config.fan_in_fan_out = False
elif isinstance(target_base_layer, Conv1D):
kwargs["is_target_conv_1d_layer"] = True
if not kwargs["fan_in_fan_out"]:
warnings.warn(
"fan_in_fan_out is set to False but the target module is `Conv1D`. "
"Setting fan_in_fan_out to True."
)
kwargs["fan_in_fan_out"] = fourierft_config.fan_in_fan_out = True
else:
raise ValueError(
f"Target module {target} is not supported. Currently, only the following modules are supported: "
"`torch.nn.Linear`."
)
new_module = FourierFTLinear(target, adapter_name, **kwargs)
return new_module
def __getattr__(self, name: str):
"""Forward missing attributes to the wrapped module."""
try:
return super().__getattr__(name) # defer to nn.Module's logic
except AttributeError:
if name == "model":
raise
return getattr(self.model, name)
def get_peft_config_as_dict(self, inference: bool = False):
config_dict = {}
for key, value in self.peft_config.items():
config = {k: v.value if isinstance(v, Enum) else v for k, v in asdict(value).items()}
if inference:
config["inference_mode"] = True
config_dict[key] = config
        return config_dict
def _set_adapter_layers(self, enabled: bool = True) -> None:
for module in self.model.modules():
if isinstance(module, (BaseTunerLayer, ModulesToSaveWrapper)):
module.enable_adapters(enabled)
def enable_adapter_layers(self) -> None:
"""Enable all adapters.
Call this if you have previously disabled all adapters and want to re-enable them.
"""
self._set_adapter_layers(enabled=True)
def disable_adapter_layers(self) -> None:
"""Disable all adapters.
When disabling all adapters, the model output corresponds to the output of the base model.
"""
for active_adapter in self.active_adapters:
val = self.peft_config[active_adapter].bias
if val != "none":
msg = (
f"Careful, disabling adapter layers with bias configured to be '{val}' does not produce the same "
"output as the the base model would without adaption."
)
warnings.warn(msg)
self._set_adapter_layers(enabled=False)
def set_adapter(self, adapter_name: str | list[str]) -> None:
"""Set the active adapter(s).
Args:
adapter_name (`str` or `list[str]`): Name of the adapter(s) to be activated.
"""
for module in self.model.modules():
if isinstance(module, FourierFTLayer):
if module.merged:
warnings.warn("Adapter cannot be set when the model is merged. Unmerging the model first.")
module.unmerge()
module.set_adapter(adapter_name)
self.active_adapter = adapter_name
@staticmethod
def _prepare_adapter_config(peft_config, model_config):
if peft_config.target_modules is None:
if model_config["model_type"] not in TRANSFORMERS_MODELS_TO_FOURIERFT_TARGET_MODULES_MAPPING:
raise ValueError("Please specify `target_modules` in `peft_config`")
peft_config.target_modules = set(
TRANSFORMERS_MODELS_TO_FOURIERFT_TARGET_MODULES_MAPPING[model_config["model_type"]]
)
return peft_config
def _unload_and_optionally_merge(
self,
merge=True,
progressbar: bool = False,
safe_merge: bool = False,
adapter_names: Optional[list[str]] = None,
):
key_list = [key for key, _ in self.model.named_modules() if self.prefix not in key]
desc = "Unloading " + ("and merging " if merge else "") + "model"
for key in tqdm(key_list, disable=not progressbar, desc=desc):
try:
parent, target, target_name = _get_submodules(self.model, key)
except AttributeError:
continue
if hasattr(target, "base_layer"):
if merge:
target.merge(safe_merge=safe_merge, adapter_names=adapter_names)
self._replace_module(parent, target_name, target.get_base_layer(), target)
elif isinstance(target, ModulesToSaveWrapper):
# save any additional trainable modules part of `modules_to_save`
setattr(parent, target_name, target.modules_to_save[target.active_adapter])
return self.model
def delete_adapter(self, adapter_name: str):
"""
Deletes an existing adapter.
Args:
adapter_name (str): Name of the adapter to be deleted.
"""
if adapter_name not in list(self.peft_config.keys()):
raise ValueError(f"Adapter {adapter_name} does not exist")
del self.peft_config[adapter_name]
# we cannot use self.prefix as we want to include non-trainable fourierft parameters
key_list = [key for key, _ in self.model.named_modules() if "fourierft" not in key]
new_adapter = None
for key in key_list:
_, target, _ = _get_submodules(self.model, key)
if isinstance(target, FourierFTLayer):
target.delete_adapter(adapter_name)
if new_adapter is None:
                    new_adapter = target.active_adapters[:]
self.active_adapter = new_adapter or []
def merge_and_unload(
self, progressbar: bool = False, safe_merge: bool = False, adapter_names: Optional[list[str]] = None
) -> torch.nn.Module:
r"""
This method merges the Fourier layers into the base model. This is needed if someone wants to use the base
model as a standalone model.
Args:
progressbar (`bool`):
whether to show a progressbar indicating the unload and merge process
safe_merge (`bool`):
whether to activate the safe merging check to check if there is any potential Nan in the adapter
weights
adapter_names (`List[str]`, *optional*):
The list of adapter names that should be merged. If None, all active adapters will be merged. Defaults
to `None`.
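        Example (a minimal sketch; the model identifier and adapter path are placeholders):

        ```py
        >>> from transformers import AutoModelForCausalLM
        >>> from peft import PeftModel

        >>> base = AutoModelForCausalLM.from_pretrained("some/base-model")
        >>> model = PeftModel.from_pretrained(base, "path/to/fourierft-adapter")
        >>> merged = model.merge_and_unload()
        ```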
"""
return self._unload_and_optionally_merge(
progressbar=progressbar, safe_merge=safe_merge, adapter_names=adapter_names
)
def unload(self) -> torch.nn.Module:
"""
Gets back the base model by removing all the Fourier modules without merging. This gives back the original base
model.
"""
return self._unload_and_optionally_merge(merge=False)


@@ -0,0 +1,20 @@
# Copyright 2024-present the HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .config import HRAConfig
from .layer import HRAConv2d, HRALayer, HRALinear
from .model import HRAModel
__all__ = ["HRAConfig", "HRAModel", "HRAConv2d", "HRALinear", "HRALayer"]


@@ -0,0 +1,116 @@
# Copyright 2024-present the HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from dataclasses import dataclass, field
from typing import List, Optional, Union
from peft.config import PeftConfig
from peft.utils import PeftType
@dataclass
class HRAConfig(PeftConfig):
"""
This is the configuration class to store the configuration of a [`HRAModel`].
Args:
r (`int`):
The rank of HRA across different layers. It is best to set 'r' to an even number; otherwise, the default
initialization method will not work.
apply_GS (`bool`):
Whether to apply Gram-Schmidt orthogonalization.
target_modules (`Optional[Union[List[str], str]]`):
The names of the modules to apply the adapter to. If this is specified, only the modules with the specified
names will be replaced. When passing a string, a regex match will be performed. When passing a list of
strings, either an exact match will be performed or it is checked if the name of the module ends with any
of the passed strings. If this is specified as 'all-linear', then all linear modules are chosen, excluding
the output layer. If this is not specified, modules will be chosen according to the model architecture. If
the architecture is not known, an error will be raised -- in this case, you should specify the target
modules manually.
init_weights (`bool`):
Whether to perform initialization of HRA weights.
layers_to_transform (`Union[List[int], int]`):
The layer indices to transform. If a list of ints is passed, it will apply the adapter to the layer indices
that are specified in this list. If a single integer is passed, it will apply the transformations on the
layer at this index.
layers_pattern (`str`):
The layer pattern name, used only if `layers_to_transform` is different from `None`.
rank_pattern (`dict`):
The mapping from layer names or regexp expression to ranks which are different from the default rank
specified by `r`.
modules_to_save (`List[str]`):
List of modules apart from adapter layers to be set as trainable and saved in the final checkpoint.
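    Example (a minimal sketch; the target module names are illustrative and depend on the base model):

        ```py
        >>> from peft import HRAConfig

        >>> config = HRAConfig(r=8, apply_GS=False, target_modules=["q_proj", "v_proj"])
        ```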
"""
r: int = field(
default=8,
metadata={
"help": "The rank of HRA across different layers.",
"note": "It is best to set 'r' to an even number; otherwise, the default initialization method will not work.",
},
)
apply_GS: bool = field(
default=False,
metadata={"help": "Whether to apply Gram-Schmidt orthogonalization or not."},
)
target_modules: Optional[Union[List[str], str]] = field(
default=None,
metadata={
"help": "List of module names or regex expression of the module names to replace with HRA.",
"example": "For example, ['q', 'v'] or '.*decoder.*(SelfAttention|EncDecAttention).*(q|v)$' ",
},
)
init_weights: bool = field(
default=True,
metadata={
"help": (
"Whether to initialize the weights of the HRA layers with their default initialization. Don't change "
"this setting, except if you know exactly what you're doing."
),
},
)
layers_to_transform: Optional[Union[List[int], int]] = field(
default=None,
metadata={
"help": "The layer indexes to transform, is this argument is specified, PEFT will transform only the layers indexes that are specified inside this list. If a single integer is passed, PEFT will transform only the layer at this index."
},
)
layers_pattern: Optional[str] = field(
default=None,
metadata={
"help": "The layer pattern name, used only if `layers_to_transform` is different to None and if the layer pattern is not in the common layers pattern."
},
)
bias: str = field(default="none", metadata={"help": "Bias type for HRA. Can be 'none', 'all' or 'hra_only'"})
modules_to_save: Optional[List[str]] = field(
default=None,
metadata={
"help": "List of modules apart from HRA layers to be set as trainable and saved in the final checkpoint. "
"For example, in Sequence Classification or Token Classification tasks, "
"the final layer `classifier/score` are randomly initialized and as such need to be trainable and saved."
},
)
def __post_init__(self):
self.peft_type = PeftType.HRA
self.target_modules = (
set(self.target_modules) if isinstance(self.target_modules, list) else self.target_modules
)
# if target_modules is a regex expression, then layers_to_transform should be None
if isinstance(self.target_modules, str) and self.layers_to_transform is not None:
raise ValueError("`layers_to_transform` cannot be used when `target_modules` is a str.")
# if target_modules is a regex expression, then layers_pattern should be None
if isinstance(self.target_modules, str) and self.layers_pattern is not None:
raise ValueError("`layers_pattern` cannot be used when `target_modules` is a str.")


@@ -0,0 +1,435 @@
# Copyright 2024-present the HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import math
import warnings
from typing import Any, List, Optional, Union
import torch
import torch.nn as nn
import torch.nn.functional as F
from peft.tuners.tuners_utils import BaseTunerLayer, check_adapters_to_merge
class HRALayer(BaseTunerLayer):
# All names of layers that may contain (trainable) adapter weights
adapter_layer_names = ("hra_u",)
# All names of other parameters that may contain adapter-related parameters
other_param_names = ("hra_r", "hra_apply_GS")
def __init__(self, base_layer: nn.Module, **kwargs) -> None:
self.base_layer = base_layer
self.hra_r = {}
self.hra_apply_GS = {}
self.hra_u = nn.ParameterDict({})
# Mark the weight as unmerged
self._disable_adapters = False
self.merged_adapters = []
self.kwargs = kwargs
base_layer = self.get_base_layer()
if isinstance(base_layer, nn.Linear):
self.in_features, self.out_features = base_layer.in_features, base_layer.out_features
elif isinstance(base_layer, nn.Conv2d):
self.in_features, self.out_features = base_layer.in_channels, base_layer.out_channels
else:
raise ValueError(f"Unsupported layer type {type(base_layer)}")
def update_layer(
self,
adapter_name: str,
r: int,
apply_GS: bool,
init_weights: bool,
**kwargs,
) -> None:
"""Internal function to create hra adapter
Args:
adapter_name (`str`): Name for the adapter to add.
r (`int`): Rank for the added adapter.
init_weights (`bool`): Whether to initialize weights.
apply_GS (`bool`): Whether to apply Gram-Schmidt orthogonalization or not.
"""
if r <= 0:
raise ValueError(f"`r` should be a positive integer value but the value passed is {r}")
self.hra_r[adapter_name] = r
self.hra_apply_GS[adapter_name] = apply_GS
# Determine shape of HRA weights
base_layer = self.get_base_layer()
if isinstance(base_layer, nn.Linear):
self.hra_u[adapter_name] = nn.Parameter(torch.empty(self.in_features, r), requires_grad=True)
elif isinstance(base_layer, nn.Conv2d):
self.hra_u[adapter_name] = nn.Parameter(
torch.empty(self.in_features * base_layer.kernel_size[0] * base_layer.kernel_size[0], r),
requires_grad=True,
)
else:
raise TypeError(f"HRA is not implemented for base layers of type {type(base_layer).__name__}")
# Initialize weights
if init_weights:
self.reset_hra_parameters(adapter_name)
else:
self.reset_hra_parameters_random(adapter_name)
# Move new weights to device
self._move_adapter_to_device_of_base_layer(adapter_name)
self.set_adapter(self.active_adapters)
def reset_hra_parameters(self, adapter_name: str):
if self.hra_r[adapter_name] % 2 != 0:
warnings.warn("The symmetric initialization can NOT be performed when r is odd!")
nn.init.kaiming_uniform_(self.hra_u[adapter_name], a=math.sqrt(5))
else:
shape = self.hra_u[adapter_name].shape
half_u = torch.zeros(shape[0], shape[1] // 2)
nn.init.kaiming_uniform_(half_u, a=math.sqrt(5))
self.hra_u[adapter_name] = nn.Parameter(torch.repeat_interleave(half_u, 2, dim=1))
def reset_hra_parameters_random(self, adapter_name: str):
nn.init.kaiming_uniform_(self.hra_u[adapter_name], a=math.sqrt(5))
def scale_layer(self, scale: float) -> None:
if scale == 1:
return
for active_adapter in self.active_adapters:
if active_adapter not in self.hra_u.keys():
continue
warnings.warn("Scaling operation for HRA not supported! Automatically set scale to 1.")
def unscale_layer(self, scale=None) -> None:
for active_adapter in self.active_adapters:
if active_adapter not in self.hra_u.keys():
continue
warnings.warn("Unscaling operation for HRA not supported! Keeping scale at 1.")
class HRALinear(nn.Module, HRALayer):
"""
HRA implemented in a dense layer.
"""
def __init__(
self,
base_layer,
adapter_name: str,
r: int = 0,
apply_GS: bool = False,
init_weights: Union[bool, str] = True,
**kwargs,
) -> None:
super().__init__()
HRALayer.__init__(self, base_layer, **kwargs)
self._active_adapter = adapter_name
self.update_layer(adapter_name, r, apply_GS, init_weights, **kwargs)
def merge(self, safe_merge: bool = False, adapter_names: Optional[List[str]] = None) -> None:
"""
Merge the active adapter weights into the base weights
Args:
safe_merge (`bool`, *optional*):
If `True`, the merge operation will be performed in a copy of the original weights and check for NaNs
before merging the weights. This is useful if you want to check if the merge operation will produce
NaNs. Defaults to `False`.
adapter_names (`List[str]`, *optional*):
The list of adapter names that should be merged. If `None`, all active adapters will be merged.
Defaults to `None`.
"""
adapter_names = check_adapters_to_merge(self, adapter_names)
if not adapter_names:
# no adapter to merge
return
for active_adapter in adapter_names:
if active_adapter in self.hra_u.keys():
base_layer = self.get_base_layer()
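                # HRA merges by right-multiplying the base weight with the orthogonal delta
                # (W <- W @ delta_weight); unmerge later undoes this by applying the Householder
                # reflections in reverse order via get_delta_weight(..., reverse=True).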
if safe_merge:
# Note that safe_merge will be slower than the normal merge
# because of the copy operation.
orig_weight = base_layer.weight.data.clone()
delta_weight = self.get_delta_weight(active_adapter)
orig_weight = torch.mm(orig_weight, delta_weight)
if not torch.isfinite(orig_weight).all():
raise ValueError(
f"NaNs detected in the merged weights. The adapter {active_adapter} seems to be broken"
)
self.base_layer.weight.data = orig_weight
else:
delta_weight = self.get_delta_weight(active_adapter)
self.base_layer.weight.data = torch.mm(self.base_layer.weight.data, delta_weight)
self.merged_adapters.append(active_adapter)
def unmerge(self) -> None:
"""
This method unmerges all merged adapter layers from the base weights.
"""
if not self.merged:
warnings.warn("Already unmerged. Nothing to do.")
return
while len(self.merged_adapters) > 0:
active_adapter = self.merged_adapters.pop()
if active_adapter in self.hra_u.keys():
orig_weight = self.get_base_layer().weight.data.clone()
delta_weight = self.get_delta_weight(active_adapter, reverse=True)
self.get_base_layer().weight.data = torch.mm(orig_weight, delta_weight)
def get_delta_weight(self, adapter_name: str, reverse: bool = False) -> torch.Tensor:
rank = self.hra_r[adapter_name]
apply_GS = self.hra_apply_GS[adapter_name]
opt_u = self.hra_u[adapter_name]
shape = opt_u.shape
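        # The HRA update is a product of Householder reflections, one per (normalized) column u_i of
        # hra_u: H_i = I - 2 * u_i @ u_i.T and delta = H_1 @ H_2 @ ... @ H_r. When apply_GS is set,
        # the columns are first Gram-Schmidt orthogonalized, so the product collapses to the single
        # matrix I - 2 * U @ U.T. With reverse=True the reflections are applied in opposite order,
        # which inverts the product (each H_i is its own inverse); this is what unmerge relies on.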
if apply_GS:
weight = [(opt_u[:, 0] / opt_u[:, 0].norm()).view(-1, 1)]
for i in range(1, rank):
ui = opt_u[:, i].view(-1, 1)
for j in range(i):
ui = ui - (weight[j].t() @ ui) * weight[j]
weight.append((ui / ui.norm()).view(-1, 1))
weight = torch.cat(weight, dim=1)
weight = torch.eye(shape[0], device=opt_u.device, dtype=opt_u.dtype) - 2 * weight @ weight.t()
else:
opt_u = opt_u / opt_u.norm(dim=0)
weight = torch.eye(shape[0], device=opt_u.device, dtype=opt_u.dtype)
if reverse:
indices = range(rank - 1, -1, -1)
else:
indices = range(rank)
for i in indices:
ui = opt_u[:, i].view(-1, 1)
weight = weight @ (torch.eye(shape[0], device=opt_u.device, dtype=opt_u.dtype) - 2 * ui @ ui.t())
return weight
def forward(self, x: torch.Tensor, *args: Any, **kwargs: Any) -> torch.Tensor:
previous_dtype = x.dtype
if self.disable_adapters:
if self.merged:
self.unmerge()
result = self.base_layer(x, *args, **kwargs)
elif self.merged:
result = self.base_layer(x, *args, **kwargs)
else:
new_weight = torch.eye(self.in_features, device=x.device)
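            # Chain the orthogonal updates of all active adapters into one matrix, then fold it into
            # the frozen base weight so the adapted output needs only a single F.linear call.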
for active_adapter in self.active_adapters:
if active_adapter not in self.hra_u.keys():
continue
delta_weight = self.get_delta_weight(active_adapter)
new_weight = torch.mm(new_weight, delta_weight)
x = x.to(self.get_base_layer().weight.data.dtype)
orig_weight = self.get_base_layer().weight.data
new_weight = torch.mm(orig_weight, new_weight)
result = F.linear(input=x, weight=new_weight, bias=self.base_layer.bias)
result = result.to(previous_dtype)
return result
def __repr__(self) -> str:
rep = super().__repr__()
return "hra." + rep
class HRAConv2d(nn.Module, HRALayer):
"""HRA implemented in Conv2d layer"""
def __init__(
self,
base_layer,
adapter_name: str,
r: int = 0,
apply_GS: bool = False,
init_weights: Union[bool, str] = True,
**kwargs,
):
super().__init__()
HRALayer.__init__(self, base_layer)
self._active_adapter = adapter_name
self.update_layer(adapter_name, r, apply_GS, init_weights, **kwargs)
def merge(self, safe_merge: bool = False, adapter_names: Optional[List[str]] = None) -> None:
"""
Merge the active adapter weights into the base weights
Args:
safe_merge (`bool`, *optional*):
If `True`, the merge operation will be performed in a copy of the original weights and check for NaNs
before merging the weights. This is useful if you want to check if the merge operation will produce
NaNs. Defaults to `False`.
adapter_names (`List[str]`, *optional*):
The list of adapter names that should be merged. If `None`, all active adapters will be merged.
Defaults to `None`.
"""
adapter_names = check_adapters_to_merge(self, adapter_names)
if not adapter_names:
# no adapter to merge
return
for active_adapter in adapter_names:
if active_adapter in self.hra_u.keys():
base_layer = self.get_base_layer()
if safe_merge:
# Note that safe_merge will be slower than the normal merge
# because of the copy operation.
orig_weight = base_layer.weight.data.clone()
orig_weight = orig_weight.view(
self.out_features,
self.in_features * self.base_layer.kernel_size[0] * self.base_layer.kernel_size[0],
)
delta_weight = self.get_delta_weight(active_adapter)
orig_weight = torch.mm(orig_weight, delta_weight)
orig_weight = orig_weight.view(
self.out_features,
self.in_features,
self.base_layer.kernel_size[0],
self.base_layer.kernel_size[0],
)
if not torch.isfinite(orig_weight).all():
raise ValueError(
f"NaNs detected in the merged weights. The adapter {active_adapter} seems to be broken"
)
self.base_layer.weight.data = orig_weight
else:
orig_weight = base_layer.weight.data
orig_weight = orig_weight.view(
self.out_features,
self.in_features * self.base_layer.kernel_size[0] * self.base_layer.kernel_size[0],
)
delta_weight = self.get_delta_weight(active_adapter)
orig_weight = torch.mm(orig_weight, delta_weight)
orig_weight = orig_weight.view(
self.out_features,
self.in_features,
self.base_layer.kernel_size[0],
self.base_layer.kernel_size[0],
)
self.base_layer.weight.data = orig_weight
self.merged_adapters.append(active_adapter)
def unmerge(self) -> None:
"""
This method unmerges all merged adapter layers from the base weights.
"""
if not self.merged:
warnings.warn("Already unmerged. Nothing to do.")
return
while len(self.merged_adapters) > 0:
active_adapter = self.merged_adapters.pop()
if active_adapter in self.hra_u.keys():
orig_weight = self.get_base_layer().weight.data.clone()
orig_weight = orig_weight.view(
self.out_features,
self.in_features * self.base_layer.kernel_size[0] * self.base_layer.kernel_size[0],
)
delta_weight = self.get_delta_weight(active_adapter, reverse=True)
orig_weight = torch.mm(orig_weight, delta_weight)
orig_weight = orig_weight.view(
self.out_features, self.in_features, self.base_layer.kernel_size[0], self.base_layer.kernel_size[0]
)
self.get_base_layer().weight.data = orig_weight
def get_delta_weight(self, adapter_name: str, reverse: bool = False) -> torch.Tensor:
rank = self.hra_r[adapter_name]
apply_GS = self.hra_apply_GS[adapter_name]
opt_u = self.hra_u[adapter_name]
shape = opt_u.shape
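        # Same Householder-reflection construction as HRALinear.get_delta_weight; here the
        # reflections act on the flattened input dimension (in_features * kernel_size[0] ** 2)
        # of the conv weight.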
if apply_GS:
weight = [(opt_u[:, 0] / opt_u[:, 0].norm()).view(-1, 1)]
for i in range(1, rank):
ui = opt_u[:, i].view(-1, 1)
for j in range(i):
ui = ui - (weight[j].t() @ ui) * weight[j]
weight.append((ui / ui.norm()).view(-1, 1))
weight = torch.cat(weight, dim=1)
weight = torch.eye(shape[0], device=opt_u.device, dtype=opt_u.dtype) - 2 * weight @ weight.t()
else:
opt_u = opt_u / opt_u.norm(dim=0)
weight = torch.eye(shape[0], device=opt_u.device, dtype=opt_u.dtype)
if reverse:
indices = range(rank - 1, -1, -1)
else:
indices = range(rank)
for i in indices:
ui = opt_u[:, i].view(-1, 1)
weight = weight @ (torch.eye(shape[0], device=opt_u.device, dtype=opt_u.dtype) - 2 * ui @ ui.t())
return weight
def forward(self, x: torch.Tensor, *args: Any, **kwargs: Any) -> torch.Tensor:
previous_dtype = x.dtype
if self.disable_adapters:
if self.merged:
self.unmerge()
result = self.base_layer(x, *args, **kwargs)
elif self.merged:
result = self.base_layer(x, *args, **kwargs)
else:
new_weight = torch.eye(
self.in_features * self.base_layer.kernel_size[0] * self.base_layer.kernel_size[0], device=x.device
)
for active_adapter in self.active_adapters:
if active_adapter not in self.hra_u.keys():
continue
delta_weight = self.get_delta_weight(active_adapter)
new_weight = torch.mm(new_weight, delta_weight)
x = x.to(self.base_layer.weight.data.dtype)
orig_weight = self.base_layer.weight.data
orig_weight = orig_weight.view(
self.out_features,
self.in_features * self.base_layer.kernel_size[0] * self.base_layer.kernel_size[0],
)
new_weight = torch.mm(orig_weight, new_weight)
new_weight = new_weight.view(
self.out_features, self.in_features, self.base_layer.kernel_size[0], self.base_layer.kernel_size[0]
)
result = F.conv2d(
input=x,
weight=new_weight,
bias=self.base_layer.bias,
padding=self.base_layer.padding[0],
stride=self.base_layer.stride[0],
)
result = result.to(previous_dtype)
return result
def __repr__(self) -> str:
rep = super().__repr__()
return "hra." + rep


@@ -0,0 +1,341 @@
# Copyright 2024-present the HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import warnings
from dataclasses import asdict
from enum import Enum
from typing import List, Optional
import torch
from torch import nn
from tqdm import tqdm
from peft.tuners.tuners_utils import BaseTuner, BaseTunerLayer, check_target_module_exists
from peft.utils import (
TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING,
ModulesToSaveWrapper,
_get_submodules,
)
from .config import HRAConfig
from .layer import HRAConv2d, HRALayer, HRALinear
class HRAModel(BaseTuner):
"""
Creates Householder reflection adaptation (HRA) model from a pretrained model. The method is described in
https://arxiv.org/abs/2405.17484
Args:
model (`torch.nn.Module`): The model to which the adapter tuner layers will be attached.
config ([`HRAConfig`]): The configuration of the HRA model.
adapter_name (`str`): The name of the adapter, defaults to `"default"`.
low_cpu_mem_usage (`bool`, `optional`, defaults to `False`):
Create empty adapter weights on meta device. Useful to speed up the loading process.
Returns:
`torch.nn.Module`: The HRA model.
Example:
```py
>>> from diffusers import StableDiffusionPipeline
>>> from peft import HRAModel, HRAConfig
>>> config_te = HRAConfig(
... r=8,
... target_modules=["k_proj", "q_proj", "v_proj", "out_proj", "fc1", "fc2"],
... init_weights=True,
... )
>>> config_unet = HRAConfig(
... r=8,
... target_modules=[
... "proj_in",
... "proj_out",
... "to_k",
... "to_q",
... "to_v",
... "to_out.0",
... "ff.net.0.proj",
... "ff.net.2",
... ],
... init_weights=True,
... )
>>> model = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
>>> model.text_encoder = HRAModel(model.text_encoder, config_te, "default")
>>> model.unet = HRAModel(model.unet, config_unet, "default")
```
**Attributes**:
- **model** ([`~torch.nn.Module`]) -- The model to be adapted.
- **peft_config** ([`HRAConfig`]): The configuration of the HRA model.
"""
prefix: str = "hra_"
def _check_new_adapter_config(self, config: HRAConfig) -> None:
"""
A helper method to check the config when a new adapter is being added.
Raise a ValueError if there is something wrong with the config or if it conflicts with existing adapters.
"""
# TODO: there should be a check if any of the existing adapters actually has bias != "none", or else the check
# does not fully correspond to the error message.
if (len(self.peft_config) > 1) and (config.bias != "none"):
raise ValueError(
f"{self.__class__.__name__} supports only 1 adapter with bias. When using multiple adapters, "
"set bias to 'none' for all adapters."
)
@staticmethod
def _check_target_module_exists(hra_config, key):
return check_target_module_exists(hra_config, key)
def _create_and_replace(
self,
hra_config,
adapter_name,
target,
target_name,
parent,
current_key,
**optional_kwargs,
):
if current_key is None:
raise ValueError("Current Key shouldn't be `None`")
bias = hasattr(target, "bias") and target.bias is not None
kwargs = {
"r": hra_config.r,
"apply_GS": hra_config.apply_GS,
"init_weights": hra_config.init_weights,
}
kwargs["bias"] = bias
# If it is not a HRALayer, create a new module, else update it with new adapters
if not isinstance(target, HRALayer):
new_module = self._create_new_module(hra_config, adapter_name, target, **kwargs)
if adapter_name not in self.active_adapters:
# adding an additional adapter: it is not automatically trainable
new_module.requires_grad_(False)
self._replace_module(parent, target_name, new_module, target)
else:
target.update_layer(
adapter_name,
r=hra_config.r,
apply_GS=hra_config.apply_GS,
init_weights=hra_config.init_weights,
)
def _replace_module(self, parent, child_name, new_module, child):
setattr(parent, child_name, new_module)
# It's not necessary to set requires_grad here, as that is handled by
# _mark_only_adapters_as_trainable
# child layer wraps the original module, unpack it
if hasattr(child, "base_layer"):
child = child.base_layer
if not hasattr(new_module, "base_layer"):
new_module.weight = child.weight
if hasattr(child, "bias"):
new_module.bias = child.bias
if getattr(child, "state", None) is not None:
if hasattr(new_module, "base_layer"):
new_module.base_layer.state = child.state
else:
new_module.state = child.state
new_module.to(child.weight.device)
meta = torch.device("meta")
# dispatch to correct device
for name, module in new_module.named_modules():
if self.prefix in name:
if not any(p.device == meta for p in module.parameters()):
module.to(child.weight.device)
def _mark_only_adapters_as_trainable(self, model: nn.Module) -> None:
for n, p in model.named_parameters():
if self.prefix not in n:
p.requires_grad = False
for active_adapter in self.active_adapters:
bias = self.peft_config[active_adapter].bias
if bias == "none":
continue
if bias == "all":
for n, p in model.named_parameters():
if "bias" in n:
p.requires_grad = True
elif bias == "hra_only":
for name, m in model.named_modules():
if isinstance(m, HRALayer) and hasattr(m, "bias") and m.bias is not None:
m.bias.requires_grad = True
else:
raise NotImplementedError(f"Requested bias: {bias}, is not implemented.")
@staticmethod
def _create_new_module(hra_config, adapter_name, target, **kwargs):
if isinstance(target, BaseTunerLayer):
target_base_layer = target.get_base_layer()
else:
target_base_layer = target
if isinstance(target_base_layer, torch.nn.Linear):
new_module = HRALinear(target, adapter_name, **kwargs)
elif isinstance(target_base_layer, torch.nn.Conv2d):
new_module = HRAConv2d(target, adapter_name, **kwargs)
else:
raise ValueError(
f"Target module {target} is not supported. "
"Currently, only `torch.nn.Linear` and `torch.nn.Conv2d` are supported."
)
return new_module
def __getattr__(self, name: str):
"""Forward missing attributes to the wrapped module."""
try:
return super().__getattr__(name) # defer to nn.Module's logic
except AttributeError:
if name == "base_model":
raise
return getattr(self.model, name)
def get_peft_config_as_dict(self, inference: bool = False):
config_dict = {}
for key, value in self.peft_config.items():
config = {k: v.value if isinstance(v, Enum) else v for k, v in asdict(value).items()}
if inference:
config["inference_mode"] = True
config_dict[key] = config
        return config_dict
def _set_adapter_layers(self, enabled=True):
for module in self.model.modules():
if isinstance(module, (BaseTunerLayer, ModulesToSaveWrapper)):
module.enable_adapters(enabled)
def enable_adapter_layers(self):
self._set_adapter_layers(enabled=True)
def disable_adapter_layers(self):
for active_adapter in self.active_adapters:
val = self.peft_config[active_adapter].bias
if val != "none":
msg = (
f"Careful, disabling adapter layers with bias configured to be '{val}' does not produce the same "
"output as the the base model would without adaption."
)
warnings.warn(msg)
self._set_adapter_layers(enabled=False)
def set_adapter(self, adapter_name):
for module in self.model.modules():
if isinstance(module, HRALayer):
if module.merged:
warnings.warn("Adapter cannot be set when the model is merged. Unmerging the model first.")
module.unmerge()
module.set_adapter(adapter_name)
self.active_adapter = adapter_name
@staticmethod
def _prepare_adapter_config(peft_config, model_config):
if peft_config.target_modules is None:
if model_config["model_type"] not in TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING:
raise ValueError("Please specify `target_modules` in `peft_config`")
peft_config.target_modules = set(
TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING[model_config["model_type"]]
)
return peft_config
def _unload_and_optionally_merge(
self,
merge=True,
progressbar: bool = False,
safe_merge: bool = False,
adapter_names: Optional[List[str]] = None,
):
self._unloading_checks(adapter_names)
key_list = [key for key, _ in self.model.named_modules() if self.prefix not in key]
desc = "Unloading " + ("and merging " if merge else "") + "model"
for key in tqdm(key_list, disable=not progressbar, desc=desc):
try:
parent, target, target_name = _get_submodules(self.model, key)
except AttributeError:
continue
if hasattr(target, "base_layer"):
if merge:
target.merge(safe_merge=safe_merge, adapter_names=adapter_names)
self._replace_module(parent, target_name, target.get_base_layer(), target)
elif isinstance(target, ModulesToSaveWrapper):
# save any additional trainable modules part of `modules_to_save`
setattr(parent, target_name, target.modules_to_save[target.active_adapter])
return self.model
def delete_adapter(self, adapter_name: str) -> None:
"""
Deletes an existing adapter.
Args:
adapter_name (str): Name of the adapter to be deleted.
"""
if adapter_name not in list(self.peft_config.keys()):
raise ValueError(f"Adapter {adapter_name} does not exist")
del self.peft_config[adapter_name]
key_list = [key for key, _ in self.model.named_modules() if self.prefix not in key]
new_adapter = None
for key in key_list:
_, target, _ = _get_submodules(self.model, key)
if isinstance(target, HRALayer):
target.delete_adapter(adapter_name)
if new_adapter is None:
new_adapter = target.active_adapters[:]
self.active_adapter = new_adapter or []
def merge_and_unload(
self, progressbar: bool = False, safe_merge: bool = False, adapter_names: Optional[List[str]] = None
) -> torch.nn.Module:
r"""
This method merges the HRA layers into the base model. This is needed if someone wants to use the base model as
a standalone model.
Args:
progressbar (`bool`):
whether to show a progressbar indicating the unload and merge process
safe_merge (`bool`):
whether to activate the safe merging check to check if there is any potential Nan in the adapter
weights
adapter_names (`List[str]`, *optional*):
The list of adapter names that should be merged. If None, all active adapters will be merged. Defaults
to `None`.
"""
return self._unload_and_optionally_merge(
progressbar=progressbar, safe_merge=safe_merge, adapter_names=adapter_names
)
def unload(self) -> torch.nn.Module:
"""
        Gets back the base model by removing all the HRA modules without merging. This gives back the original base
model.
"""
return self._unload_and_optionally_merge(merge=False)


@@ -61,7 +61,7 @@ class IA3Layer(BaseTunerLayer):
         self.ia3_l[adapter_name] = nn.Parameter(weight)
         if init_ia3_weights:
             self.reset_ia3_parameters(adapter_name)
-        self.to(self.get_base_layer().weight.device)
+        self._move_adapter_to_device_of_base_layer(adapter_name)
         self.set_adapter(self.active_adapters)
 
     def reset_ia3_parameters(self, adapter_name):
@@ -210,7 +210,7 @@ class Conv2d(nn.Module, IA3Layer):
         self.ia3_l[adapter_name] = nn.Parameter(weight)
         if init_ia3_weights:
             self.reset_ia3_parameters(adapter_name)
-        self.to(self.get_base_layer().weight.device)
+        self._move_adapter_to_device_of_base_layer(adapter_name)
         self.set_adapter(self.active_adapters)
 
     def merge(self, safe_merge: bool = False, adapter_names: Optional[List[str]] = None) -> None:

Some files were not shown because too many files have changed in this diff.